Minutes of the December 1, 1998 Meeting of the MONARC Site and Network Architecture Working Group

Attending: Joel Butler, Mauro Campanella, Paolo Capiluppi, Gregory Denis, Michael Diesburg, Irwin Gaines, Philippe Galvez, Iosif Legrand, Ian McArthur, Vivian O'Dell, Ruth Pordes, Les Robertson, Tim Smith, and John Womersley

Preliminary Notes:

1) Several people have possible problems with the proposed date of the next meeting, December 15. We will discuss by EMAIL the possibility of rescheduling it. We still want another meeting before the Winter break.

2) Thanks to the efforts of Tim Smith, we now have a mailing list for this working group. Its address is:

      monarc-architecture@listbox.cern.ch

   We subscribed everyone who was listed as attending the November 17 meeting and a few others who had expressed interest. If you want to be on the mailing list, please send mail to me. If you got a message that you have been subscribed and want to be removed, you can let me know or unsubscribe yourself. Tim also informs me that EMAIL to the list will be automatically archived.

3) If anyone has documents that they feel should be linked to our web page, please send the URL to me.

Main Agenda Items:

I. Survey of existing architectures

The first topic we discussed was the report on the analysis experience of existing experiments. We established the list of experiments and got volunteers to develop the required information for each.
The list is:

   Experiment            Summarizer
   -------------------------------------
   LEP experiments
     ALEPH               Tim Smith
     DELPHI              "
     L3                  "
     OPAL                "
   CERN Fixed Target
     NA48                Tim Smith
     COMPASS             "
     NA45                "
   FNAL experiments
     CDF                 Mike Diesburg
     D0                  Mike Diesburg
     KTeV                Vivian O'Dell
   DESY experiments
     ZEUS                Ian McArthur

The basic description of the report is specified in the PEP as follows:

   4.3.1 Subtask: Survey of existing computing architectures

   Descriptions of computing architectures used by current experiments should be prepared, concentrating on LEP, HERA, the FNAL Collider, the large fixed target experiments at CERN and FNAL. Architectural descriptions should include:

   - Central CPU for computing intensive applications
   - Central CPU for I/O intensive applications
   - Central CPU for data analysis
   - Central disk and mass storage facilities
   - Distributed CPU and disk (still at central site)
   - Distributed CPU, disk and mass storage at remote sites
   - Desktop systems
   - Local network bandwidth
   - Wide area network bandwidth
   - Discussion of how computing tasks are partitioned among various systems in a distributed architecture

   Deliverables: Reports and diagrams of existing computing architectures
   Schedule: to be completed by January 1, 1999

The four 'volunteers' will communicate this week and develop the format for the report, including a description of the tables that they plan to prepare. They will circulate their conclusions on this for review by the group. They will then start gathering information immediately. We will expect a brief status report at our next Architecture group meeting in the middle of December.

There was some discussion of other information that the report might contain. However, in the end, it was decided to stick close to the charge and to concentrate on hardware parameters. The overlap with the information being gathered by the Analysis Working group was noted.
We hope that members of that group (there is good overlap in membership between the two groups) will hear the reports at the next meeting, so that we can discuss how to make sure the information is shared and that, between the two groups, we collect everything we need.

It was noted that the entries on local and wide area bandwidth are not likely to be helpful. What is of interest is the actual end-to-end throughput experienced by a typical user. The summarizers should take this comment under consideration.

We also discussed the method and schedule for the production of the report. We urge the experiment summarizers to complete their work by Christmas. I will help edit their information into a draft that will be available for review and comment just after the first of the year. We will then have to get whatever approvals we need from the project management. I hope that we can submit it in the third week of January. Although this is a bit later than planned, I think it is the best that we can do given our current status and the approach of the holidays. I could certainly use someone to collaborate on editing the document, so a volunteer would be most welcome.

II. Discussion of Regional Centers

The purpose of this discussion was to begin to create a framework for conducting the discussion of regional centers. It was noted that many ideas on this topic have already appeared in official documents, and many have been circulated and discussed informally. Once again, I'd like to get any relevant documents linked to our web page so everyone can have easy access to them.

It is impossible for me to summarize all aspects of the discussion. These are the most important points:

A regional center is characterized by the services and facilities it provides for the data analysis. Centers should focus on providing especially those services which are very expensive or hard to implement and/or maintain.
A loose scheme for describing a particular regional center might be:

   Capabilities:
   - Services provided (including support services)
   - Facilities available, including capacity

   Constituency:
   - Who are the main anticipated users of the services and facilities?
   - What functions will they carry out using those services and facilities?

   Data Profile:
   - What types of datasets are available at the center?
   - What fraction or portion of the full dataset of each type is available?
   - Which non-event databases are available at each center?

   Communication Profile:
   - How does the data stored at the center get there?
   - How is the data at the center made accessible to its users?

   Collaboration/Dependency:
   - How does the regional center depend on the principal center at CERN?
   - How does the regional center interact or interrelate with other centers?
   - How do the services provided by the center relate to the full offline analysis task of its collaboration?

It is understood that there will probably be a large number of centers of varying capabilities. (Not all of these should be referred to as 'regional centres'.) It is hoped that a profile such as the one above can provide a useful framework for discussing and categorizing them.

Irwin mentioned that the notion of several 'tiers' of computing center seems to be accepted. It was agreed that the discussion should proceed assuming that centers can be sorted into tiers -- that is, that some quantization is good. The spread of capabilities and services within a tier will be much smaller than the gap between tiers. While in reality the distribution may be smoother than this, at least for the purpose of modeling performance it will help to think in terms of tiers. It was felt that the definition of the tiers should be based on a minimum set of requirements with respect to a list like the one above. Modeling will probably focus on the top two or so 'tiers', which will be of sufficient scale and scope to be called 'regional'.
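For the modeling work, the profile scheme and the tier quantization above could be captured as a simple record structure. The sketch below is illustrative only: the field names and the 10% threshold are assumptions for the example, not terminology or figures agreed by the group.

```python
from dataclasses import dataclass, field

@dataclass
class RegionalCenterProfile:
    """One center, described under the five headings of the profile scheme."""
    # Capabilities
    services: list = field(default_factory=list)       # services provided, incl. support
    facilities: dict = field(default_factory=dict)      # facility name -> capacity
    # Constituency
    users: list = field(default_factory=list)           # main anticipated user groups
    functions: list = field(default_factory=list)       # functions carried out there
    # Data Profile
    dataset_fractions: dict = field(default_factory=dict)   # data type -> fraction held
    non_event_databases: list = field(default_factory=list)
    # Communication Profile
    data_import: str = ""                                # how data gets to the center
    data_access: str = ""                                # how users access the data
    # Collaboration/Dependency
    cern_dependency: str = ""
    relations_to_other_centers: str = ""

def assign_tier(profile: RegionalCenterProfile, threshold: float = 0.10) -> int:
    """Toy quantization: call a center tier 1 if it holds at least ~10% of the
    reconstructed data (the threshold is an assumption, not an agreed figure)."""
    fraction = profile.dataset_fractions.get("reconstructed", 0.0)
    return 1 if fraction >= threshold else 2
```

A structure like this would let the performance modeling treat each center as a set of parameters rather than free-form prose, consistent with the idea that tiers are defined by minimum requirements against the list above.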
There already exists the concept of a 'large' center -- something that is of order 10-20% of the "principal center" at CERN. This would be tier 1. The group wanted more information about what CERN was planning to provide. Les Robertson volunteered to come up with a summary of what a regional center that had of order 10% of the capability of the CERN center would provide. Fermilab was asked to look at how it supports CDF and D0 and to use that information as a way to scope a regional center. Irwin Gaines, John Womersley, and Joel Butler agreed to work on that.

During the discussion, several other important issues were raised:

1) In addition to describing the datasets at a center, it is important to understand in detail the I/O-to-CPU ratios for each type of data. The centers have to be balanced in their capabilities in order to be efficient providers of services.

2) Discussion of databases must take into account possibly significant CPU overheads involved in carrying out queries.

3) Many issues raised by the preceding two points may be clarified by the tests using Objectivity.

4) Not only must a center provide a minimum service level, but it should be architected so that it is 'scalable' -- that is, it can be easily augmented to support more physicists, or to provide more support to its existing group of physicists as their work habits change.

5) Flexibility is also an important requirement. It may be necessary to migrate from one configuration to another as the character of the analysis, or changing hardware or software, dictates.

There was a discussion of Distributed Regional Centers. The group strongly feels that this is an implementation issue: as long as the minimum requirements can be shown to be satisfied, the detailed implementation is really not an issue. I'd like to thank Luciano Barone for providing us with his notes describing his thoughts on Regional Centers and Distributed Regional Centers, which have been posted to the web page.

III. TEN-155 collaboration

I received the note from Mauro Campanella describing a possible interest by MONARC in opportunities offered by Quantum (TEN-155) or TF-TANT to participate in tests of advanced network technologies. Details may be found at:

   http://www.dante.net (TEN-155)
   http://www.dante.net/quantum/qtp/ (list of tests)
   http://www.dante.net/ten-34/tf-ten.html (results of previous work)

It was not clear which group in MONARC should discuss this proposal. A decision on whether MONARC is interested in this would need to be made quickly, since a proposal would have to be ready in mid-January. MONARC management has agreed to take up consideration of this.