Minutes of the December 1, 1998 Meeting of the MONARC Site and Network Architecture Working Group

Attending: Joel Butler, Mauro Campanella, Paolo Capiluppi, Gregory Denis, Michael Diesburg, Irwin Gaines, Philippe Galvez, Iosif Legrand, Ian McArthur, Vivian O'Dell, Ruth Pordes, Les Robertson, Tim Smith, and John Womersley

Preliminary Notes:

1) Several people have possible problems with the proposed date of the next meeting, December 15. We will discuss by EMAIL the possibility of rescheduling it. We still want another meeting before the Winter break.

2) Thanks to the efforts of Tim Smith, we now have a mailing list for this working group. Its address is:

      monarc-architecture@listbox.cern.ch

   We subscribed everyone who was listed as attending the November 17 meeting and a few others who had expressed interest. If you want to be on the mailing list, please send mail to me. If you got a message that you have been subscribed and want to be removed, you can let me know or unsubscribe yourself. Tim also informs me that EMAIL to the list will be automatically archived.

3) If anyone has documents that they feel should be linked to our web page, please send the URL to me.

Main Agenda Items:

I. Survey of existing architectures

The first topic we discussed was the report on the analysis experience of existing experiments. We established the list of experiments and got volunteers to develop the required information for each.
The list is:

   Experiment            Summarizer
   -------------------------------------
   LEP experiments
     ALEPH               Tim Smith
     DELPHI              "
     L3                  "
     OPAL                "
   CERN Fixed Target
     NA48                Tim Smith
     COMPASS             "
     NA45                "
   FNAL experiments
     CDF                 Mike Diesburg
     D0                  Mike Diesburg
     KTeV                Vivian O'Dell
   DESY experiments
     ZEUS                Ian McArthur

The basic description of the report is specified in the PEP as follows:

   4.3.1 Subtask: Survey of existing computing architectures

   Descriptions of computing architectures used by current experiments should be prepared, concentrating on LEP, HERA, the FNAL Collider, the large fixed target experiments at CERN and FNAL. Architectural descriptions should include:

   - Central CPU for computing intensive applications
   - Central CPU for I/O intensive applications
   - Central CPU for data analysis
   - Central disk and mass storage facilities
   - Distributed CPU and disk (still at central site)
   - Distributed CPU, disk and mass storage at remote sites
   - Desktop systems
   - Local network bandwidth
   - Wide area network bandwidth
   - Discussion of how computing tasks are partitioned among various systems in a distributed architecture

   Deliverables: Reports and diagrams of existing computing architectures
   Schedule: to be completed by January 1, 1999

The four 'volunteers' will communicate this week and develop the format for the report, including a description of the tables that they plan to prepare. They will circulate their conclusions on this for review by the group. They will then start gathering information immediately. We will expect a brief status report at our next Architecture group meeting in the middle of December.

There was some discussion of other information that the report might contain. However, in the end, it was decided to stick close to the charge and to concentrate on hardware parameters. The overlap with the information being gathered by the Analysis Working group was noted.
We hope that members of that group (there is good overlap in membership between the two groups) will hear the reports at the next meeting, so that we can discuss how to make sure the information is shared and that, between the two groups, we collect everything we need.

It was noted that the entries on local and wide area bandwidth are not likely to be helpful. What is of interest is the actual end-to-end throughput experienced by a typical user. The summarizers should take this comment under consideration.

We also discussed the method and schedule for the production of the report. We urge the experiment summarizers to complete their work by Christmas. I will help edit their information into a draft that will be available for review and comment just after the first of the year. We will then have to get whatever approvals we need from the project management. I hope that we can submit it in the third week of January. Although this is a bit later than planned, I think it is the best that we can do given our current status and the approach of the holidays. I could certainly use someone to collaborate on editing the document, so a volunteer would be most welcome.

II. Discussion of Regional Centers

The purpose of this discussion was to begin to create a framework for conducting the discussion of regional centers. It was noted that many ideas on this topic have already appeared in official documents, and many have been circulated and discussed informally. Once again, I'd like to get any relevant documents linked to our web page so everyone can have easy access to them.

It is impossible for me to summarize all aspects of the discussion. These are the most important points:

A regional center is characterized by the services and facilities it provides for the data analysis. Centers should focus on providing especially those services which are very expensive or hard to implement and/or maintain.
A loose scheme for describing a particular regional center might be:

   Capabilities:
   - Services provided (including support services)
   - Facilities available, including capacity

   Constituency:
   - Who are the main anticipated users of the services and facilities?
   - What functions will they carry out using those services and facilities?

   Data Profile:
   - What types of datasets are available at the center?
   - What fraction or portion of the full dataset of each type is available?
   - Which non-event databases are available at each center?

   Communication Profile:
   - How does the data stored at the center get there?
   - How is the data at the center made accessible to its users?

   Collaboration/Dependency:
   - How does the regional center depend on the principal center at CERN?
   - How does the regional center interact or interrelate with other centers?
   - How do the services provided by the center relate to the full offline analysis task of its collaboration?

It is understood that there will probably be a large number of centers of varying capabilities. (Not all of these should be referred to as 'regional centres'.) It is hoped that a profile such as the one above can provide a useful framework for discussing and categorizing them.

Irwin mentioned that the notion of several 'tiers' of computing center seems to be accepted. It was agreed that the discussion should proceed assuming that centers can be sorted into tiers -- that is, that some quantization is good. The spread of capabilities and services within a tier will be much smaller than the gap between tiers. While in reality the distribution may be smoother than this, at least for the purpose of modeling performance it will help to think in terms of tiers. It was felt that the definition of the tiers should be based on a minimum set of requirements with respect to a list like the one above. Modeling will probably focus on the top two or so 'tiers', which will be of sufficient scale and scope to be called 'regional'.
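For the modeling work, the profile scheme and the tier quantization above could be captured as a simple record structure. The sketch below is illustrative only: the field names and the 10% threshold are assumptions for the example, not terminology or figures agreed by the group.

```python
from dataclasses import dataclass, field

@dataclass
class RegionalCenterProfile:
    """One center, described under the five headings of the profile scheme."""
    # Capabilities
    services: list = field(default_factory=list)       # services provided, incl. support
    facilities: dict = field(default_factory=dict)      # facility name -> capacity
    # Constituency
    users: list = field(default_factory=list)           # main anticipated user groups
    functions: list = field(default_factory=list)       # functions carried out there
    # Data Profile
    dataset_fractions: dict = field(default_factory=dict)   # data type -> fraction held
    non_event_databases: list = field(default_factory=list)
    # Communication Profile
    data_import: str = ""                                # how data gets to the center
    data_access: str = ""                                # how users access the data
    # Collaboration/Dependency
    cern_dependency: str = ""
    relations_to_other_centers: str = ""

def assign_tier(profile: RegionalCenterProfile, threshold: float = 0.10) -> int:
    """Toy quantization: call a center tier 1 if it holds at least ~10% of the
    reconstructed data (the threshold is an assumption, not an agreed figure)."""
    fraction = profile.dataset_fractions.get("reconstructed", 0.0)
    return 1 if fraction >= threshold else 2
```

A structure like this would let the performance modeling treat each center as a set of parameters rather than free-form prose, consistent with the idea that tiers are defined by minimum requirements against the list above.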
There already exists the concept of a 'large' center -- something that is of order 10-20% of the "principal center" at CERN. This would be tier 1. The group wanted more information about what CERN was planning to provide. Les Robertson volunteered to come up with a summary of what a regional center that had of order 10% of the capability of the CERN center would provide. Fermilab was asked to look at how it supports CDF and D0 and to use that information as a way to scope a regional center. Irwin Gaines, John Womersley, and Joel Butler agreed to work on that.

During the discussion, several other important issues were raised:

1) In addition to describing the datasets at a center, it is important to understand in detail the I/O-to-CPU ratios for each type of data. The centers have to be balanced in their capabilities in order to be efficient providers of services.

2) Discussion of databases must take into account possibly significant CPU overheads involved in carrying out queries.

3) Many issues raised by the preceding two points may be clarified by the tests using Objectivity.

4) Not only must a center provide a minimum service level, but it should be architected so that it is 'scalable' -- that is, it can be easily augmented to support more physicists, or to provide more support to its existing group of physicists as their work habits change.

5) Flexibility is also an important requirement. It may be necessary to migrate from one configuration to another as the character of the analysis, or changing hardware or software, dictates.

There was a discussion of Distributed Regional Centers. The group strongly feels that this is an implementation issue: as long as the minimum requirements can be shown to be satisfied, the detailed implementation is really not an issue. I'd like to thank Luciano Barone for providing us with his notes describing his thoughts on Regional Centers and Distributed Regional Centers, which have been posted to the web page.

III. TEN-155 collaboration

I received the note from Mauro Campanella describing a possible interest by MONARC in opportunities offered by Quantum (TEN-155) or TF-TANT to participate in tests of advanced network technologies. Details may be found at:

   http://www.dante.net (TEN-155)
   http://www.dante.net/quantum/qtp/ (list of tests)
   http://www.dante.net/ten-34/tf-ten.html (results of previous work)

It was not clear which group in MONARC should discuss this proposal. A decision on whether MONARC is interested in this would need to be made quickly, since a proposal would have to be ready in mid-January. MONARC management has agreed to take up consideration of this.