PASFRG Report

Miscellaneous Section


Batch processing:

Analysis tools must be capable of running both interactively and in batch mode. It should be possible to pass a script derived from an interactive session to a batch job so that the interactive analysis can be reproduced on a larger sample of events. Plots and other graphical output that are displayed on a terminal screen when running interactively should be saved for later examination when running in batch mode.
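
As an illustration only (the report does not prescribe a language or plotting package; Python, matplotlib and the ANALYSIS_BATCH switch below are stand-ins), a single script might serve both modes as follows:

    # Illustrative sketch only: Python/matplotlib and the ANALYSIS_BATCH
    # environment variable are assumptions, not part of any proposed system.
    import os
    import matplotlib

    BATCH = os.environ.get("ANALYSIS_BATCH", "0") == "1"   # hypothetical switch
    if BATCH:
        matplotlib.use("Agg")          # no display needed in a batch job
    import matplotlib.pyplot as plt

    def analyse(events):
        fig, ax = plt.subplots()
        ax.hist(events, bins=50)
        ax.set_xlabel("invariant mass [GeV]")
        if BATCH:
            fig.savefig("mass.png")    # saved for later examination
        else:
            plt.show()                 # displayed on the terminal screen

    analyse([1.0, 2.5, 2.7, 3.1])      # same call path in both modes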

Sharing data structures among users:

At the user's option, data (and command) structures of various types should be capable of being made available to others, with some granularity in how widely the permission is granted (for example, world-wide access, experiment-wide access, or physics-group-wide access). This access should extend to files of special types of data preserved by an analysis job, to selected samples of standard-format data, to analysis macros and selection criteria, and to definitions of graphical output produced by an analysis job.
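
Purely to illustrate the intended granularity (the scope names, object names and checking routine below are hypothetical, not part of any proposed system), such permissions might be modelled as follows:

    # Hypothetical sketch of permission granularity; names are illustrative.
    shared_objects = {
        "top_mass_cuts.macro": {"owner": "jdoe", "scope": "group",
                                "group": "top-physics"},
        "zmumu_skim.dat":      {"owner": "jdoe", "scope": "experiment"},
    }

    def may_read(obj, user, user_groups):
        entry = shared_objects[obj]
        if entry["scope"] == "world":
            return True
        if entry["scope"] == "experiment":
            return True                          # any collaborator
        if entry["scope"] == "group":
            return entry["group"] in user_groups # physics-group-wide access
        return user == entry["owner"]            # private by default

    print(may_read("top_mass_cuts.macro", "asmith", {"top-physics"}))  # True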

Shared dynamic access by several clients:

For online use, data structures (such as histograms) used for display purposes should be capable of being dynamically updated by other running processes. It should also be possible to share such a data structure among several jobs with simultaneous read access, so that the plots can be viewed by several different users.
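
One possible mechanism, sketched here with Python's standard shared-memory support purely for illustration (the block name "hist_mass" and the binning are invented), is a writer that fills bins in shared memory while any number of display clients attach and take read-only snapshots:

    # Illustrative sketch only; "hist_mass" and NBINS are invented names.
    import numpy as np
    from multiprocessing import shared_memory

    NBINS = 100

    # Writer (e.g. the online filling process): creates and updates the bins.
    shm = shared_memory.SharedMemory(name="hist_mass", create=True,
                                     size=NBINS * 8)
    bins = np.ndarray((NBINS,), dtype=np.float64, buffer=shm.buf)
    bins[:] = 0.0
    bins[42] += 1.0                    # fill as events arrive

    # Reader (any number of display clients): attach by name and snapshot.
    view = shared_memory.SharedMemory(name="hist_mass")
    snapshot = np.ndarray((NBINS,), dtype=np.float64, buffer=view.buf).copy()
    # ... hand `snapshot` to the plotting code, refresh periodically ...

    view.close()
    shm.close()
    shm.unlink()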

Parallel processing (using distinct data streams):

The analysis system must be capable of processing large numbers of events efficiently. If a single processor cannot provide the required throughput, the system should support simple parallel processing in which different servers analyse separate event streams, with the results being automatically combined before presentation.
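
A minimal sketch of this mode of operation (Python is used only for illustration; the file names are hypothetical and the per-stream analysis is reduced to a single histogram):

    # Sketch only: the point is splitting the event streams across workers
    # and combining the results automatically before presentation.
    import numpy as np
    from multiprocessing import Pool

    BINS = np.linspace(0.0, 200.0, 101)       # common binning for all workers

    def analyse_stream(filename):
        masses = np.loadtxt(filename)         # one value per event (toy format)
        counts, _ = np.histogram(masses, bins=BINS)
        return counts

    if __name__ == "__main__":
        streams = ["events_000.dat", "events_001.dat", "events_002.dat"]
        with Pool(processes=len(streams)) as pool:
            partial = pool.map(analyse_stream, streams)
        combined = np.sum(partial, axis=0)    # automatic combination
        # `combined` is what a single-processor job would have produced.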

Debugging and profiling:

Good, robust and reliable debuggers are required for code development. Such debuggers are traditionally used with compiled languages and are a de facto part of the programming environment: one must have one to do physics analysis. GUI front-ends are probably preferred to command-line-driven ones. As new scripting language(s) are proposed, one must be able to "debug" in that language as well. Since such a scripting, or interpreted, language is interactive, one could argue that the debugger is already there, as variables can be printed and scripts can be executed one line at a time. However, this does not mean that the interactive programming environment brings all the features of a good debugger for a compiled language: for instance, it is unclear how breakpoints, conditional or not, are managed in complicated scripts. Thus, the script language must have a debugger.

Likewise, profiling is particularly relevant when building large software systems. One must be able to locate the code segments that are used most often, so that they can be optimized or migrated to the compiled language. Integration of these tools is a sticky point in a mixed-language environment. Although running nested debuggers (e.g., starting with the C or C++ debugger and firing the scripting debugger underneath) is possible, it could be confusing at times. Thus, seamless integration of the debugger/profiler should be on the wish list.
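
Purely as an illustration of what breakpoints and profiling can look like in an interpreted language (Python's standard pdb and cProfile modules are used here; this is not a statement about which scripting language will be adopted):

    # Illustration only, using Python's standard pdb and cProfile.
    import cProfile
    import pdb

    DEBUG = False                      # flip on to drop into the debugger

    def select(event):
        return event["pt"] > 20.0

    def analyse(events):
        kept = 0
        for i, event in enumerate(events):
            if DEBUG and i == 5:
                pdb.set_trace()        # a conditional breakpoint in the script
            if select(event):
                kept += 1
        return kept

    events = [{"pt": float(pt)} for pt in range(100)]
    # Profiling locates the code segments used most often, which are the
    # candidates for optimization or migration to the compiled language.
    cProfile.run("analyse(events)")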

Modularity:

The analysis system (or framework) must be able to accommodate user-written modules, so that these modules can be called interactively. These modules may be written in the preferred compiled language (C, C++, or FORTRAN) or in the scripting language, and can be executed within the "framework". This capability can be based on dynamic linking, pipes, RPC calls, shared-memory access on UNIX systems, or similar access methods. It is mandatory that data structures created in the user's compiled code be accessible from the "framework" while the interactive scripts are running. It is also desirable that all user-written methods or functions be accessible in an interactive session.
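
One such access method, dynamic linking from an interactive session, is sketched below for illustration; the library name libuser_cuts.so and the C routine transverse_mass are hypothetical stand-ins for user-written compiled code:

    # Sketch of dynamic linking from an interactive session, using Python's
    # ctypes.  "libuser_cuts.so" and "transverse_mass" are hypothetical
    # stand-ins for user code compiled separately, e.g.
    #     cc -shared -fPIC -o libuser_cuts.so user_cuts.c
    import ctypes

    lib = ctypes.CDLL("./libuser_cuts.so")    # load the user's compiled module

    # Declare the C signature: double transverse_mass(double, double, double)
    lib.transverse_mass.argtypes = [ctypes.c_double] * 3
    lib.transverse_mass.restype = ctypes.c_double

    # The compiled routine is now callable from the interactive script,
    # alongside any selection written directly in the scripting language.
    mt = lib.transverse_mass(42.0, 38.5, 2.9)
    print(mt)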

Access to source code:

It is strongly desired that we (the RunII-CD/D0/CDF staff assigned to support the Run II analysis tasks) have access to the source code of the "shareware" or freeware components of the Run II analysis system or framework. Read access to commercial source code is preferred, but probably not very realistic.

Robustness

While it is tempting to define robustness as synonymous with being absolutely bulletproof, a little reflection forces us to conclude that this is not possible and that a more realistic definition is called for. Turning our thinking upside down, it is probably more fruitful to say what robust is not. The one word that seems to embody the opposite of robust is "fragile", in the sense that a fragile system is one that collapses utterly at the first irregularity. In general, there seem to be two gross sorts of irregularities, the first being things for which the user is responsible (pilot error), and the second being missing or faulty system resources which the user had the right to expect were present and functioning but which were, in fact, missing or broken.

To a reasonable extent, sorting exception conditions into these two classes helps to localize them. The first class is connected with the user's interaction with the system and suggests that the user interface needs to pay a lot of attention to validating the user's input before acting on it and potentially doing serious damage. This is painful and tedious, but errors of this sort can and should almost always be identified, reported and perhaps even logged from within the interface. Thus the user interface should also be regarded as a sort of gatekeeper, denying access to the internals of the system unless the action request is properly formed and completely valid within the current context. On the other hand, the second class of exceptions tends to be related to the system's management of its resources. Again, that tends to localize attention to the interaction with things other than the user, such as the file system or the network. Simply hanging or crashing when, for instance, the event data server is unavailable is not acceptable.
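
A hypothetical sketch of this separation (all names below are invented for illustration): the interface acts as the gatekeeper for the first class of irregularity, while resource failures in the second class are caught, reported and survived rather than allowed to crash the session:

    # Hypothetical sketch; PilotError, EventServerError and request_histogram
    # are illustrative names, not part of any proposed interface.
    class PilotError(Exception):
        """Malformed or invalid user request, caught at the interface."""

    class EventServerError(Exception):
        """A system resource the user had a right to expect is missing."""

    KNOWN_VARIABLES = {"pt", "eta", "mass"}

    def fetch_events(variable):
        # Stand-in for access to the event data server (disk or network).
        raise EventServerError("no connection to the event store")

    def build_histogram(events, nbins):
        return [0] * nbins

    def request_histogram(variable, nbins):
        # Gatekeeper: validate the request before touching the internals.
        if variable not in KNOWN_VARIABLES:
            raise PilotError("unknown variable '%s'" % variable)
        if nbins <= 0:
            raise PilotError("number of bins must be positive")
        try:
            events = fetch_events(variable)
        except (OSError, EventServerError) as err:
            # Report and carry on instead of hanging or crashing.
            print("event data server unavailable:", err)
            return None
        return build_histogram(events, nbins)

    print(request_histogram("pt", 50))   # reports the failure, returns None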

Web-based Documentation

This appears to be one of the easy items. Both within the scientific community and the commercial world, web-based documentation is rapidly becoming a standard way of distributing information, occasionally to the exclusion of the more traditional media. One typical example we happen to know is the documentation that accompanies the Rogue Wave Tools.h++ product. There is a set of printed manuals one can buy from the company but they appear to be little more than a hard-copy version of what can be had free from their web site. Moreover, given the production delays inherent in producing and distributing books, it is nearly certain that the documentation available on the web is more current.

The point here is that it is hardly necessary to make an issue of this, since the rest of the world seems to have already decided that it is "A Good Thing." Instead, the only substantive question appears to be one of obtaining or providing the tools necessary to put our own, internally produced documentation on the web as well.

Commonality with Other Labs and Experiments and Use of Standards

With respect to standards, it is important to realize that there are both de facto and de jure standards and that neither Fermilab nor even the DOE is in a position to determine either. Certain decisions and strategies have already been adopted by the Run II management that acknowledge this. The adoption of cvs as the code management system is an example of the use of a de facto standard and the adoption of the KAI CC compiler is in anticipation of a de jure standard, as soon as ISO accepts the language committee's recommendation, designated ISO/IEC FDIS 14882. As their press release points out, the FDIS stands for "Final Draft International Standard." At present, it appears that both of these decisions were wise and have already begun to pay dividends, the first clearly more than the second so far.

In fairness, it should be noted that CERN has reached a very different conclusion after starting with a very similar premise. Based on their own experience, CERN has chosen to work with the vendor compilers wherever possible and accept vendor-to-vendor variation, known to be large in the case of C++ code. This was driven in part by heavy reliance on third-party software (such as the NAG library) developed under the "native" compilers.

At the same time, certain other strategies, based on accepting developments at other DOE labs rather than on industry standards, appear less than stunningly successful. Two examples come to mind. A substantial amount of effort has been invested in HPSS as the event data storage and management system. It now appears there is a good probability that this effort will have been wasted and that another solution needs to be found. Another example is the adoption of SoftRelTools, developed by the BaBar collaboration at SLAC, instead of the more widely known and used imake from the X11 consortium or autoconf from the Free Software Foundation. The wisdom of this choice is not completely clear yet, so little can be learned, but at least one of us (JMM) already believes that decision was a mistake, at least in part.

Based on these few, admittedly incomplete cases, it is possible to offer a tentative generalization. Where there is an industry standard, adopt it even if some other lab has not. Where there is no acceptable industry standard but some sister lab or major experiment has developed a tool that survives critical inspection, adopt it.

Portability

The selected analysis software must be able to run both on desktop systems and on centrally available servers. It is desirable to move the analysis task to the computers hosting the appropriate data sets. While for much analysis work this computer may well be the desktop machine with a local working cache, significant work will be done by taking such analysis code and running it against large, potentially very large, data sets on central computing facilities. Current platforms of interest are SGI IRIX, Linux, Windows NT, Digital Unix, AIX and Solaris. The ideal package would support all of these platforms, but at least one of Linux and Windows NT, and two of SGI IRIX, Digital Unix, IBM AIX and Sun Solaris, must be supported. Demonstrated ability to port the analysis code to new OS versions and platforms is a benefit.

Scalability

The analysis software must be able to scale gracefully from analysis of a handful of input data files (<10 GB) to analyses run over several hundred (if not thousands) of input data files. Any optimizations based on data sets being resident in (real or virtual) memory must be capable of being disabled, and must not severely degrade tool function for data sets exceeding memory capacity. The software must be configurable to support many tens (~100) of simultaneous users on large central server machines. Machine resources (memory, CPU, network bandwidth) required by the analysis processes should be well managed and well suited to the likely configurations of central servers and desktops. It is highly desirable that there be simple facilities for running analysis jobs in parallel and then combining the individual results in a statistically correct manner.
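
As an illustration of statistically correct combination (a sketch only, assuming each parallel job returns per-bin sums of weights and of squared weights over a common binning; the per-bin error of the combined histogram is then the square root of the summed squared weights):

    # Sketch: combine per-job (sum of weights, sum of squared weights) arrays.
    import numpy as np

    def combine(jobs):
        """jobs: list of (sumw, sumw2) array pairs with a common binning."""
        sumw = np.sum([j[0] for j in jobs], axis=0)
        sumw2 = np.sum([j[1] for j in jobs], axis=0)
        return sumw, np.sqrt(sumw2)            # contents and per-bin errors

    job_a = (np.array([10.0, 4.0]), np.array([10.0, 4.0]))   # unweighted fills
    job_b = (np.array([6.0, 2.0]),  np.array([6.0, 2.0]))
    contents, errors = combine([job_a, job_b])
    # contents -> [16. 6.],  errors -> [4. 2.449...]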

Performance

The analysis software must be able to do simple presentation (e.g., 2D histograms of files with an event mask) at disk speed (~3 MB/s input). Plot manipulations and presentation changes of defined histograms must be rapid and introduce no noticeable delays for the user. Performance penalties for user-supplied code (e.g., routines from reconstruction code) must not be more than a factor of 2 over native (unoptimized) compiled code run standalone.

User Friendliness

Learning to use the software to the level of reading in a file of number pairs and plotting the result should not take a competent physicist more than 4 hours. Evaluators should be able to become proficient, to the level of defining an input stream, performing a moderate selection/analysis procedure including user-supplied code, and producing a result suitable for presentation, within 2 weeks. Manuals must be lucid, complete, affordable and available. The software's presentation and interface should be common to all supported platforms, and data and kumac-like recipes must be easily exchangeable between users on all platforms. Support for detailed questions about internal operations of the software on data, numerical methods, API formats and requirements, and output formatting (both data and plots) must be available, preferably directly to the users, but at least to a moderate number (~10) of "experts" from each experiment. The software must be configurable to remember users' preferences and customizations and to allow multiple levels of customization (e.g., user, working group, collaboration, Lab) for local definitions (e.g., printers) and enhancements.
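
For concreteness, the entry-level benchmark mentioned above might look like the following in a scripting environment (Python and matplotlib are used purely as an illustration; "pairs.dat" is a hypothetical two-column input file):

    # Illustration of the entry-level benchmark: read number pairs and plot.
    import numpy as np
    import matplotlib.pyplot as plt

    x, y = np.loadtxt("pairs.dat", unpack=True)   # read the number pairs
    plt.scatter(x, y)
    plt.xlabel("x")
    plt.ylabel("y")
    plt.savefig("pairs.png")                      # or plt.show() interactively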