The ROOT of the Matter
CDF and DZero detector experiments will try out new data analysis software in Run II
by Mike Perricone
Bit by bit by bit, through hundreds of thousands of electronic channels, high-energy physicists watch the information generated by their experiments growing enormously.
When it comes to making sense of their information, to sifting through it and finding what they're after, there's only one useful strategy:
Less is more.
The information from an experiment must pass through narrower and narrower electronic funnels, until those final and useful bits of information serve to address the critical questions: Did we find what we expected? Did we find something unexpected? What do we do next?
For Collider Run II of the Tevatron, slated to begin sometime in the first quarter of the year 2000, the 5,000 tons of tracking equipment housed in each of the two detectors, the Collider Detector at Fermilab and DZero, will probably handle 250 kilobytes of data per collision event and 10 megabytes of data per second, storing up to 200 terabytes of raw data and producing up to 80 terabytes of data for a year's worth of final physics analysis.
That total is at least 10 times the data generated in Run I of the Tevatron. A less conservative (and probably more accurate) estimate could grow to at least 20 times the Run I data.
How much information is that?
If your laptop computer's hard drive has a capacity of 1 gigabyte, you would need more than 80,000 laptops to handle 80 terabytes of data. That would mean every person in Fermilab's neighboring cities of Batavia and Wheaton working on a 1-gigabyte laptop of his or her own, just to record a year's worth of data for final analysis.
"It's simply a volume increase: there's more and more data to handle," said Steve Wolbers, deputy head of Fermilab's Computing Division and a member of a two-year collaboration between Computing and the two experiments to reach a consensus on questions involving data acquisition and analysis.
"How people actually analyze their data, how they carry out their physics analysis, is another component of the issue," Wolbers continued. "There are different approaches, and the approach we're taking at Fermilab is, 'How do we best get our work done here?' That's the kind of discussion we've been having."
Against the background of an ongoing quest for that best way to work, a collaboration drawn from CDF and DZero, the Computing Division and the Run II Joint Computing Project Group has settled on a newly developed system called ROOT to handle data at the detectors for the next two years: a limited adoption with options open, to paraphrase Wolbers.
"This is a tool being adapted for use by CDF and DZero for Run II for the short term," said Ruth Pordes, associate head of the Computing Division and a task coordinator of the CDF/DZero/CD Joint Offline Project.
"The Laboratory is not putting its weight behind [ROOT]," Pordes continued, "but the experiments are using it because it is pragmatic and offers what they want. In the short term over the next two years, in the critical commissioning and initial data taking stages, they will need very functional and quickly adaptable tools."
Pordes recently coordinated the organization of a ROOT workshop at Fermilab, with more than 60 attendees from an array of experiments at other laboratories. Also on hand was the team developing and implementing the system: René Brun of CERN, also the author of the widely-used data tool PAW (Physics Analysis Workstation); Fons Rademakers of CERN; and Masa Goto of Hewlett-Packard/Japan, who developed an important translation program for computer languages that was incorporated into ROOT. Fermilab participants included Philippe Canal, who does much of the local support for the software; Scott Snyder of DZero, who has made widely-accepted extensions of Masa Goto's software; and Rob Kennedy of CDF, who is incorporating core pieces of ROOT functionality into the local data handling software.
Pordes said the direct involvement of the development team was important to the transition from current systems, based on the long-established FORTRAN computer code, to the more recent C++ code, which evolved from C. The C-based codes are gaining commercial support, while commercial support for FORTRAN appears to be eroding.
"ROOT looks like an interesting product," said Joel Butler, former head of Fermilab's Computing Division, and a member of the joint group assessing the experiments' data needs. "There are other issues, like how flexible and maintainable it is. It will be interesting to see what happens."
In an object-oriented data system, such as ROOT, a segment of data (an "object") is encapsulated with the routines or methods that operate on that data; in other words, the information knows how to process itself when a user retrieves it. Canal explained that an object-oriented system allows access to higher levels of organization in retrieving data.
"One of the advantages is that we encapsulate more and more of the information," Canal said, "so that each user needs to know less about the details of how a chunk of data is implemented, and can focus more on the higher level concepts of using the data. The concept is to merge both data and functionality."
Butler emphasized that making data simple to store and retrieve is increasingly important to experimenters, because users and programs are increasingly approaching the data in different ways from different locations. The object-oriented approach offers a good match because of the merging of data and functions, bypassing the need for strict adherence to a particular method of access.
"More and more programs function collaboratively with each other," Butler explained, "but they also function asynchronously. If a user types something, he or she wants something to happen. The object-oriented model allows components to interact more freely, because it does not require a strict ordering of the interactions by the machine."
Canal stressed that to an experimenter, "data is the only thing you have of value, and you want to be able to put your hands on it." That usually means developing new tools.
Experimenters can't use programs off the shelf from the local computer store, because of the sheer size of the data involved in their experiments (remember those 80,000 laptops). That much data can't simply be stored in a computer's memory; much of it has to go on remote disks, and Pordes pointed out that commercial tools generally work only with data stored in memory.
"We need software to support some of the data being on disk and some being in memory," Pordes said. "Tools like PAW and ROOT support that, but commercial tools typically don't."
It is not unusual for physics experiments to adopt their own software, separate from the systems used generally throughout a Laboratory. Though its predecessor, PAW, is widely used in physics, ROOT is very much an individual product, started in the context of the NA49 experiment at CERN in 1995.
Commercial tools are not currently favored as solutions, but they are not being ruled out of the search at Fermilab, CERN and other labs, Wolbers said. He indicated that the less special attention an application needs, the more generally useful it would become.
"Software is something experiments can support on their own," he explained, "but sooner or later a software choice could involve a significant support load from the Laboratory. Were also looking for commercial products that might be modified or used as-is. We want to leave our options open in case other applications prove to be superior. We also want to continue working with CERN and other labs so we dont come up with incompatible techniques. Hopefully, well be able to do something that applies to all of high-energy physics."
The search continues, rooted in the desire to link the greatest possible use with the lowest possible maintenance. For that final analysis, providing the best tools will require computer art as much as computer science.
last modified 4/30/1999