DZero Breaks New Ground
in Global Computing Efforts
by Kurt Riesselmann
Searching for subatomic particles very much resembles the proverbial search for a needle in a haystack. Since the beginning of Collider Run II in March 2001, DZero scientists have collected more than 550 million particle collisions. The data would fill five stacks of CDs, each as tall as the Eiffel Tower—storage cases not included. And the (hay)stacks are growing every day.
"The Fermilab farms can process four million events per day," said Mike Diesburg, who manages a cluster of 600 PCs for the DZero experiment at Fermilab. "That's enough to handle the daily flow of incoming events."
Yet when the DZero collaboration decided to re-examine the entire set of collision data, encompassing more than 500 terabytes, scientists had to look for computing power beyond Fermilab. For the first time ever, DZero scientists had to send actual collision data—the crown jewels of their experiment—off site.
"In the past, DZero and other particle physics collaborations have used remote computing sites to carry out Monte Carlo simulations of their experiments," said DZero scientist Daniel Wicke, University of Wuppertal, Germany. "We are now one of the first experiments to process real collision data at remote sites. The effort has opened up many new computing resources for our collaboration. The evaluation of our experience will provide valuable input to the worldwide development of computer grids."
The reprocessing of the DZero collision data, coordinated by Diesburg and Wicke, so far involves computing resources in six countries: Canada, France, Germany, the Netherlands, the United Kingdom and the United States. (Many other countries contribute to the computing of simulated DZero data and the analysis of processed data.) From November to January, DZero groups in each of the six countries had access to local PC clusters and Grid networks, ranging from one hundred to more than one thousand PCs.
"In the UK, the software installation, submission and monitoring of jobs was done centrally for all participating UK sites in a grid-like manner," said Gavin Davies at Imperial College London. "The machines at Imperial College, for example, are shared across the whole College, so it takes grid software to keep it all running smoothly."
The largest amount of off-site computing took place at the Centre de Calcul in Lyon, France, which reprocessed 36 million collisions.
"Reprocessing involves large volumes of data to be transferred in both directions on a scale that was simply unthinkable a few years ago," said Patrice Lebrun, IPN Lyon. "It will open new possibilities that we are only beginning to explore."
To provide participating computer systems with collision data, the DZero collaboration relied on the SAM software developed at Fermilab. The Sequential Access Manager is essentially a catalog of all the DZero data, and it transfers data on demand. Wyatt Merritt, who is a co-leader of the SAMGrid project at Fermilab, explained the process.
"If a DZero scientist submits a job to the computer system in Karlsruhe, Germany, it may need a particular set of data files," she said. "If those files are not in the local system, the SAM software will automatically determine where they are and retrieve them. With the SAM software, a user doesn't need to know whether the data is stored on tape or on disk, whether it is located at Fermilab or at Karlsruhe."
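The delivery logic Merritt describes—check the local cache first, consult the catalog, and fetch from whichever site holds a copy—can be sketched in a few lines. This is an illustrative toy, not SAM's actual API; all class and function names here are hypothetical.

```python
# Hypothetical sketch of SAM-style on-demand file delivery.
# These names are illustrative inventions, not SAM's real interface.

class DataCatalog:
    """Maps each file name to the list of sites holding a copy."""
    def __init__(self, locations):
        # e.g. {"run5.raw": ["fermilab-tape", "karlsruhe-disk"]}
        self.locations = locations

    def find(self, filename):
        return self.locations.get(filename, [])


def deliver(catalog, local_cache, filename, local_site):
    """Return a local path for filename, fetching a remote copy if needed.

    The user never specifies whether the file sits on tape or disk,
    at Fermilab or at Karlsruhe -- the catalog lookup handles that.
    """
    if filename in local_cache:
        return local_cache[filename]       # already available locally
    sites = catalog.find(filename)
    if not sites:
        raise FileNotFoundError(filename)  # not known to the catalog
    # Pick any site holding a copy and simulate the transfer by
    # recording a path in the local cache.
    local_cache[filename] = f"/cache/{local_site}/{filename}"
    return local_cache[filename]
```

A job at Karlsruhe asking for a file stored only at Fermilab would transparently trigger a transfer on the first request and hit the local cache on every request after that.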
Although the DZero collaboration has automated the global tracking and transfer of data, the reprocessing of data does not yet represent a full, global Grid. So far, DZero scientists manually assign computing jobs to specific clusters and local grids. However, scientists at the NIKHEF laboratory in Amsterdam have made significant progress toward that goal.
"We have been able to show that we can really use the LHC [Large Hadron Collider] Computing Grid for DZero processing," said Kors Bos, who leads the Dutch computing efforts. "We saw jobs submitted from Wuppertal being executed on our CPUs, and we executed jobs in Karlsruhe, at Rutherford Appleton Laboratory and a few more places."
Wuppertal's Wicke praised these efforts.
"The group at NIKHEF has pushed the Grid concept the most," he said. "They have devoted themselves to running DZero computing jobs on generic computers that have no prior knowledge of DZero programs and databases. Once their efforts pay off, we will be able to run our DZero jobs on any computer cluster in the world."
The DZero collaboration conducted the reprocessing of all Run II data to improve, among other things, the identification of particle tracks. Raw data contain track information in the form of a vast collection of disconnected points. To connect the right dots, scientists use sophisticated track reconstruction programs. Until recently these programs relied on the theoretical design of the DZero detector rather than its real-world performance.
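The "connect the right dots" idea can be illustrated with a toy track finder that greedily links each hit to the nearest hit in the next detector layer. This sketch is purely illustrative; DZero's real reconstruction is far more sophisticated and, as described below, incorporates the measured alignment of the detector.

```python
import math

# Toy "connect the dots" track finder (illustrative only; not DZero's
# actual algorithm). Each detector layer contributes a list of (x, y)
# hit positions, and we greedily extend the track layer by layer.

def link_hits(layers):
    """Build one track by picking, in each successive layer,
    the hit closest to the previous hit on the track."""
    if not layers or not layers[0]:
        return []
    track = [layers[0][0]]  # seed the track from the innermost layer
    for hits in layers[1:]:
        prev = track[-1]
        nearest = min(hits, key=lambda h: math.dist(h, prev))
        track.append(nearest)
    return track
```

Real track finders must contend with thousands of hits, multiple overlapping tracks, and detector noise, which is why reconstruction is among the most computing-intensive steps and a prime motivation for the reprocessing effort.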
"The new algorithm is based on our knowledge of how well we put the detector together," said Dugan O'Neil, one of the DZero scientists working with the WestGrid in Vancouver, Canada. "This has dramatically improved our efficiency of finding particle tracks."
The collaboration has also adopted the new algorithm to process all new experimental data. Yet the collaboration expects to carry out another reprocessing of all Run II data, old and new, in less than a year, applying further refined analysis tools to the raw data. The new round of reprocessing will require even more off-site computing power, providing ample opportunity to further develop the Grid system.
"You can't make the Grid work without motivation," said O'Neil. "It's one thing to have a vision, and it is another thing to stay up until three in the morning to make things work because they need to get done. DZero is a real application. We need to get the physics results out."
On the Web:
Reprocessing of DZero Run II data:
Last modified: 2/6/2004