Fermi National Accelerator Laboratory

Volume 25  |  Friday, September 20, 2002  |  Number 15

Computing by the Truckload
Lab’s largest-ever single purchase of computers boosts physics analysis and challenges infrastructure

by Mike Perricone

Just for a moment, put aside the technical questions of how the recent $800,000 purchase of more than 400 PCs will be used to advance scientific research at Fermilab’s DZero experiment during the quest for new physics at Collider Run II of the Tevatron.

Focus on the logistical issues involved in this largest single purchase of computers (by number of units) in lab history:

What do you do when 400 computers show up on a truck? How and where do you make room for them, even without monitors and keyboards? What do you do with more than 400 boxes, and with all the styrofoam and the little plastic bags inside? Where do you keep all these packing materials in case you have to send stuff back during the 30-day trial period?

And after you have them unpacked, where do you plug in all these computers? Do you even have 400 electrical outlets? Do you start them all up at once? Will that trip all your circuit breakers? And those 400 little fans blowing out heat from the backs of the CPU cases—will they turn your air-conditioned data center into a sauna?

When some of the 400 inevitably fuss, who fixes them? Who backs up the files?

“Building a data center today is a formidable challenge,” said Gerry Bellendir of Fermilab’s Computing Division, who has helped coordinate the process of getting the computers into the lab and getting them installed. The final count was 434 computers: 400 at the Feynman Computing Center and another 34 for the online system at DZero.

“People thought data centers would go away when desktops were introduced,” said Bellendir, whose lab service dates back to 1969. “But people want to use computers like a radio or telephone. They don’t want to install new systems, update software, back up files. And the enormous amounts of data from the experiments must be maintained in an air-conditioned environment, with fire protection systems.”


First came the effort invested in ordering, receiving, checking and moving the shipment to the Feynman Center.

Bellendir emphasized the pivotal contributions of Fermilab’s shipping, receiving, warehousing and property departments. All the receiving, unpacking, checking, tagging and repacking was done at Site 38, the lab’s shipping and receiving center. In addition, all the empty boxes (and styrofoam, and cardboard inserts and little plastic bags) were stored for the 30-day trial, or “burn-in,” period. Combustible materials are not permitted in the computing rooms at Feynman.

The computers—dual-CPU 1.67 GHz Athlon machines from Atipa Technologies—will process the data and prepare experimental results for analysis by DZero collaborators. The collection of machines will be used to run many parallel jobs. The 400 dual-CPU units at the Feynman Center deliver close to the effect of 800 computers in 400 housings. At Feynman, they are being stacked in 25 racks, each holding 16 units in six square feet of floor space, with 240 on the second floor and 160 on the first floor. More comparatively large-scale purchases are coming: 240 for the CDF collaboration, and 72 for the Tier I computing center of U.S./CMS, located at Fermilab. The Compact Muon Solenoid (CMS) detector will operate with the Large Hadron Collider at CERN in Geneva, Switzerland.
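The capacity arithmetic above can be checked in a few lines; this is an illustrative sketch that assumes one job per CPU, which is my reading of “many parallel jobs” on dual-CPU boxes:

```python
# Back-of-envelope capacity check for the Feynman installation.
# (Illustrative only; assumes one reconstruction job per CPU.)
UNITS_PER_RACK = 16
RACKS = 25
CPUS_PER_UNIT = 2  # dual-CPU Athlon boxes

units = UNITS_PER_RACK * RACKS      # 400 housings in 25 racks
effective = units * CPUS_PER_UNIT   # roughly 800 single-CPU equivalents

print(units, effective)
```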

Cables and cooling ducts were installed before the computers arrived. Bellendir said that while miniaturization puts more computing power into a smaller footprint, there is a cost. These installations, for example, will boost power and cooling requirements by 50 percent at the Computing Center.

“It’s one of the main problems facing data centers today,” Bellendir said.

The increased need for computing power reflects the geometric expansion of data in high-energy physics experiments, with DZero entering the physics analysis phase full force for Run II—the impetus for this purchase. Wyatt Merritt, head of DZero Computing and Analysis in the Computing Division, has seen computing become an increasing share of experiment hardware, with continual additions and upgrades beginning in the earliest stages of commissioning and testing the detector and its components.

“Then, when we reach the moment of truth where commissioning is complete and everything is running at or above design rates and sizes,” Merritt said, “we put in place the last bit of equipment needed—and then immediately start to replace the first bits we bought, because at least some parts of the computing plant have a useful lifetime of less than five years.”

Now, while at their peak potential, the majority of the computers (the 240 on the second floor of Feynman) will be used largely to reconstruct the data from particle collisions at the same pace they are witnessed by experimenters. The raw data consists of independent events that are collected and written to large files, with each file then sent to a PC for reconstruction. These reconstructions are used to identify candidates for electrons, photons, jets and muons; to determine their location within the detector; and to measure their energy and/or momentum. They are also used to find “missing energy,” which indicates the presence of neutrinos in the detector. The 160 computers on the first floor are for user analysis: applying the output of the reconstruction for experimenters (users) to examine and select the data samples they need for their areas of physics analysis. The 34 units at DZero will provide additional computing power for online event selection as the Tevatron luminosity increases during the run.
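Because each raw-data file is an independent unit of work, the farm scheme described above amounts to a simple work queue: hand each file to the next available PC. A minimal sketch of the idea (the file names and worker names here are hypothetical stand-ins, not DZero’s actual farm software):

```python
# Toy model of the "one file per PC" reconstruction farm: raw-data
# files are dealt out round-robin to worker machines, since every
# file can be reconstructed independently of the others.
from collections import defaultdict
from itertools import cycle

def dispatch(files, workers):
    """Assign each raw-data file to a worker, round-robin."""
    assignment = defaultdict(list)
    for f, w in zip(files, cycle(workers)):
        assignment[w].append(f)
    return assignment

raw_files = [f"raw_events_{i:04d}.dat" for i in range(10)]  # hypothetical names
workers = ["pc01", "pc02", "pc03"]                          # stand-ins for the farm
plan = dispatch(raw_files, workers)
```

In the real farm the payoff is that throughput scales almost linearly with the number of PCs, which is why the purchase was sized to reconstruct events at the pace they are recorded.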

From buying to burn-in

The first step in obtaining 434 PCs is ordering them—after deciding what’s needed, that is.

Lisa Giacchetti of Computing Division’s Operating System Support Department/Scientific Computing Support coordinated the process of compiling a Request for Bid document specifying requirements of the systems (type and speed, memory, quantity and capacity of disk, etc.), rack requirements, network and serial connection wiring requirements and more. The request went out to PC vendors who have passed Fermilab qualifications. Giacchetti also coordinated components needed for installing and running the machines: power, networking, floor space, receiving, tagging, safety. Atipa Technologies won the bid.

Once the order was shipped, receiving the large number of PCs meant enlisting the help of receiving, warehouse and property staffers. The entire receiving operation was conducted not at Feynman Computing Center but at Site 38. Each PC was removed from its box, inspected for damage, tagged, loaded 16 to a skid, shrink-wrapped, and held at Site 38 until the Computing Center had room to bring them over. Computing’s Equipment Logistics Services group also spent a great deal of time helping at Site 38.

The PCs had an acceptance period of 30 days, and all the boxes and packing materials were held at Site 38. Not only was there no room at Feynman, but the computing center will no longer allow boxes or skids, or other combustibles, into the computer room. Once the PCs—not household types, but still valuable commodities—were moved to Feynman, they were secured overnight in locked cages. Atipa representatives installed the units in the racks. Computing administrators booted up the systems, one unit at a time, and began the 30-day “burn-in” with a suite of software tools designed to stress the various hardware components (CPU, memory, disk, network). The computers must meet specifications or units can be returned.
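A burn-in suite of the kind described cycles sustained load across each subsystem and checks that the hardware returns correct results. The following is a minimal, hypothetical sketch of that idea, not the actual tools used at Feynman:

```python
# Hypothetical burn-in sketch: exercise CPU, memory, and disk and
# verify each subsystem returns correct data under load.
import hashlib
import os
import tempfile

def stress_cpu(rounds=10_000):
    """Busy the CPU with repeated hashing; returns the final digest."""
    data = b"burn-in"
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data

def stress_memory(mb=8):
    """Fill a block of memory with a pattern and verify it survives."""
    block = bytearray(b"\xAA" * (mb * 1024 * 1024))
    return all(b == 0xAA for b in memoryview(block)[:: 1024 * 1024])

def stress_disk(mb=1):
    """Write a temp file, read it back, and check the bytes match."""
    payload = os.urandom(mb * 1024 * 1024)
    with tempfile.NamedTemporaryFile() as f:
        f.write(payload)
        f.flush()
        f.seek(0)
        return f.read() == payload

ok = bool(stress_cpu()) and stress_memory() and stress_disk()
```

A production suite would run loops like these continuously for the full 30 days, flagging any unit whose checks fail so it can be returned under the acceptance terms.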

Photo captions: Harold Scheppman of Support Services gets things moving at Site 38. Alex Hernandez of Computing Division’s Equipment Support unpacks PCs and inspects them for damage. Dennis McAuliff of Support Services identifies and tags individual PCs. Keith Coiley of Computing Division lends a sense of scale to stacks of repacked and shrink-wrapped PCs.

“We delayed buying machines for as long as possible to get the maximum computing power within our budget,” said Amber Boehnlein, co-leader of DZero Software and Computing. “Experimenters need to look at the data quickly, identify and fix any problems in the detectors or the software as quickly as possible, and make sure that our physics goals are met by finishing analyses in a timely manner. The amount of data we write to tape is increasing as the luminosity improves.”

To keep the data flowing, the power must keep flowing. The Computing Center recently added a new array of Uninterruptible Power Supplies, and installed a generator capable of supplying the entire building and all its needs in the event of a power outage at the laboratory. The infrastructure improvements were made under the lab’s Utilities Incentives Plan, a federal program that allows investments of funds and expertise by utility companies, with savings from the improvements used to pay back the utilities’ initial investments.

How to keep the data flowing in the future, throughout Run II, is a question under study. The Computing Division believes it has the resources to ensure smooth running through FY’05, but has commissioned a study to examine alternatives. Beyond that time, Fermilab’s Associate Director for Operations Support, Jed Brown, has commissioned a working group drawn from the Computing Division and the lab’s Facilities Engineering Services Section to formulate a 10-year plan for computing infrastructure.

“We also participate in a consortium called the Uptime Institute, which is dealing with these types of issues,” Bellendir said. “We’re trying to stay in tune with the industry and where it’s going. But we don’t know what the technology will be two or three years from now. How do we estimate what’s 10 years away? That’s the question we’re all facing.”


On the Web:
The Uptime Institute
www.upsite.com/TUIpages/tuihome.html
Fermilab Computing Division
www.fnal.gov/cd/


last modified 9/17/2002