Deconstruction: Big data
Big science takes both big data and big cooperation. For the Large Hadron Collider at CERN, storing, analyzing and accessing 25 petabytes of data each year requires a worldwide effort that spans more than 100 institutions in 36 countries. Here's how it works.
The Large Hadron Collider, the world's largest particle accelerator, produces a million gigabytes of data every single second. It's an incredible amount of information—too much for any single institution or computing center to handle.
Fortunately, out of the billions of collisions produced, only a fraction of the data is scientifically interesting enough to keep. Imagine searching for a needle in a haystack. Now imagine searching for a couple of needles in a football field full of haystacks. It takes a while to find the needles, but once you do, there's no need to keep the hay.
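To make that winnowing concrete, here is a minimal sketch in Python. The event format and the is_interesting criterion are hypothetical stand-ins; the real LHC trigger systems apply several layers of hardware and software selection before anything is written to disk.

```python
import random

def is_interesting(event):
    # Hypothetical criterion: keep only events with unusually high energy.
    return event["energy_gev"] > 990

def simulated_collisions(n):
    # Stand-in for the detector's event stream; real events are far richer.
    for _ in range(n):
        yield {"energy_gev": random.uniform(0, 1000)}

# Keep the "needles" and throw away the "hay".
kept = [event for event in simulated_collisions(1_000_000) if is_interesting(event)]
print(f"Kept {len(kept)} of 1,000,000 simulated events")
```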
In all, CERN keeps about 25 petabytes of data per year (roughly 26 million gigabytes) for physicists to analyze. Even that pared-down sample contains about a thousand times more information than all of the text in the US Library of Congress.
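The conversion behind that figure is easy to check, assuming binary prefixes (1 petabyte = 1,048,576 gigabytes):

```python
petabytes_per_year = 25
gigabytes_per_year = petabytes_per_year * 1024 ** 2  # binary prefixes
print(f"{gigabytes_per_year:,} GB per year")  # 26,214,400 GB, i.e. roughly 26 million
```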
It's not realistic for one facility to house and analyze that much information, so CERN shares the load, distributing data storage and processing across more than 150 computing centers around the world via the Worldwide LHC Computing Grid.
Once each experiment at the LHC decides which collisions are interesting enough to keep, it stores one complete copy of those raw data at CERN, while also dividing the same data among 11 "Tier 1" centers in Asia, Europe and North America. At CERN ("Tier 0") and at these Tier 1 centers, collision events are reconstructed from the raw data. The reconstructed events are then stored at both the Tier 0 and Tier 1 centers.
The United States is home to two Tier 1 computing centers, with ATLAS experiment data making its way to Brookhaven National Laboratory in Upton, New York, and CMS experiment data to Fermi National Accelerator Laboratory in Batavia, Illinois.
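As a rough illustration of the pattern described above, one complete raw copy archived at Tier 0, the same data divided among Tier 1 sites, and reconstruction happening at both levels, here is a short Python sketch. The class names, the round-robin split and the placeholder site names are simplifications for illustration, not the actual WLCG software.

```python
from dataclasses import dataclass, field

@dataclass
class Tier1Site:
    name: str
    raw_chunks: list = field(default_factory=list)

    def reconstruct(self):
        # Tier 1 centers, like Tier 0, turn raw data into reconstructed events.
        return [f"reconstructed({chunk})" for chunk in self.raw_chunks]

@dataclass
class Tier0:
    # CERN keeps one complete copy of each experiment's raw data.
    raw_archive: list = field(default_factory=list)

def distribute(raw_events, tier0, tier1_sites):
    # Archive everything at Tier 0 and divide the same data among Tier 1 sites.
    tier0.raw_archive.extend(raw_events)
    for i, event in enumerate(raw_events):
        tier1_sites[i % len(tier1_sites)].raw_chunks.append(event)

# Placeholder Tier 1 names standing in for the real centers in Asia,
# Europe and North America.
cern = Tier0()
sites = [Tier1Site("tier1_asia"), Tier1Site("tier1_europe"), Tier1Site("tier1_north_america")]
distribute([f"raw_event_{i}" for i in range(9)], cern, sites)
print(len(cern.raw_archive), [len(site.reconstruct()) for site in sites])
```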