Preserving the data harvest
Canning, pickling, drying, freezing: physicists wish there were an easy way to preserve their hard-won data so future generations of scientists, armed with more powerful tools, can take advantage of it. They've launched an international search for solutions.
When the BaBar experiment at SLAC National Accelerator Laboratory shut down in April 2008, it brought an end to almost nine years of taking data on the decays of subatomic particles called B mesons. But that was hardly the end of the story for the 500 scientists working on the experiment. In November they celebrated the publication of their 400th paper, and they expect the next few years will yield at least 100 more.
These BaBar results and discoveries stem from more than two million gigabytes of data. As impressive as this number is, it's only a fraction of the data that will come out of the next generation of high-energy physics experiments. For instance, the ATLAS detector at CERN's Large Hadron Collider will produce a whopping 320 megabytes of data every second, surpassing BaBar's total output within three months.
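To get a feel for that rate, here is a minimal back-of-the-envelope sketch. It assumes decimal units (1 gigabyte = 1,000 megabytes), a "three months" of 90 days, and a detector that writes 320 megabytes every second without pause; the function name is ours, chosen for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def data_volume_gb(rate_mb_per_s: float, days: float) -> float:
    """Total data in gigabytes (1 GB = 1000 MB) written at a constant rate."""
    return rate_mb_per_s * days * SECONDS_PER_DAY / 1000


# Assumption: ATLAS records continuously at 320 MB/s for 90 days.
atlas_90_days_gb = data_volume_gb(320, 90)
print(f"{atlas_90_days_gb:,.0f} GB")  # prints "2,488,320 GB"
```

Under those assumptions, 90 days of running comes to roughly 2.5 million gigabytes. Uninterrupted running is an idealization, so this is best read as an upper bound for the period.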
BaBar's treasure trove of data, which may contain answers to questions we don't even know how to ask yet, raises an increasingly important question in high-energy physics: When the party's over, what do you do with the data?
In the past, this was not so much of a concern. New experiments came along in a steady drumbeat, each superseding the last in terms of what could be done with the data it produced. Today, as experiments get bigger, more complex, and much more expensive, the drumbeat has slowed considerably, and physicists are starting to realize the value of wringing as much insight out of every experiment as they possibly can.
But without a conscious effort to preserve them, data slowly become the hieroglyphs of the future. Data preservation takes a lot of work, and with that, a lot of resources. Researchers have to think not only about where to store the data, but also about how to preserve them so that they can still be used as technology and software change and as experts familiar with the data move on or retire.