Big science projects can afford big cyberinfrastructure. For example, the Large Hadron Collider at CERN in Geneva generates 15 petabytes of data a year, but also boasts a sophisticated data management infrastructure for the movement, sharing and analysis of that gargantuan data flow. But big data is no longer an exclusive problem for these massive collaborations in particle physics, astronomy and climate modeling. Individual researchers, faced with new laboratory equipment and methods that can generate their own torrents of data, increasingly need their own data management tools, but lack the hefty budget large projects can dedicate to such tasks. What can the 99% of researchers doing big science in small labs do with their data?
That was how Computation Institute director Ian Foster framed the mission at hand for the Research Data Management Implementations Workshop, happening today and tomorrow in Arlington, VA. The workshop was designed to help researchers, collaborations and campuses deal with the growing need for high-performance data transfer, storage, curation and analysis -- while avoiding wasteful redundancy.
"The lack of a broader solution or methodology has led basically to a culture of one-off implementation solutions, where each institution is trying to solve their problem their way, where we don't even talk to each other, where we are basically reinventing the wheel every day," said H. Birali Runesha, director of the University of Chicago Research Computing Center, in his opening remarks.