The Heroic Task of Making Data Routine

April 23, 2013

People who work in laboratories take a lot of things for granted. When they come into work in the morning, they expect the equipment to have power, the sink to produce hot and cold water, and the internet and e-mail to be functional. Because these routine services are taken care of "behind the scenes" by facilities and IT staff, scientists can get started right away on their research.

But increasingly, scientists are hitting a new speed bump in their day-to-day activities: the storage, movement and analysis of data. As datasets grow far beyond what can easily be handled on a single desktop computer and long-distance collaborations become more common, frustrated researchers find themselves spending ever more time and money on data management. To get the march of science back up to top speed, new services must be provided that make handling data as simple as switching on the lights.

That mission was the common thread through the second day of the GlobusWorld conference, an annual meeting for the makers and users of the Globus data management service, held this year at Argonne National Laboratory. As Globus software has evolved from enabling the grids that connect computing centers around the world to a cloud-based service for moving and sharing data, the focus has shifted from large "Big Science" collaborations to individual researchers. Easing the headache for those smaller laboratories with little to no IT budget can make a big impact on the pace of their science, said Ian Foster, Computation Institute Director and Globus co-founder, in his keynote address.

"We are sometimes described as plumbers," Foster said. "We are trying to build software and services that automate activities that get in the way of discovery and innovation in research labs, that no one wants to be an expert in, that people find time-consuming and painful to do themselves, and that can be done more effectively when automated. By providing the right services, we believe we can accelerate discovery and reduce costs, which are often two sides of the same coin."

The inspiration for Globus Online, Foster said, was drawn not from science but from entertainment and business. Flickr and Netflix have changed the way people manage their photos and watch movies online, while Gmail and Dropbox are popular tools for accessing e-mail and files from multiple computers. All of these applications are "software-as-a-service": they require no installation, are invisibly hosted in the cloud and are accessible through the browser of any internet-connected computer. Globus Online was born in 2010 when its founders asked whether research data workflows could be managed with the simple user interface and cloud infrastructure these services employ. As co-founder Carl Kesselman of USC put it in his talk, why can't we manage our data over the internet as easily as we organize our pictures of cats?

The first step for Globus Online was to provide reliable, secure and high-performance file transfer -- a sort of "dropbox for science," as Foster put it. But instead of moving data into cloud storage that can then be accessed from multiple computers, Globus Online allows for direct transfer from endpoint to endpoint. This feature can help researchers move terabytes of data when they switch jobs, share large datasets among multiple sites of a multi-institutional collaboration or back up data to dedicated storage servers.

The case studies presented at GlobusWorld demonstrated the usefulness of these seemingly simple tasks. Lee Taylor from the University of Exeter described how the service is used by his institution's new 1-petabyte repository, where researchers across disciplines are encouraged to store and share their data. Matthias Hofmann from TU Dortmund described how Globus tools were placed into the workflow of two major projects on cancer biomarkers and neuron behavior, two scientific areas that must manage growing torrents of data. David Skinner of the National Energy Research Scientific Computing Center talked about how these file transfer capabilities are helping astronomers, botanists and the Advanced Light Source and Advanced Photon Source facilities grapple with increasing data demands.

GlobusWorld 2013 was also a coming-out party for the newest member of the Globus family: Globus Genomics. The service had been announced the week before at Bio-IT World, and Ravi Madduri and Dina Sulakhe gave the GlobusWorld crowd a full introduction. Globus Genomics combines Globus Online's file transfer with the genetic analysis techniques of Galaxy and the flexible computation of Amazon Web Services. Already, the service is used by laboratories looking for genes associated with cancer risk and brain disorders, which were previously slowed down by having to ship disk drives full of genomic data to and from sequencing and analysis centers.

Foster cited Globus Genomics as an example of Globus Online's broader vision: to provide a full-service platform that researchers can use not only to move data, but also to organize, analyze and share it easily. In a scientific future where the flow of data will be almost as important as the flow of water, automating these mundane tasks as much as possible will leave more time for actually gathering that data and asking the questions that will drive discovery.

"That enables new science," Skinner said. "That's how you get to ask new questions: by obsoleting the old ones, and by taking an activity that used to be heroic and making it commonplace...that is the guts of science."
