Rob Gardner, UChicago
May 09, 2016
ERC 401

Title: Leadership Cyberinfrastructure for Science and the Humanities
Speaker: Rob Gardner, U.Chicago


China's Milky Way 2 (Tianhe-2) supercomputer was recently declared the fastest supercomputer in the world by industry scorekeeper Top500, the latest move in the increasingly international race for high-performance computing supremacy. Late last month, CI Senior Fellow Rick Stevens appeared on Science Friday, alongside Top500 editor Horst Simon, to talk about why that competition matters, and what the global push for faster computation will do for medicine, engineering and other sciences.

"These top supercomputers are like time machines," Stevens said. "They give us access to a capability that won't be broadly available for five to ten years. So whoever has the time machine is able to do experiments, able to see into the future deeper and more clearly than those that don't have such machines."


We were thrilled to spend Friday morning with the folks at TEDxCERN via webcast, enjoying fascinating talks by CI director Ian Foster and several other amazing scientists and educators. Foster's talk focused on "The Discovery Cloud," the idea that many complex and time-consuming research tasks can be moved to cloud-based tools, freeing up scientists to accelerate the pace of discovery. We'll post the video when it's up, but for now, enjoy this great animation produced for the conference by TED-Ed explaining grid computing, cloud computing and big data.



People who work in laboratories take a lot of things for granted. When they come into work in the morning, they expect the equipment to have power, the sink to produce hot and cold water, and the internet and e-mail to be functional. Because these routine services are taken care of "behind the scenes" by facilities and IT staff, scientists can get started right away on their research.

But increasingly, scientists are hitting a new speed bump in their day-to-day activities: the storage, movement and analysis of data. As datasets grow far beyond what can easily be handled on a single desktop computer and long-distance collaborations become increasingly common, frustrated researchers find themselves spending more and more time and money on data management. To get the march of science back up to top speed, new services must be provided that make handling data as simple as switching on the lights.


The original Human Genome Project needed 13 years, hundreds of scientists and billions of dollars to produce the first complete human DNA sequence. Only ten years after that achievement, genome sequencing is a routine activity in laboratories around the world, creating a new demand for analytics tools that can grapple with the large datasets these methods produce. Large projects like the HGP could assemble their own expensive cyberinfrastructure to handle these tasks, but even as sequencing gets cheaper, data storage, transfer and analysis remain a time and financial burden for smaller labs.

Today at the Bio-IT World conference in Boston, the CI's Globus Online officially unveiled its solution for these scientists: Globus Genomics. Per the news release, Globus Genomics "integrates the data transfer capabilities of Globus Online, the workflow tools of Galaxy, and the elastic computational infrastructure of Amazon Web Services. The result is a powerful platform for simplifying and streamlining sequencing analysis, ensuring that IT expertise is not a requirement for advanced genomics research."



Big science projects can afford big cyberinfrastructure. For example, the Large Hadron Collider at CERN in Geneva generates 15 petabytes of data a year, but also boasts a sophisticated data management infrastructure for the movement, sharing and analysis of that gargantuan data flow. But big data is no longer a problem exclusive to these massive collaborations in particle physics, astronomy and climate modeling. Individual researchers, faced with new laboratory equipment and methods that can generate their own torrents of data, increasingly need their own data management tools, but lack the hefty budgets that large projects can dedicate to such tasks. What can the 99% of researchers doing big science in small labs do with their data?

That was how Computation Institute director Ian Foster framed the mission at hand for the Research Data Management Implementations Workshop, happening today and tomorrow in Arlington, VA. The workshop was designed to help researchers, collaborations and campuses deal with the growing need for high-performance data transfer, storage, curation and analysis -- while avoiding wasteful redundancy.

"The lack of a broader solution or methodology has led basically to a culture of one-off implementation solutions, where each institution is trying to solve their problem their way, where we don't even talk to each other, where we are basically reinventing the wheel every day," said H. Birali Runesha, director of the University of Chicago Research Computing Center, in his opening remarks.