30 Apr 2013

When Charles Darwin took his historic voyage aboard the HMS Beagle from 1831 to 1836, "big data" was measured in pages. On his travels, the young naturalist produced at least 20 field notebooks, zoological and geological diaries, a catalogue of the thousands of specimens he brought back and a personal journal that would later be turned into The Voyage of the Beagle. But it took more than two decades for Darwin to process all of that information into his theory of natural selection and the publication of On the Origin of Species.

While biological data may have since transitioned from analog pages to digital bits, extracting knowledge from data has only become more difficult as datasets have grown larger and larger. To wedge open this bottleneck, the University of Chicago Biological Sciences Division and the Computation Institute launched their very own Beagle -- a 150-teraflop Cray XE6 supercomputer that ranks among the most powerful machines dedicated to biomedical research. Since the Beagle's debut in 2010, over 300 researchers from across the University have run more than 80 projects on the system, yielding over 30 publications.

24 Apr 2013

This week, some 25 cities around the world are hosting events online and offline as part of Big Data Week, described by its organizers as a "global community and festival of data." The Chicago portion of the event features several people from the Computation Institute, including two panels on Thursday: "Data Complexity in the Sciences: The Computation Institute," featuring Ian Foster, Charlie Catlett, Rayid Ghani and Bob George, and "Science Session with the Open Cloud Consortium," featuring Robert Grossman and his collaborators. Both events are free, take place in downtown Chicago, and you can register at the links above.

But the CI's participation in Big Data Week started with two webcast presentations on Tuesday and Wednesday that demonstrated the broad scope of the topic. The biggest data of all is being produced by simulations on the world's fastest supercomputers, including Argonne's Mira, the fourth-fastest machine in the world. Mira can perform 10 quadrillion floating-point operations per second, but how do you make sense of the terabytes of data such powerful computation produces on a daily basis?


23 Apr 2013

People who work in laboratories take a lot of things for granted. When they come into work in the morning, they expect the equipment to have power, the sink to produce hot and cold water, and the internet and e-mail to be functional. Because these routine services are taken care of "behind the scenes" by facilities and IT staff, scientists can get started right away on their research.

But increasingly, scientists are hitting a new speed bump in their day-to-day activities: the storage, movement and analysis of data. As datasets grow far beyond what can easily be handled on a single desktop computer and long-distance collaborations become increasingly common, frustrated researchers find themselves spending more and more time and money on data management. To get the march of science back up to top speed, new services must be provided that make handling data as simple as switching on the lights.

19 Apr 2013

[This article ran originally at International Science Grid This Week. Reprinted with permission.]

In 2012, the United States suffered its worst agricultural drought in 24 years. Farmland across the country experienced a devastating combination of high temperatures and low precipitation, leading to the worst harvest yields in nearly two decades. At its peak, nearly two-thirds of the country experienced drought conditions according to the US Drought Monitor. Worse still, instead of an anomalous year of bad weather, 2012 may have provided an alarming preview of how climate change will impact the future of agriculture.

These warning signs make 2012 an ideal year for validating crop yield and climate impact models that simulate the effects of climate on agriculture. Current climate change models predict that global temperatures could rise several degrees over the next century, making hotter growing seasons the new norm and truly extreme seasons (like 2012) more commonplace.

"A world four degrees warmer than it is now is not a world that we've ever seen before," says Joshua Elliott, a fellow at the Computation Institute and the Center for Robust Decision-Making on Climate and Energy Policy. "Studying years like 2012 in detail can potentially be very useful for helping us understand whether our models can hope to accurately capture the future."

17 Apr 2013

If you received a surprisingly personalized e-mail or Facebook message from the Obama 2012 campaign, it was likely the product of the campaign's groundbreaking analytics tools. As chief scientist of that acclaimed team, Rayid Ghani helped bring the computational techniques of data-mining, machine learning and network analysis to the political world, helping the re-election campaign raise funds and get out the vote in powerful new ways. Now that Barack Obama is back in the White House, we are pleased to announce that Ghani is joining the University of Chicago and the Computation Institute. Here, he will shift his attention and expertise to even bigger goals: using data and computation to address complex social problems in education, healthcare, public safety, transportation and energy.

Though he only started on April 1st, Ghani already has a full plate, including a position as Chief Data Scientist at the Urban Center for Computation and Data and a role developing a new data-driven curriculum at the Harris School of Public Policy. But Ghani's most immediate project is The Eric and Wendy Schmidt Data Science for Social Good Fellowship, which hopes to train and seed a new community of scientists interested in applying statistics, data and programming skills to society's greatest challenges. We spoke to Ghani about his time with the campaign and plans for the future.


13 Apr 2013

Two major gifts will build momentum behind the University of Chicago's leadership in biomedical computation by assembling experts in the field and furnishing them with the tools to use "big data" to understand disease and solve today's health-related challenges.

These two gifts will fund related projects that are central to a much larger plan at the University that includes multiple data-driven discovery programs to improve health and medical care.

12 Apr 2013

WHEN TED MEETS CERN

We're happy to announce that Computation Institute director Ian Foster will be speaking at the first-ever TEDxCERN conference, to be held May 3rd at the particle physics laboratory in Geneva, Switzerland. The theme of the conference is "Multiplying Dimensions," and Foster will speak in the second session on the topic of "Big Process for Big Data." Other speakers include geneticist George Church, chemist Lee Cronin and philosopher John Searle. A webcast of the conference (hosted by Nobel Laureate George Smoot) will run on the TEDxCERN website, but the CI will also host a viewing party at the University of Chicago. Stay tuned for details, and enjoy the TEDxCERN animation on the origin of the universe -- one of five animations (including one on big data) that will premiere at the event.


11 Apr 2013

Computer graphics have greatly expanded the possibilities of cinema. Special effects using CGI (computer-generated imagery) today enable directors to shoot scenes that were once considered impossible or impractical, from interstellar combat to apocalyptic action sequences to fantastical digital characters that realistically interact with human actors.

09 Apr 2013

The original Human Genome Project needed 13 years, hundreds of scientists and billions of dollars to produce the first complete human DNA sequence. Only ten years after that achievement, genome sequencing is a routine activity in laboratories around the world, creating a new demand for analytics tools that can grapple with the large datasets these methods produce. Large projects like the HGP could assemble their own expensive cyberinfrastructure to handle these tasks, but even as sequencing gets cheaper, data storage, transfer and analysis remain a time and financial burden for smaller labs.

Today at the Bio-IT World conference in Boston, the CI's Globus Online team officially unveiled its solution for these scientists: Globus Genomics. Per the news release, the platform "integrates the data transfer capabilities of Globus Online, the workflow tools of Galaxy, and the elastic computational infrastructure of Amazon Web Services. The result is a powerful platform for simplifying and streamlining sequencing analysis, ensuring that IT expertise is not a requirement for advanced genomics research."
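As a rough illustration of the pipeline pattern the release describes (managed data transfer, workflow execution, elastic cloud storage), here is a hedged sketch in Python. The three helper functions are hypothetical stand-ins for those stages, not the actual Globus Online, Galaxy or Amazon Web Services APIs.

```python
# Hypothetical sketch of the pipeline pattern described above: move raw reads
# with a transfer service, run an analysis workflow, and push results to
# elastic cloud storage. The helpers are illustrative stand-ins, NOT the
# actual Globus Online, Galaxy, or Amazon Web Services APIs.

def transfer_reads(source_endpoint: str, dest_endpoint: str, path: str) -> str:
    """Stand-in for a managed file transfer; returns a staged path."""
    print(f"transferring {path} from {source_endpoint} to {dest_endpoint}")
    return f"/staging/{path.rsplit('/', 1)[-1]}"

def run_workflow(workflow_name: str, input_path: str) -> str:
    """Stand-in for launching a sequencing workflow (alignment, variant calling, ...)."""
    print(f"running workflow '{workflow_name}' on {input_path}")
    return input_path.replace(".fastq", ".vcf")

def archive_results(result_path: str, bucket: str) -> None:
    """Stand-in for copying results to elastic cloud storage."""
    print(f"archiving {result_path} to {bucket}")

staged = transfer_reads("lab#sequencer", "cloud#analysis", "/runs/sample01.fastq")
variants = run_workflow("exome-variant-calling", staged)
archive_results(variants, "s3://example-lab-results")
```

The appeal of a hosted platform like Globus Genomics is that these stages are chained and managed for the researcher, rather than scripted and babysat in each individual lab.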


04 Apr 2013

Chicago's open data culture is picking up steam, with millions of lines of data available through the city's data portal and a growing community of civic-minded programmers eager to sculpt that information into useful applications.
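For the curious, here is a minimal sketch of how a civic programmer might pull a few rows from the portal, assuming its Socrata-style SODA endpoint; the dataset identifier below is a placeholder to be replaced with a real one from data.cityofchicago.org.

```python
# Hedged sketch: fetching a small slice of a dataset from the City of Chicago
# data portal via a Socrata-style SODA endpoint. "xxxx-xxxx" is a placeholder
# dataset identifier, not a real dataset on the portal.
import json
import urllib.request

DATASET_ID = "xxxx-xxxx"  # placeholder: substitute an identifier from the portal
url = f"https://data.cityofchicago.org/resource/{DATASET_ID}.json?$limit=5"

with urllib.request.urlopen(url) as response:
    rows = json.load(response)

for row in rows:
    print(row)
```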