Recently, genomics has become one of the major fronts in the "War on Cancer," with researchers around the world collecting increasingly sophisticated and detailed genetic data on different subtypes of the disease. But as genetic sequencing grows cheaper and more common, scientists face new logistical problems: how to store, share, and access this data across institutions in order to answer scientific questions and move closer to new understanding of, and treatments for, the disease. A new project funded by the National Cancer Institute and led by CI Senior Fellow and faculty member Robert Grossman will gather these disparate resources into a Genomic Data Commons (GDC), storing and "harmonizing" cancer data and facilitating new discoveries.
The commons will centralize NCI datasets -- including results from both genomic and clinical studies -- into a single resource, similar to data strategies used by major tech companies such as Google or Facebook. That's no small task: according to an Institute of Medicine report, existing cancer genomic data already amounts to as much as 20 petabytes, or "10 times as much as all of the publications currently housed in U.S. academic research libraries," as the UChicago news release by Kevin Jiang estimates. But creating this commons will greatly simplify researchers' access to this data, requiring a single approval rather than separate negotiations with multiple institutions to view and use scattered datasets.
“The Genomic Data Commons has the potential to transform the study of cancer at all scales,” said Grossman, a professor of medicine at UChicago. “It supplies the data so that any researcher can test their ideas, from comprehensive ‘big-data’ studies to genetic comparisons of individual tumors to identify the best potential therapies for a single patient.”
“With the GDC, the pace of discovery shifts from slow and sequential to fast and parallel,” said Conrad Gilliam, dean for basic science at UChicago's Biological Sciences Division and CI Senior Fellow. “Discovery processes that today would require many years, millions of dollars and the coordination of multiple research teams could literally be performed in days, or even hours.”
The Genomic Data Commons will be built upon the Bionimbus Protected Data Cloud, another Grossman project, which was the first cloud-based system approved to store data from The Cancer Genome Atlas. Once built, it may serve as a model for similar centralized, collaborative data frameworks for other diseases, including Alzheimer's and diabetes.
“The availability of high-quality genomic data and associated clinical annotations is extremely important because this information can be combined and mined repeatedly to make new discoveries,” said Louis Staudt, director of NCI’s Center for Cancer Genomics.
For more on the Genomic Data Commons, read the UChicago news release or a story in the Chicago Tribune. You can also watch a video on the project, featuring Grossman and CI Senior Fellows Kevin White and Nancy Cox.