The electric cars, manned spacecraft, and must-have devices of tomorrow will all be built with discoveries made today in materials science. But finding the alloys, nanomaterials, and polymers that will enable these future technologies requires scaling up how researchers store, share, analyze, and sift through the surge of materials data from academia, national facilities, and industry.
A new $700,000 grant from the National Science Foundation will fund a consortium of Midwestern universities to address these challenges and stimulate innovative materials science research. The Midwest Big Data Spoke for Integrative Materials Design (IMaD) connects experimental and simulation results from several research groups to broaden access to data and computational tools, encourage discovery and collaboration, and extract valuable new insights from materials science data.
Founding IMaD participants include Northwestern University, the University of Chicago, Argonne National Laboratory, the University of Illinois, the University of Michigan, and the University of Wisconsin. The consortium will also work with industrial partners in the aeronautics and automotive industries on developing new data-driven commercial applications for materials science datasets.
“The IMaD spoke will build bridges between materials science data sources so that we’ll be able to link far more data than anyone has had access to before,” said co-principal investigator Ian Foster, Arthur Holly Compton Distinguished Service Professor of Computer Science at the University of Chicago, Senior Scientist at Argonne, and Senior Fellow of the Computation Institute. “We’ll then work with various groups to apply machine learning and simulation methods to advance the goal of computationally-based design of materials.”
The effort supports the mission of the Materials Genome Initiative, launched by the White House in 2011 to accelerate the pace of discovery, deployment, and manufacture of advanced materials to improve clean energy, national security, and human welfare. It builds upon the National Institute of Standards and Technology-funded Center of Excellence for Hierarchical Materials Design (CHiMAD) formed in 2013, and the Materials Data Facility (MDF), created in 2015 to enable the discovery, reuse, and publication of materials science data for scientists and researchers.
“What we are facing in the materials community in the coming years is the challenge of sharing, searching, and curating large materials data,” said Peter Voorhees, Frank C. Engelhart Professor of Materials Science and Engineering at Northwestern University and co-director of CHiMAD. “With data coming from our partners, IMaD will become an important resource for materials data that will leverage the strength of the materials engineering community in the Midwest.”
Initially, IMaD will connect data from its founding partners with the MDF, establishing a deep and comprehensive resource for materials science data. New tools created by Globus, a project of the Computation Institute at UChicago and Argonne, will make it easier for researchers to automatically upload new data from ongoing experiments, archive existing data to MDF, and find the data they need from other sites.
Several existing databases built and maintained by IMaD partners will link up with MDF, creating a “one-stop shop” for finding materials science data. CHiMAD will contribute databases on the properties and structures of polymer nanocomposites (NanoMine) and polymer blends (Polymer Property Predictor and Database). The University of Michigan will contribute its PRISMS Center Material Commons, with data on microstructural evolution and the mechanical behavior of structural metals. Laboratories at the University of Illinois and University of Wisconsin will provide datasets on alloy corrosion, solute diffusion, and other important material properties.
In addition to eliminating data silos and creating a multi-institutional resource, IMaD and MDF will help materials science researchers struggling with “big data” problems created by new techniques and technologies, such as resonant soft X-ray scattering and 4D X-ray tomography, which can produce terabytes of results.
Building bridges between detached data resources is only the beginning. IMaD participants will also develop new computational tools that intelligently search through data and use it to predict and simulate the properties of new, untested materials. Machine learning tools will take experimental data from known materials and predict new compositions with desirable properties, such as resistance to high temperatures or corrosive environments.
“We’re putting together a unique set of data capabilities that will allow meta analyses and machine learning studies that were really not possible before,” said Ben Blaiszik, research scientist at the Computation Institute. “Scientists will be able to bridge disparate datasets and get better results than they could from any one of the datasets by themselves.”
The IMaD consortium will also include several outreach efforts, including webinars, tutorials, and work with industry and technology partners interested in applying materials science data to new commercial products. In particular, IMaD researchers will engage companies from the aeronautics and automotive sectors on increasing usage of materials datasets, automating data workflows, and training their workforces on computational techniques. Another partnership with Citrine Technologies will link IMaD databases with its text-mining methods for generating structured data from previously published research.
“We think these powerful combined efforts will bolster the Midwest’s leadership in materials science and engineering,” Foster said. “IMaD can also serve as a model system for other materials science communities and data-heavy fields such as genomics and the digital humanities.”
Image: Model of graphene structure. The ideal crystalline structure of graphene is a hexagonal grid. Image courtesy of AlexanderAlUS via Flickr.