Inside the Discovery Cloud: Deep Text Mining for Cancer & Disease
Text mining is often discussed in the context of humanities research or marketing, where an enormous pool of text can be computationally sifted for new insight or targeted advertising. But text mining is also gaining a foothold in biology and medicine, as researchers increasingly realize that the millions upon millions of journal articles published in these fields may hold previously undiscovered insights for understanding and treating disease. In fact, the massive corpus of scientific literature may be a gold mine for both social scientists and life scientists, allowing historians to reconstruct the bumpy path of science, providing new perspective on the current landscape of research, and suggesting future directions that may be most fruitful and cost-efficient.
Evans provided an overview of the methods Knowledge Lab uses to extract knowledge from scientific papers, grant applications, insurance claims, and other sources. From these millions of documents, Evans' group constructed a global map of research priorities by country, finding essentially no correlation between disease burden and research attention, and created a Health Research Opportunity Index (or Health ROI) that identifies "overstudied" and "understudied" diseases. Another study extracted chemicals, methods, and other elements from papers to build systems that represent how research in a field generates new hypotheses to test -- and that may recommend the experiments most likely to succeed in the future.
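To make the idea of comparing disease burden with research attention concrete, here is a minimal Python sketch. It is not the Knowledge Lab's method and not the Health ROI formula, neither of which is described in this post. The disease names, the numbers, and the `attention_gap` measure are all hypothetical, chosen only to show how a rank correlation and a simple "overstudied vs. understudied" score could be computed.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def attention_gap(burden, attention):
    """Hypothetical score: share of attention minus share of burden.

    Positive means relatively overstudied, negative relatively understudied.
    """
    total_burden = sum(burden.values())
    total_attention = sum(attention.values())
    return {
        disease: attention[disease] / total_attention
        - burden[disease] / total_burden
        for disease in burden
    }


# Made-up toy data: burden units and paper counts are illustrative only.
burden = {"disease_a": 40, "disease_b": 30, "disease_c": 20, "disease_d": 10}
attention = {"disease_a": 10, "disease_b": 40, "disease_c": 20, "disease_d": 30}

diseases = list(burden)
rho = spearman([burden[d] for d in diseases], [attention[d] for d in diseases])
gaps = attention_gap(burden, attention)
```

In this toy data, `disease_a` carries the largest burden but receives the least attention, so its gap is negative, while `disease_d` comes out positive. A real analysis would need carefully defined burden measures and publication counts, which this sketch does not attempt.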
Some of these networks mirror the work of the Conte Center in their search for new cancer treatments in the deep well of scientific literature. Chattopadhyay talked about how networks of molecular interactions and gene expression derived from text mining give scientists new models for finding potential new drug targets -- or combinations of targets -- to slow or disrupt cancer cells. Using a combination of network theory, molecular biology, and genomics, these models provide predictions about effective therapies, including drug cocktails, that are then tested in the laboratory, generating new data that in turn can help improve the model.
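As a rough illustration of how a network might be derived from text, the Python sketch below counts how often pairs of entities are mentioned in the same sentence and treats those counts as weighted edges. This is a deliberately simplified stand-in, not the approach presented in the talk: real pipelines typically rely on named-entity recognition and relation extraction rather than exact word matching. The gene names and sentences are placeholders invented for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences, entities):
    """Count sentence-level co-mentions for each pair of known entities."""
    edges = Counter()
    for sentence in sentences:
        # Crude tokenization: lowercase, strip commas and periods, split on spaces.
        cleaned = sentence.lower().replace(",", " ").replace(".", " ")
        tokens = set(cleaned.split())
        found = sorted(e for e in entities if e.lower() in tokens)
        for a, b in combinations(found, 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum the edge weights touching each node."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Placeholder entities and sentences, invented for illustration.
entities = ["GENE_A", "GENE_B", "GENE_C"]
sentences = [
    "GENE_A activates GENE_B in tumor cells.",
    "GENE_B and GENE_C are co-expressed.",
    "GENE_A binds GENE_B.",
]

edges = cooccurrence_edges(sentences, entities)
degree = weighted_degree(edges)
```

Here `GENE_B` ends up with the highest weighted degree because it appears in every sentence. Whether a highly connected node is a useful drug target is exactly the kind of prediction that would still need to be tested in the laboratory.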
This event was the final installment of our 2014-2015 Inside the Discovery Cloud series on "Catalyzing Collaboration." If you missed any of the talks, or would like to revisit those that you attended, they are archived here. Stay tuned for the announcement of next year's series.