The web of science seems to be immeasurably large, with researchers around the world churning out papers in hundreds of different fields. So when scholars try to describe and explain how scientists weave new threads into the fabric of knowledge, they typically stick to very small patches. But in a massive new analysis of nearly 20 million biomedical journal articles, Knowledge Lab researchers constructed the most complete picture yet of the network of biomedical science -- and in doing so, found that it was surprisingly compact.
Scholars have invoked the image of a network before when thinking about science. Bruno Latour, whose “actor network theory” is the best-known version of this metaphor, memorably described how scientists, instruments, entities, and institutions “knit, weave and knot together” in the making of scientific facts. But these ideas were proposed long before large-scale computational investigations were possible.
In their recently published paper in the journal Social Networks, authors Feng Shi, Jacob Foster, and CI Senior Fellow and Knowledge Lab director James Evans formalized and extended this network metaphor using a special kind of network called a hypergraph. In typical networks, entities are connected by lines, encoding a simple two-way relationship between the nodes. For example, Facebook connects you with the people you know, who are connected to the people they know, and so on. In a hypergraph, an additional kind of relationship is possible, one that knits together multiple objects -- like a party that brings many of your friends together.
In science, these higher-order relationships aren’t parties; they’re papers, which bring together many different components -- the disease studied, the chemicals or methods used, and (of course) the authors. Whenever these elements appear together in a journal article, they are linked in the hypergraph with multiple “partners,” as seen in the bubbles in the image above.
When Shi, Foster, and Evans used hypergraphs to map an enormous sample from the MEDLINE database of biomedical literature -- 19,916,512 articles from 9,300,182 authors published between 1865 and 2010 -- they discovered that this network was surprisingly compact. Most things in this dense web of diseases, methods, chemicals, and authors could be connected within a couple of jumps. To put it another way, the typical entity in this network is “a friend of a friend.”
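To make the "couple of jumps" idea concrete, here is a minimal Python sketch of a hypergraph in which each paper is a hyperedge joining typed entities, with a breadth-first search that counts how many jumps separate two entities. The toy papers, entity names, and the helper functions `build_incidence` and `hops` are all hypothetical illustrations, not the authors' data or code, and treating any two entities that share a paper as one jump apart is an assumption about how distance is measured.

```python
from collections import defaultdict, deque

# Hypothetical toy data: each paper is a hyperedge joining (type, name) nodes.
papers = {
    "paper1": {("author", "A. Lee"), ("chemical", "aspirin"), ("disease", "stroke")},
    "paper2": {("author", "B. Cho"), ("chemical", "aspirin"), ("method", "cohort study")},
    "paper3": {("author", "B. Cho"), ("disease", "migraine"), ("method", "MRI")},
}

def build_incidence(papers):
    """Map each node to the set of papers (hyperedges) that contain it."""
    incidence = defaultdict(set)
    for paper_id, nodes in papers.items():
        for node in nodes:
            incidence[node].add(paper_id)
    return incidence

def hops(papers, source, target):
    """Count jumps between two nodes, where one jump links nodes sharing a paper.

    Returns None if either node is unknown or no path exists.
    """
    incidence = build_incidence(papers)
    if source not in incidence or target not in incidence:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        # Every node appearing in a shared paper is one jump away.
        for paper_id in incidence[node]:
            for neighbor in papers[paper_id]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, dist + 1))
    return None

# "stroke" and "cohort study" never share a paper, but both appear with "aspirin".
print(hops(papers, ("disease", "stroke"), ("method", "cohort study")))
```

In this toy example, "stroke" and "cohort study" are friends of a friend: they are joined only through "aspirin", which appears in a paper with each of them.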
“That totally changed our view of thinking about the fabric of science,” said Shi, a postdoctoral researcher with Knowledge Lab. “It turns out that in the network, everything is close to everything. Most of the time scientists only walk two steps away to get to their next experiment or research question.”
Because the fabric of science is so densely woven, it might be easy to get lost -- to lose the thread, so to speak. So how do scientists manage it? After assembling the hypergraph of science, Shi, Foster, and Evans could use it to test theories about how science progresses, on both a large and small scale. They found that a relatively simple random walk model could accurately predict the path of science, as scientists “wander” the network and connect new concepts to create an experiment or publication. Interestingly, the team found that these connections were typically made across types; for example, a promising new relationship between two chemicals may be discovered via a particular laboratory method or through a collaborating author, rather than through a similar chemical.
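The article does not spell out the team's model, so the following is only a rough Python sketch of what a random walk over a hypergraph of papers can look like. It assumes the walker picks a paper containing its current entity uniformly at random, then moves to another entity in that paper uniformly at random. The toy data and the functions `random_walk` and `cross_type_fraction` are hypothetical and are not the authors' implementation.

```python
import random

# Hypothetical toy data: each paper is a hyperedge joining (type, name) nodes.
papers = {
    "paper1": {("author", "A. Lee"), ("chemical", "aspirin"), ("disease", "stroke")},
    "paper2": {("author", "B. Cho"), ("chemical", "aspirin"), ("method", "cohort study")},
    "paper3": {("author", "B. Cho"), ("disease", "migraine"), ("method", "MRI")},
}

def random_walk(papers, start, steps, seed=0):
    """Wander the hypergraph: node -> random containing paper -> random other node.

    Uniform choices are an assumption made for illustration only.
    """
    rng = random.Random(seed)
    path = [start]
    current = start
    for _ in range(steps):
        # Sort so that results depend only on the seed, not on set ordering.
        containing = sorted(p for p, nodes in papers.items() if current in nodes)
        if not containing:
            break  # the node appears in no paper, so the walk cannot continue
        paper = rng.choice(containing)
        others = sorted(papers[paper] - {current})
        if not others:
            break  # a single-node paper offers nowhere to go
        current = rng.choice(others)
        path.append(current)
    return path

def cross_type_fraction(path):
    """Fraction of steps that move between different node types."""
    moves = list(zip(path, path[1:]))
    if not moves:
        return 0.0
    return sum(a[0] != b[0] for a, b in moves) / len(moves)

walk = random_walk(papers, ("disease", "stroke"), steps=5)
print(walk)
print(cross_type_fraction(walk))
```

In this toy data no paper contains two entities of the same type, so every step crosses types by construction. On real data, the cross-type fraction would be something to measure rather than assume.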
Such insights could guide researchers as they design future patterns in the fabric of science. Significant holes in the hypergraph may represent promising experiments that haven’t been tried yet. Conversely, heavily connected areas may mark locations where additional discoveries are unlikely.
“The big question is how scientists search in this space,” Shi said. “That will have implications for how we can do science better, both for individual scientists and for science as a whole. It can suggest what direction a funding agency should encourage, or what direction of research is saturated.”
In the current study, all publications and paths are considered equally valuable. But in the next phase of the project, the researchers will use citation performance to look for the most successful combinations and paths, perhaps finding common patterns for the most effective and fruitful research projects. Determining the best “designs” for scientific inquiry could someday lead to software that suggests promising new experiments to scientists, just as today’s dating sites or online retailers suggest people or products.
“We’re trying to find the most influential or the most successful path, to identify fields that are really close but haven’t yet been bridged, and areas that could be promising but are underinvestigated,” Shi said. “Instead of saying that this guy or this girl is someone you might be interested in -- or recommending a movie or book -- we’re working towards recommending that this chemical and this disease and this method should be used together.”