Proteins and Processors
A Chat with NIH’s Chief Data Scientist, Philip Bourne
At first blush, Philip Bourne sounds a lot like the name of a secret agent. The Australian-born chemist has spent much of his career working with supercomputers, occasionally taking a break to fly small airplanes or take transcontinental journeys on his motorcycle. But unlike Jason Bourne of the Bourne film franchise, Philip Bourne uses his special skills to help researchers mine massive piles of data for new scientific breakthroughs.
Hired as the NIH’s associate director for data science in 2014, Bourne administers an approximately $100 million funding enterprise called the Big Data to Knowledge program, or “BD2K” for short. The ultimate goals of the initiative, Bourne said, are to establish the NIH as a leader in data science, train scientists to take advantage of so-called “Big Data” in their research, and create policies and processes that will guide the use of Big Data. NIH launched the BD2K venture in 2012 and now funds 13 research centers across the United States. BD2K also provides many benefits to NIH intramural scientists, including programs that teach individuals to train others in the use of basic data-science tools; hackathons in which researchers can examine unrelated data sets to generate novel findings; and a virtual commons where both intramural and extramural investigators are starting to access each other’s data and store their own.
While some define Big Data as simply the accumulation of large quantities of information, Bourne takes a different view. “For me, the important thing is aggregation of data that [come] from different sources,” Bourne said. “But when you put them together, it opens up new possibilities and [enables] new discoveries.” As an example, Bourne pointed to a recent project that combined data from a smart-phone application with information from the crowd-sourced review website Yelp to identify correlations between the concentration and type of restaurants in a city and the physical-activity levels and weights of the city’s residents.
Bourne has taken a somewhat nontraditional path to becoming NIH’s resident guru of data science. His first love was actually chemistry, which he decided to pursue because his high-school chemistry teacher told him he would never succeed in the field.
“I think it was his way of inspiring me,” Bourne chuckled. “I think he knew all along that I would rise to the challenge.”
But the momentous advancements in computer technology that occurred during his undergraduate years at Flinders University (Bedford Park, South Australia) also sparked an interest in that area. These developments enabled physical chemists such as Bourne to create computer models of how the atoms that make up important biological compounds are positioned in space. “It’s that three-dimensional arrangement that confers the function of a molecule and how it interacts with other molecules,” he explained.
As a postdoctoral fellow at the University of Sheffield (Sheffield, England), Bourne used these models to determine the structure of the iron-storage protein ferritin found in human red blood cells, a discovery that shed light on how the human body processes iron (Nature 288:298–300, 1980). Bourne has also used his expertise to model the structure and behavior of drug molecules, information that pharmaceutical companies can use to make therapies more effective and reduce side effects. “One of the things I’ve always pushed for is the idea of translation,” Bourne said. “I’m not particularly interested in publishing a paper on a new interaction or a new tool. I’m much more interested in seeing this [research] translated into something that has an impact on health care.”
The importance of Bourne’s computer skills to his research interests made the transition to data science a natural one. When he began working in the Department of Biochemistry and Molecular Biophysics at Columbia University (New York) in 1981, he wanted to use computer simulations to examine how neurotoxins bind to certain receptors in the nervous system. Columbia’s computers were so bad, however, that Bourne complained to the dean about them. The dean offered to raise money for a new computer center if Bourne would help build it and run it. Bourne agreed.
But after Bourne had managed the facility for four years, the advent of the Human Genome Project pulled him back into research. Sequencing a genome requires the use of computer algorithms to identify and link together overlapping regions of DNA fragments to form a continuous sequence. Bourne assisted in the assembly of human chromosome 13. “Doing that, I realized that this was the future of biomedicine,” he said. “I wanted to get back to research rather than run a computer center for other people.”
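For readers curious about what "linking together overlapping regions" can look like in practice, here is a minimal Python sketch of a greedy overlap-and-merge assembler. It is an illustration only, not the method used on chromosome 13 or in the Human Genome Project: the function names (overlap, greedy_assemble), the min_len threshold, and the toy fragments are all assumptions made for this example, and real assemblers must also cope with sequencing errors, repeats, and reverse-complement reads.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` characters exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap.

    Hypothetical helper for illustration: it stops when no pair overlaps
    by at least `min_len` characters and returns the remaining contigs.
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join; keep fragments as separate contigs
        # Join the two fragments, writing the shared region only once.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    # Toy fragments (made up for this example) cut from one short sequence.
    reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
    print(greedy_assemble(reads))
```

The greedy strategy is chosen here only because it is short and easy to follow; it can produce a wrong result when the underlying sequence contains repeated stretches.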
In 1995, Bourne moved to the University of California at San Diego (La Jolla, California), where he taught classes in the Department of Pharmacology and conducted research in the San Diego Supercomputer Center. He eventually became the director of the center’s Integrated Biosciences Program and associate director of the university’s Protein Data Bank, an online platform through which scientists around the world can upload and use information about the structures of biological molecules.
At the same time, he began studying evolution through the lens of protein structure. By examining the spatial organizations of proteins found in different organisms, Bourne and his collaborators were able to produce what Bourne calls “the tree of life,” a chart depicting the evolutionary closeness of 174 species (Proc Natl Acad Sci U S A 102:373–378, 2004). According to Bourne, this work would have been impossible without resources, such as the Protein Data Bank, that provide access to large collections of other researchers’ data.
“And that’s why I’m so intrigued with what’s going on now—more and more of biomedicine is involving [the] re-use of other people’s data,” he explained. “It’s becoming an increasingly important part of what we do every day.”
So when NIH Director Francis Collins asked Bourne to run the NIH’s brand-new data-science initiative, Bourne jumped at the opportunity.
When Bourne is not using computers to scrutinize the shapes of proteins or launch new research programs, he enjoys using them to communicate about his other great love: motorcycles. An enthusiast since the age of 16, Bourne maintains a travel blog that describes his many long-distance trips, including a nearly 5,000-km excursion around western Australia in 2009, a 5,000-km jaunt through Eastern Europe in 2010, and the trip he undertook from San Diego to Virginia after he accepted his position at NIH. In the near future, he hopes to ride his bike out to Yellowstone National Park. NIH’s colossal datasets will be awaiting his return.
This page was last updated on Wednesday, April 13, 2022