News You Can Use

Crowdsourcing for Data Analysis

Web Platform Makes Public Gene-Expression Data More Accessible


Photo credit: NIAID

Jamboree participants were eager to gain hands-on experience exploring publicly available gene-expression data sets and delving into the capabilities of OMiCC, a crowdsourcing web platform developed by NIAID.

The 29 immunologists who gathered in the Rathskeller Room, nestled in the basement of NIH’s Cloisters (Building 60), were brandishing only laptops—not beer steins—as they took part in the Omics Compendia Commons (OMiCC) Jamboree in April. They were eager to gain hands-on experience exploring publicly available gene-expression data sets and delving into the crowdsourcing web platform that was developed by scientists at the National Institute of Allergy and Infectious Diseases (NIAID).

John Tsang, chief of the Systems Genomics and Bioinformatics Unit in NIAID’s Laboratory of Systems Biology, began developing OMiCC three years ago after he noticed one of his postdoctoral fellows struggling with her project because she didn’t have the computational expertise to make good use of public data. Data retrieval, processing, and analysis typically require computer programming skills that many experimental biologists lack, Tsang noted. As a result, the wealth of public data remains largely untapped, accessible only to researchers who have the appropriate computational skills and experience. The OMiCC web interface, however, lets biologists without that background explore data and perform simple analyses. It “helps to democratize the mining of public data sets,” said Tsang.

Tsang and his team designed OMiCC as a community-based platform to capitalize on the biological expertise of its users. Public database entries typically contain raw study data that must be structured before they can be analyzed. In OMiCC, researchers can create groups of data, annotate them with a standardized vocabulary, and assign parameters such as sample type and disease. OMiCC saves these user-created groups and their metadata, making them available to other users for reuse.
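To make the idea of reusable, annotated groups concrete, here is a minimal Python sketch of how such a shared store might work. It is not OMiCC's code: the vocabulary terms, the `SampleGroup` and `GroupRegistry` names, and the study and sample IDs are all hypothetical. It illustrates two ideas from the paragraph above: annotations are checked against a fixed vocabulary, and groups saved by one user can be queried by others.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary; OMiCC's real term lists are not reproduced here.
VOCABULARY = {
    "sample_type": {"whole blood", "PBMC", "spleen"},
    "disease": {"lupus", "multiple sclerosis", "healthy"},
}


@dataclass
class SampleGroup:
    """A user-defined group of samples from one study, plus its annotations."""
    study_id: str
    sample_ids: tuple
    annotations: dict = field(default_factory=dict)


def validate(annotations):
    """Reject any key or value that is not a standardized term."""
    for key, value in annotations.items():
        allowed = VOCABULARY.get(key)
        if allowed is None or value not in allowed:
            raise ValueError(f"{key}={value!r} is not a standardized term")


class GroupRegistry:
    """Shared store: groups saved by one user can be found and reused by others."""

    def __init__(self):
        self._groups = []

    def add(self, group):
        validate(group.annotations)  # only standardized annotations are stored
        self._groups.append(group)

    def find(self, **criteria):
        """Return every saved group whose annotations match all criteria."""
        return [g for g in self._groups
                if all(g.annotations.get(k) == v for k, v in criteria.items())]


registry = GroupRegistry()
registry.add(SampleGroup("STUDY_1", ("s1", "s2", "s3"),
                         {"sample_type": "whole blood", "disease": "lupus"}))
registry.add(SampleGroup("STUDY_1", ("s4", "s5"),
                         {"sample_type": "whole blood", "disease": "healthy"}))
print(len(registry.find(disease="lupus")))  # 1
```

Restricting annotations to shared terms is what makes such a query dependable: a group tagged "SLE" by one user and "lupus" by another would otherwise never turn up together.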

“It’s kind of like Wikipedia, but instead of user-generated articles, you end up with annotations and groups of data that can be reused to conduct new analyses, generate new hypotheses, and address novel scientific questions,” said Tsang. “We hope to kick off a positive feedback loop: As more people group and annotate data, the OMiCC platform will become even more useful, and more people will join the OMiCC community.”

To introduce OMiCC to the NIH community and help test its capabilities, Tsang and his team, including NIAID clinical fellow Rachel Sparks, organized the OMiCC Jamboree, the first such event they’ve led. Their goal was for the jamboree participants—25 from NIH, two from FDA, and two from nearby universities—to meta-analyze and compare autoimmune diseases in humans and mouse models.

At the beginning of the day, Tsang’s team divided the participants into groups of three based on their individual expertise and assigned each group a particular area to explore, such as mouse studies of multiple sclerosis or human studies of lupus. The participants had to identify relevant data and form comparison-group pairs (CGPs). Each CGP comprises two collections of gene-expression profiles from a single study. For example, researchers could create a CGP to compare blood samples from people with lupus with those from healthy control subjects. Throughout the day, members of Tsang’s team circulated through the room to answer questions and offer advice.
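A CGP can be pictured as a small data structure holding two sets of expression profiles from one study. The sketch below is an illustration only: the field layout (a mapping from gene to expression values) is an assumption, the gene name and numbers are made up, and the statistic shown, a standardized mean difference (Cohen's d with a pooled standard deviation), is one common choice rather than necessarily the one OMiCC computes.

```python
import statistics
from dataclasses import dataclass


@dataclass
class ComparisonGroupPair:
    """Two collections of expression profiles from a single study, e.g. cases vs. controls."""
    study_id: str
    case: dict     # gene -> expression values in the case group (assumed layout)
    control: dict  # gene -> expression values in the control group

    def effect_size(self, gene):
        """Cohen's d for one gene: difference in means divided by the pooled SD."""
        x, y = self.case[gene], self.control[gene]
        n1, n2 = len(x), len(y)
        pooled_var = ((n1 - 1) * statistics.variance(x)
                      + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
        return (statistics.mean(x) - statistics.mean(y)) / pooled_var ** 0.5


# Made-up log-scale values for a hypothetical gene, e.g. lupus cases vs. healthy controls.
cgp = ComparisonGroupPair(
    study_id="STUDY_1",
    case={"GENE_X": [8.0, 9.0, 10.0]},
    control={"GENE_X": [5.0, 6.0, 7.0]},
)
print(cgp.effect_size("GENE_X"))  # 3.0
```

Keeping both groups inside one object tied to a single `study_id` mirrors the rule that a CGP's two collections come from the same study, so batch and platform differences between studies do not leak into the comparison.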

Within OMiCC, users can perform statistical analyses on CGPs from different studies to search for biological relationships. Pooling information from multiple studies in such a meta-analysis has the potential to uncover more robust biological signals, Tsang explained. OMiCC users also can generate basic visualizations and export the underlying data for further analysis with other programs. Tsang and his team are currently compiling preliminary results from the work done during the April jamboree.
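One textbook way to pool per-study results for a gene is a fixed-effect, inverse-variance-weighted meta-analysis, sketched below. The article does not say which statistical method OMiCC uses, so treat this as a generic illustration; the function name and the input numbers are hypothetical. Each study's effect size is weighted by the inverse of its variance, so more precise studies count for more.

```python
import math


def fixed_effect_meta(effects, variances):
    """Inverse-variance weighted pooled effect, its standard error, and z score."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)  # SE of the pooled estimate under a fixed-effect model
    return pooled, se, pooled / se


# Hypothetical effect sizes (and their variances) for one gene in two studies.
pooled, se, z = fixed_effect_meta([0.4, 0.8], [0.25, 0.25])
print(f"pooled={pooled:.2f}, se={se:.3f}, z={z:.2f}")  # pooled=0.60, se=0.354, z=1.70
```

With these made-up inputs, the pooled z score (about 1.70) is larger than either study's alone (0.8 and 1.6), which is the sense in which pooling can sharpen a weak signal. When studies disagree more than sampling error would explain, a random-effects model is usually preferred.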

Tsang hopes that using OMiCC to analyze data from multiple studies will help biologists obtain fresh insights and inform the design of new experiments. But, he cautioned, OMiCC is not intended to replace collaborations with statistical and bioinformatics experts for advanced data analysis. His team hopes to host more jamborees to further assess whether convening teams of volunteers to use public data sets is an effective way to generate and test research ideas.

Researchers in other biomedical fields can use OMiCC, too; it is available at http://omicc.niaid.nih.gov. The website also provides videos and a step-by-step tutorial to help users navigate the platform.


Tsang, Sparks, and colleagues described OMiCC in Nature Biotechnology in June 2016 (doi:10.1038/nbt.3603).