News You Can Use

The Genomic Ascertainment Cohort is Open for Business

The past decade has seen an explosion in the availability of genome-scale data—including chip genotypes, exome sequences, and even full-genome sequences—from hundreds of thousands of individuals. But even with all these data, it can be challenging for scientists to figure out the exact roles that particular genetic variants play in the development of diseases. Often, the problem is not the research itself, but the difficulty of finding enough people with the variant of interest.



To solve this conundrum, intramural researchers from the National Human Genome Research Institute (NHGRI) and the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) have joined forces to create a pilot program called The Genomic Ascertainment Cohort (TGAC). The program, guided by Leslie Biesecker (NHGRI), Richard Siegel (NIAMS), and John Niederhuber (CEO of Inova Translational Medicine Institute, ITMI, and director of the National Cancer Institute from 2006 to 2010), is aggregating genomic data from many existing cohorts into a single, searchable system. The goal is to enable NIH intramural investigators to study the phenotypic consequences of genetic variants. TGAC will include at least 10,000 individuals whose genomes or exomes have been sequenced and who have agreed to be re-contacted for secondary phenotyping studies.

“So many times you have a very specific question; you may have even found a very important variant in the genome,” said Siegel. But “you can’t study it because you can’t dial up the patient who has those specific variants.”

“The beauty of [TGAC] is that you have all these data that are attached to able and willing research participants,” said Biesecker. “It allows us to be much more efficient with research data.”

TGAC is modeled after ClinSeq, an NHGRI clinical study launched by Biesecker in 2007 that was designed to facilitate the use of genomic data for a wide variety of research programs. The ClinSeq cohort includes about 1,500 participants who have consented to being re-contacted for additional clinical trials at the NIH. The genome or exome of each individual in the ClinSeq study was sequenced, and the aggregate data were made available to NIH researchers. Several studies, such as one by Joshua Milner (National Institute of Allergy and Infectious Diseases) on hereditary alpha-tryptasemia, have used the ClinSeq resource to identify individuals with particular variants of interest.

However, the ClinSeq cohort is relatively small, which makes finding infrequent alleles challenging. To increase the number and variety of participants—and therefore the likelihood of finding sufficient numbers of important genetic variants—TGAC is partnering with the ITMI to incorporate participants and genomic data from the Inova Longitudinal Childhood Genomic Study cohort. The study, which was established by Niederhuber in 2012, includes more than 8,000 individuals in parent-child trios; all of the children were born at Inova Fairfax Hospital in northern Virginia.
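To see why cohort size matters for infrequent alleles, a rough back-of-the-envelope calculation helps. The sketch below is an illustration only, not part of the TGAC project: the allele frequency is a made-up value, and the calculation assumes unrelated participants and Hardy-Weinberg proportions, which will not hold exactly for a cohort built from parent-child trios.

```python
def carrier_probability(allele_freq: float) -> float:
    """Chance that one person carries at least one copy of the allele.

    Assumes two independent chromosomes per person (Hardy-Weinberg).
    """
    return 1.0 - (1.0 - allele_freq) ** 2


def expected_carriers(cohort_size: int, allele_freq: float) -> float:
    """Average number of carriers expected in a cohort of unrelated people."""
    return cohort_size * carrier_probability(allele_freq)


def prob_at_least_one_carrier(cohort_size: int, allele_freq: float) -> float:
    """Chance the cohort contains at least one carrier (2 chromosomes each)."""
    return 1.0 - (1.0 - allele_freq) ** (2 * cohort_size)


if __name__ == "__main__":
    # Hypothetical allele frequency chosen for illustration (1 in 2,000).
    allele_freq = 0.0005
    # Cohort sizes taken from the article: about 1,500 and at least 10,000.
    for label, size in [("ClinSeq-sized cohort", 1_500), ("TGAC target", 10_000)]:
        carriers = expected_carriers(size, allele_freq)
        chance = prob_at_least_one_carrier(size, allele_freq)
        print(f"{label}: about {carriers:.1f} expected carriers, "
              f"{chance:.0%} chance of at least one")
```

Under these assumptions the expected number of carriers grows in direct proportion to cohort size, which is the practical argument for pooling cohorts: a phenotyping study usually needs several carriers, not just one.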

“It’s really important to have a local cohort,” said Siegel. In addition to the large size of the ITMI cohort, location “is the real spark to working with Inova, because it can be difficult to persuade a person to come a long distance to participate as a healthy volunteer.”

To make the genomic data available, Siegel and Biesecker are entering the aggregate data from these cohorts into a software architecture that was originally developed by an international coalition of researchers for the Genome Aggregation Database. This system allows NIH researchers to use a web portal to explore whether specific variants are present in the TGAC participant cohort while preserving the privacy of study subjects.
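The general idea of an aggregate lookup can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not the Genome Aggregation Database software or the TGAC portal: the names `VariantSummary`, `build_aggregate_index`, and `lookup` are hypothetical, the variant keys are placeholders, and every genotype is assumed to be a diploid autosomal call. Individual genotypes are collapsed into per-variant counts, and a query sees only those counts, never participant identifiers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantSummary:
    """Aggregate counts for one variant; holds no participant identifiers."""
    allele_count: int    # copies of the alternate allele observed
    allele_number: int   # chromosomes with a genotype call
    carrier_count: int   # participants with at least one copy

    @property
    def allele_frequency(self) -> float:
        if self.allele_number == 0:
            return 0.0
        return self.allele_count / self.allele_number


def build_aggregate_index(genotypes):
    """Collapse (participant_id, variant_key, alt_copies) records into counts.

    Assumes diploid autosomal calls, so each record adds two chromosomes.
    Participant IDs are read but deliberately not stored in the index.
    """
    allele_count = defaultdict(int)
    allele_number = defaultdict(int)
    carrier_count = defaultdict(int)
    for _participant_id, variant_key, alt_copies in genotypes:
        allele_number[variant_key] += 2
        allele_count[variant_key] += alt_copies
        if alt_copies > 0:
            carrier_count[variant_key] += 1
    return {
        key: VariantSummary(allele_count[key], allele_number[key], carrier_count[key])
        for key in allele_number
    }


def lookup(index, variant_key):
    """Return aggregate counts for a variant, or all zeros if it is absent."""
    return index.get(variant_key, VariantSummary(0, 0, 0))


if __name__ == "__main__":
    # Placeholder records for illustration; not real participants or variants.
    records = [
        ("P1", "chr1-1000-A-G", 1),
        ("P2", "chr1-1000-A-G", 0),
        ("P3", "chr1-1000-A-G", 2),
    ]
    index = build_aggregate_index(records)
    print(lookup(index, "chr1-1000-A-G"))
    print(lookup(index, "chr9-9999-C-T"))
```

A real system would need further safeguards, since very small counts can themselves be identifying. Re-contacting carriers would go through the study team under the clinical protocol, not through the search interface.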

Although there are many larger anonymized exome, genome, and genotyped datasets available, TGAC’s key advantage is its association with the NIH Clinical Center (CC). The CC, the largest clinical-research center in the world, can do much more in-depth phenotyping of patients than can be done at extramural facilities.

“If you couple the genomics with what you can do evaluating patients at the Clinical Center, that’s a potential combination nobody else has,” said Biesecker.

For example, “you can do a blood study anywhere,” said Siegel, “but where are you also going to do functional [magnetic resonance imaging studies] on 100 people? Deep phenotyping is something the Clinical Center does very well.”

TGAC’s lead staff clinician, Alexander Katz (NHGRI), will help bring additional existing cohorts into TGAC and assist interested investigators with small phenotyping projects under its existing clinical protocol.

“Across NIH, there are [already] at least 10,000 sequenced individuals,” said Siegel. But many of those individuals will need to give consent to be re-contacted, he explained. Siegel and Biesecker also hope to incorporate many additional cohorts from around the NIH, including any new patients who come through the NIH Clinical Center, a task that will require significant additional genomic sequencing.

Another important partner is the Walter Reed National Military Medical Center (Bethesda, Maryland), “which has a really great sequencing center,” said Siegel. In addition, TGAC hopes to include data from the Environmental Polymorphisms Registry at the National Institute of Environmental Health Sciences (Research Triangle Park, North Carolina) as well as from organizations outside the NIH.

Tyra Wolfsberg, associate director of NHGRI’s Bioinformatics and Scientific Programming Core, is leading the team that is developing the TGAC web portal. Data from the ClinSeq cohort are already available to NIH investigators through this portal, and data from the ITMI cohort will be added next. Additional genomes will be added as they are processed. Although the pilot-phase TGAC database will be accessible only to intramural investigators, the leaders of this effort hope to be able to make the resource available to the wider scientific community in the future. For more information about TGAC, visit (NIH only) or contact Alexander Katz at