Adam M. Phillippy, Ph.D.
Computational and Statistical Genomics Branch
Building 49, Room 4A22
49 Convent Drive
Bethesda, MD 20892
Members of the Genome Informatics Section have been at the forefront of bioinformatics for over a decade and have made important contributions to the problems of genome assembly, read mapping, whole-genome alignment, variant detection, and metagenomics. These bioinformatics advances have been inextricably linked to advances in DNA sequencing technology, and in a field that moves as quickly as genomics, simply keeping pace with changing technology is a challenge. For example, computational methods that were once successful for capillary sequencing do not work with the massive number of short reads produced by amplified cyclic technologies. This sparked a flurry of short-read mapping and assembly methods. More recently, single-molecule sequencing has emerged, producing much longer but less accurate reads. Again, this fundamental shift in data type requires new methods for even the most routine bioinformatics tasks. The Genome Informatics Section aims to enable the widespread use of such emerging technologies and to apply these new methods to the most challenging problems in genomics.
Despite the higher error rates of single-molecule technologies, the incredibly long reads they produce have many exciting applications in genome assembly, structural variant detection, and metagenomics. Members of the Section were among the first to develop an assembly method capable of reconstructing complete microbial genomes directly from single-molecule sequencing. An improved version of this method was later used to generate the first de novo single-molecule assembly of a eukaryote, Drosophila melanogaster. This assembly vastly improved upon previous assemblies and included fully assembled chromosome arms, novel telomeric transition sequences, a complete mitochondrial genome, and a significant fraction of the heterochromatic Y chromosome—revealing new biology in a genome that had been curated and studied for over a decade. These same techniques are now being applied to human and other important species, enabling new studies of chromosomal structure and variation.
The ultimate goal of genome assembly is to generate a gap-free reconstruction of the genome from end to end. Although the complete reconstruction of human genomes was long thought impossible due to limitations in cloning heterochromatin, single-molecule sequencing may soon enable it. Prior work from the Section has shown this is already possible for microbes and smaller eukaryotes, and it seems only a matter of time before technology improvements enable the gapless assembly of larger genomes such as the human genome. The interim goal is a single, finished human genome including both euchromatin and heterochromatin. A finished reference would not only reveal the last remaining regions of the genome, but also benefit downstream analyses by providing an unbiased reference for comparison and mapping. In a first attempt, the Section assembled the genome of a human hydatidiform mole using approximately 50X coverage of single-molecule sequencing. The resulting assembly correctly resolved 75% of all known segmental duplications and closed multiple gaps in the human reference genome. This was encouraging for the first human genome assembled from single-molecule data, and Section researchers continue to improve upon this result with the incorporation of new data types and the development of algorithms able to resolve the small variations found between duplications and diverged alleles. Such algorithms will eventually enable the full reconstruction of diploid genomes and metagenomic populations.
Lastly, recent advances in sequencing technology also create an enormous opportunity to combat infectious disease. Microbial genome sequencing was once a privilege of genome centers, but labs and hospitals can now sequence microbial genomes for a few hundred dollars each. Properly structured, a distributed sequencing model could form the basis of a digital immune system that continually monitors the microbial landscape to detect outbreaks before they spread. Such a scheme, deployed at hospitals and other important outposts, could reveal the evolution and spread of infectious disease and antibiotic resistance in the population. As sequencing technologies become smaller and more affordable, clinical and environmental pathogen sequencing will become routine, generating huge stores of data and functioning as a de facto sensor network. Actively monitoring such data will better inform outbreak response, antibiotic treatment, and vaccine development. However, realizing these benefits requires methods for storing and analyzing millions of genomes. The Section aims to develop computational methods that enable this scale of data collection and analysis.
Dr. Phillippy is head of the Genome Informatics Section and a tenure-track investigator in the Computational and Statistical Genomics Branch at the National Human Genome Research Institute. In 2000, Dr. Phillippy began working as a bioinformatics research assistant for Dr. Arthur Delcher at Loyola University Maryland, and received his B.S. in computer science in 2002. Following this, he worked for four years at The Institute for Genomic Research (TIGR) under the supervision of Dr. Mihai Pop, where he developed several foundational tools for genome assembly and alignment. Dr. Phillippy was also an integral contributor to TIGR's investigation of the 2001 anthrax attacks, having developed methods that were key to tracing the source of the outbreak. In 2006, he began his graduate work at the University of Maryland, advised by Dr. Steven Salzberg, researching new methods for pathogen detection using PCR, microarrays, and DNA sequencing.
Dr. Phillippy received his Ph.D. in computer science in 2010, and immediately joined the National Biodefense Analysis and Countermeasures Center (NBACC) as a principal investigator, where he established and led the National Bioforensic Analysis Center's first bioinformatics group. During this time, he pioneered the use of single-molecule sequencing for the reconstruction of complete genomes, and helped the NBACC become the first laboratory in the United States to achieve ISO 17025 accreditation for whole-genome sequencing.
In 2015, Dr. Phillippy joined the National Human Genome Research Institute and founded the Genome Informatics Section, where his current research group resides.
Berlin K, Koren S, Chin CS, Drake JP, Landolin JM, Phillippy AM. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat Biotechnol. 2015;33(6):623-30.
Koren S, Harhay GP, Smith TP, Bono JL, Harhay DM, McVey SD, Radune D, Bergman NH, Phillippy AM. Reducing assembly complexity of microbial genomes with single-molecule sequencing. Genome Biol. 2013;14(9):R101.
Koren S, Schatz MC, Walenz BP, Martin J, Howard JT, Ganapathy G, Wang Z, Rasko DA, McCombie WR, Jarvis ED, Phillippy AM. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotechnol. 2012;30(7):693-700.
Phillippy AM, Mason JA, Ayanbule K, Sommer DD, Taviani E, Huq A, Colwell RR, Knight IT, Salzberg SL. Comprehensive DNA signature discovery and validation. PLoS Comput Biol. 2007;3(5):e98.
Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL. Versatile and open software for comparing large genomes. Genome Biol. 2004;5(2):R12.