S. Cenk Sahinalp, Ph.D.

Senior Investigator

Cancer Data Science Laboratory


Building 10, Room 2-6300, Bethesda, MD 20892



Research Topics

A long-time focus of my lab has been the discovery and interpretation of large-scale (especially structural) genomic and transcriptomic alterations in tumor samples. Our algorithmic methods for genomic structural variation discovery, including VariationHunter, CommonLAW, DeStruct and NovelSeq, were the first able to handle novel insertions, deletions, inversions and duplications in repetitive regions of the human genome. More recently, I have been interested in applying our algorithmic techniques to exact genotyping of highly repetitive, structurally variant genes, e.g., those involved in drug metabolism, for which my group has developed the Cypiripi and Aldy methods, and the immunoglobulin heavy chain region, for which my group has developed Immunotyper. My group has also contributed to the identification and quantification of transcriptomic aberrations, in particular gene fusions, as well as genic inversions, duplications and deletions in cancer samples. Leading computational methods we have developed include DeFuse, NFuse, Comrad, MiStrVar and SVICT (which handles circulating cell-free tumor DNA data). Our most recent focus area is tumor heterogeneity and progression modeling, especially through single-cell sequencing or multi-locus/time-series sequencing (for which we have developed CITUP, CTP-Single, Remix-T, BSCITE, PhISCS, CONETT and PhISCS-BnB); my lab has developed the first deep learning method for inferring the progression tree of a tumor. We also work on network-aided, integrative analysis of genomic and transcriptomic sequence data from tumor samples (Hit’nDrive and cdCAP). We have several additional interests within "algorithmic biology", including (i) mapping and variant calling, especially for reads from repetitive regions of the genome or reads with high error rates (examples include mrFAST, mrsFAST, drFAST and lordFAST), or for reads extracted from cell-free tumor DNA (e.g., SINVICT), (ii) genomic data compression (SCALCE, DeeZ and AssemblTrie), (iii) secure/privacy-preserving computing (PrivStrat, SkSES and SMac) and (iv) metagenomic binning (CAMMiQ).


S. Cenk Sahinalp received his B.Sc. in Electrical Engineering from Bilkent University, Ankara, Turkey, and his Ph.D. in Computer Science from the University of Maryland, College Park. His Ph.D. thesis introduced the first work-optimal parallel algorithm for suffix tree construction and the first linear-time algorithm for pattern matching. After a brief postdoctoral fellowship at Bell Labs, Murray Hill, he worked as a Computer Science professor, most recently at Indiana University, Bloomington.

Sahinalp’s research has focused on combinatorial algorithms and data structures, primarily for strings/sequences, and their applications to biomolecular sequence analysis, especially in the context of cancer. In the past decade, his lab has developed several algorithmic methods for the efficient and effective use of high-throughput sequencing data to better characterize the structure, evolution and heterogeneity of cancer genomes. He has (co)trained more than two dozen Ph.D. students and postdocs, many of whom now hold independent academic and research positions in the U.S., Canada and Europe. He is also actively engaged in the computational biology community, having organized RECOMB 2011 in Vancouver, BC, chaired the program committee of RECOMB 2017 in Hong Kong, and founded the RECOMB-Seq meeting series.

This page was last updated on September 9th, 2021