
Research Topics
A long-time focus of my lab has been the discovery and interpretation of large-scale genomic and transcriptomic alterations in tumor cells. Our algorithmic methods for genomic structural variation discovery, including VariationHunter, CommonLAW, DeStruct and NovelSeq, were the first with the ability to handle novel insertions, deletions, inversions and duplications in repetitive regions of the human genome. I am currently interested in applying our algorithmic techniques to exact genotyping of highly repetitive, structurally variant genes. These include genes involved in drug metabolism, for which my group has developed the Cypiripi and Aldy methods, and immune system genes, for which my group has developed the Immunotyper and Geny methods. My lab has also contributed to the identification and quantification of transcriptomic aberrations, in particular gene fusions, as well as genic inversions, duplications and deletions in cancer samples. Leading computational methods we have developed include DeFuse, NFuse, Comrad, MiStrVar and SVICT (which handles circulating cell-free tumor DNA data).
Our current focus area is tumor heterogeneity and progression inference, especially through single-cell sequencing and spatial/time series sequencing (for which we have developed CITUP, CTP-Single, Remix-T, BSCITE, PhISCS, CONETT, PhISCS-BnB, DeepT, HUNTRESS, Sgootr, Detopt and others). We have also worked on network-aided, integrative analysis of genomic and transcriptomic sequence data from tumor samples (Hit’nDrive and cdCAP). We have several additional interests within "algorithmic biology", including (i) read alignment and variant calling (e.g. mrFAST, mrsFAST, drFAST and lordFAST for reads from repetitive regions of the genome, and SINVICT for reads extracted from cell-free tumor DNA), (ii) genomic data compression (SCALCE, DeeZ and AssemblTrie), (iii) secure/privacy-preserving genomic sequence analysis (PrivStrat, SkSES, SMac, TX-Phase) and (iv) metagenomic binning (CAMMiQ).
Biography
S. Cenk Sahinalp received his B.Sc. in Electrical Engineering from Bilkent University in Ankara, Turkey, and his Ph.D. in Computer Science from the University of Maryland, College Park. His doctoral research introduced the first work-optimal parallel algorithm for suffix tree construction and the first linear-time algorithm for approximate pattern matching. After completing a postdoctoral fellowship at Bell Labs, Murray Hill in 1997, he served as a computer science professor at several universities; he has been with the CDSL (Cancer Data Science Laboratory) since 2019.
Sahinalp’s research centers on principled algorithms and data structures for strings/sequences, trees, and graphs, with a particular emphasis on applications in molecular cancer biology. Over the past decade, his lab has developed multiple algorithmic methods for harnessing high-throughput sequencing data to characterize the structure, evolution, and heterogeneity of cancer genomes. He has (co)trained more than 30 Ph.D. students and postdocs who now hold leading positions in industry, research institutes, and academia across the U.S., Canada and Europe.
Sahinalp has been an active contributor to the computational biology community. He organized the RECOMB (Research in Computational Molecular Biology) conference and founded the RECOMB-Seq meeting series in 2011, and chaired the RECOMB program committee in 2017. He currently serves on the RECOMB steering committee and is a fellow of the ISCB (International Society for Computational Biology).
This page was last updated on Wednesday, March 19, 2025