Models to estimate absolute risks of diseases or of carrying mutations
For my Ph.D. thesis, I improved BRCAPRO, a statistical model in clinical use that draws on family history of breast and ovarian cancer to predict who in a family carries a BRCA1/2 mutation, and so helps clinicians decide whether to offer genetic testing. To identify the issues of greatest clinical importance, I independently sought out and attended weekly clinical meetings about whether to offer BRCA1/2 testing to patients. Based on my experience at these meetings, I chose my three thesis topics (effects of misreported family history, accounting for medical interventions, and accounting for non-breast/ovarian cancers) as the most clinically important and feasible improvements. My improvements have been integrated into BRCAPRO and are in clinical use. Most importantly, accounting for non-breast/ovarian cancers fixed the outstanding over-prediction bias in BRCAPRO.
I am applying the experience gained during my thesis to an ambitious and important project: constructing a clinical risk-prediction model for cervical precancer. Current clinical algorithms are not yet poised to take advantage of the flood of information provided by HPV tests, vaccination, and next-generation biomarkers. A risk calculator should incorporate HPV vaccination, HPV test results, Pap smears, and other clinically available biomarkers to categorize women into risk-based management groups. If fully successful, this risk calculator will enable future guidelines for clinical management to be based on risk. I have articulated this vision and am laying the groundwork for this project. I lead a multidisciplinary NIH-wide team that advises me on the epidemiologic, clinical, statistical, and risk-communication challenges. I plan to produce an initial model to help shape clinical practice and later provide successive refinements.
Efficient sampling and plans to improve epidemiologic study design
To save resources in cohort studies while retaining statistical efficiency, exposures are often measured on most disease cases but on only a well-chosen sample of the controls. However, because exposures are not measured on all cohort members, standard methods cannot be used to estimate Kaplan-Meier survival curves or to fit Cox models for hazard ratios. Analyzing such studies solely as case-control studies ignores the controls who lack exposure measurements but still contribute information on outcomes and confounders. Such studies are better analyzed as two-phase designs, which extract information from all cohort members; the case-cohort and nested case-control designs are special cases, and we allow for general stratified sampling as well. Our methods can estimate hazard ratios, survival curves, and attributable risks for general studies nested within cohorts. By using the entire cohort, they can realize substantial information gains, and they permit efficient designs for choosing which controls receive exposure measurements. Finally, our methods can extract information from surrogates for exposure that are observed on the full cohort. My R package NestedCohort provides software for these methods.
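As a rough illustration of the two-phase idea, the sketch below computes an inverse-probability-weighted Kaplan-Meier curve: each subject with measured exposure is weighted by the inverse of the probability of having been sampled, so that sampled controls stand in for the unsampled ones. This is a minimal sketch in Python under simplifying assumptions (known sampling probabilities, no variance estimation, no use of surrogates), not the NestedCohort implementation; the function name `weighted_km` and its arguments are hypothetical.

```python
from collections import defaultdict


def weighted_km(times, events, weights):
    """Inverse-probability-weighted Kaplan-Meier estimate (illustrative only).

    times   -- follow-up time for each sampled subject
    events  -- 1 if the subject had the event at that time, 0 if censored
    weights -- 1 / (sampling probability) for each sampled subject
    Returns a list of (event_time, estimated_survival) pairs.
    """
    if not (len(times) == len(events) == len(weights)):
        raise ValueError("times, events and weights must have equal length")

    weighted_events = defaultdict(float)   # weighted event count per time
    weighted_leaving = defaultdict(float)  # weighted count leaving the risk set
    for t, e, w in zip(times, events, weights):
        weighted_leaving[t] += w
        if e:
            weighted_events[t] += w

    at_risk = float(sum(weights))  # weighted size of the initial risk set
    survival = 1.0
    curve = []
    for t in sorted(weighted_leaving):
        d = weighted_events.get(t, 0.0)
        if d > 0:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= weighted_leaving[t]
    return curve


# Hypothetical data: the single case is always sampled (weight 1);
# controls were sampled with probability 0.5 (weight 2).
example_curve = weighted_km(times=[1, 2, 3], events=[1, 0, 0], weights=[1, 2, 2])
```

With all weights equal to 1, the function reduces to the ordinary Kaplan-Meier estimator, which is a convenient sanity check for the weighting logic.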
In three collaborations, we proposed comparing a new diagnostic test to a pre-existing test already conducted on all specimens by conducting the new test on only a judicious subsample of specimens. I introduced methods to estimate agreement statistics and conduct symmetry tests when one test is conducted on only a subsample. These methods achieve adequate statistical efficiency while greatly reducing study costs and specimen consumption. I am currently working on efficient study designs that use my methods for comparing diagnostic tests. Methods to compare diagnostic tests can also be applied to compare risk prediction models by grouping risks into categories. I plan to apply my methods to quantify the improvement in lung cancer risk prediction gained from measuring circulating C-reactive protein levels.
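To make the subsampling idea concrete, the sketch below estimates overall agreement and Cohen's kappa between two binary tests when the new test is run only on a subsample stratified by the existing test's result. Each subsampled specimen is weighted by the number of specimens it represents in its stratum. This is a minimal illustration under assumed simple random sampling within strata, not the specific estimators described above; the function name `weighted_agreement` and the counts in the example are hypothetical.

```python
def weighted_agreement(pairs, stratum_totals):
    """Weighted agreement between an existing and a new binary test.

    pairs          -- (existing_result, new_result) for each subsampled
                      specimen, each result coded 0 or 1
    stratum_totals -- dict mapping existing_result (0 or 1) to the number of
                      specimens with that result in the full collection
    Returns (observed_agreement, kappa).
    """
    sampled = {0: 0, 1: 0}
    for existing, _ in pairs:
        sampled[existing] += 1

    total = sum(stratum_totals.values())
    # Estimated full-collection proportion in each cell of the 2x2 table.
    cell = {(a, b): 0.0 for a in (0, 1) for b in (0, 1)}
    for existing, new in pairs:
        weight = stratum_totals[existing] / sampled[existing]
        cell[(existing, new)] += weight / total

    observed = cell[(0, 0)] + cell[(1, 1)]
    existing_pos = cell[(1, 0)] + cell[(1, 1)]
    new_pos = cell[(0, 1)] + cell[(1, 1)]
    # Agreement expected by chance if the two tests were independent.
    expected = existing_pos * new_pos + (1 - existing_pos) * (1 - new_pos)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical example: 900 existing-negative and 100 existing-positive
# specimens, with 10 subsampled from each stratum for the new test.
example_pairs = [(0, 0)] * 9 + [(0, 1)] * 1 + [(1, 1)] * 8 + [(1, 0)] * 2
example_observed, example_kappa = weighted_agreement(
    example_pairs, {0: 900, 1: 100}
)
```

Sampling the rarer stratum at a higher rate, as in the example, is what lets a small subsample retain reasonable precision; the weights then undo the deliberate imbalance.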
Unmeasured host risk factors and their role in etiology and prevention
Unmeasured host risk factors can have an observable impact on disease risk. For example, since the vast majority of genetic mutations predisposing to breast cancer remain unknown, women who test negative for their family's mutation in BRCA1/2 may remain at above-average cancer risk if they have a family history of cancer beyond that accounted for by their family's mutation and other known risk factors. We addressed this controversy by proposing a novel metric to quantify residual familial risk due to unknown host risk factors and showed that the additional risk could justify continued, or even increased, breast cancer screening.
It has long been acknowledged that women respond differently to HPV infection and vaccination. These differences between women ("frailty") are likely due to undiscovered risk factors, including unknown host immune mechanisms. I am working on quantifying the role that unknown host factors play in the epidemiology of multiple HPV infections, in the persistence of HPV infection, and in measures of response to HPV vaccination, and on estimating the population-level impact such unknown factors could have on response to infection and vaccination.