News You Can Use
CDEs: A Critical Charge to Standardize Data
BY ANNELIESE NORRIS, NCI, and ARTHI RAMKUMAR, NIAID
“To better understand relationships between the genes we inherit and the environmental and societal factors that surround us, and to deliver more evidence-driven health care, research must be integrated into clinical care and community settings, reaching patients from all walks of life.”
So wrote Monica Bertagnolli in June in her first Science editorial (PMID: 38843323) as NIH director, announcing NIH’s $30 million investment in a pilot research network that integrates clinical research with community-based primary care.
Key to achieving this goal will be a keen focus on common data elements (CDEs). These are standardized, precisely defined questions, paired with a set of specific allowable responses, used in clinical and other research studies to facilitate data sharing, comparison, and interoperability across research domains.
For example, to capture age, instead of asking how old a participant is, one might ask a bundle of questions that includes both age in years and date of birth. Consistently collecting data in this way would allow the data to be reused by other researchers.
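To make the idea concrete, here is a minimal Python sketch of how a CDE might be represented: a question paired with a rule for which responses are allowed, with the age example bundled as two elements. The class name, field names, question wording, and the 0 to 130 age range are illustrative assumptions, not definitions taken from the NIH CDE repository.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class CommonDataElement:
    """Hypothetical CDE: a fixed question plus a rule for allowable responses."""
    name: str
    question: str
    is_allowed: Callable[[object], bool]


def _is_age_in_years(value):
    # Assumed rule: a whole number of years in a plausible range (0 to 130).
    # bool is excluded explicitly because it is a subclass of int in Python.
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 130


def _is_iso_date(value):
    # Assumed rule: date of birth is recorded as an ISO 8601 string (YYYY-MM-DD).
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


# The "bundle of questions" for age: both age in years and date of birth.
AGE_BUNDLE = (
    CommonDataElement("age_years", "What is your age in years?", _is_age_in_years),
    CommonDataElement("date_of_birth", "What is your date of birth (YYYY-MM-DD)?", _is_iso_date),
)


def invalid_fields(record, bundle=AGE_BUNDLE):
    """Return the names of elements whose response is missing or not allowed."""
    return [
        cde.name
        for cde in bundle
        if cde.name not in record or not cde.is_allowed(record[cde.name])
    ]


if __name__ == "__main__":
    print(invalid_fields({"age_years": 42, "date_of_birth": "1982-03-14"}))
    print(invalid_fields({"age_years": "forty-two", "date_of_birth": "03/14/1982"}))
```

Because every study that adopts the same bundle asks the same questions and accepts the same response formats, records collected this way can be pooled without per-study cleanup of free-text answers.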
Bertagnolli has long championed efforts to adopt data standards that allow information from settings ranging from clinical care to clinical trials to be combined, shared, and reused across disciplines. Collecting data in a common language can enhance existing artificial intelligence tools, expose health disparities, and augment the ability of researchers from around the globe to collaborate and build on each other’s work.
Harmonizing data
“Standardization is the key for unambiguity in data analysis, making it possible for investigators to share and compare datasets,” said Belinda Seto, deputy director of the Office of Data Science Strategy (ODSS). She points out that CDEs are community-driven and built on consensus, meaning that everyone needs to follow the same definition (e.g., of a particular clinical observation) and, in the clinical setting, the same collection process.
Seto and colleagues have been tasked by the NIH Scientific Data Council, with guidance from Congress, to collect feedback from the NIH community to develop a set of minimal core CDEs that will be required for all NIH-funded clinical research, and to define new CDEs in disease areas such as immune-driven conditions. The core CDEs currently under discussion, which have received public input, include demographic and identity elements such as age (as described above), gender identity, sex assigned at birth, and race/ethnicity.
“When you don’t have community consensus, you are not going to get broad adoption and utility,” Seto said, citing diabetes as an example in which the lack of consensus has resulted in over 100 data elements describing the disease.
Another ODSS priority is to determine a set of core CDEs that reflect social determinants of health (SDoH), which are nonmedical conditions that affect a person’s health. Asking about SDoH could capture whether a person has experienced traumatic events, food insecurity, poverty, or lack of access to affordable quality health care, to name just a few, and offer critical insight into the social, economic, and environmental factors influencing health outcomes.
“SDoH screening assessments should be included in all human research, even if the interest to delve deeper into the social drivers is not part of the study,” said Deborah Duran, a senior advisor to NIMHD Director Eliseo Pérez-Stable. Duran has worked tirelessly across the federal fold to help develop the first set of NIH-endorsed SDoH CDEs, which include questions to assess employment status, postal ZIP code, educational attainment, and where a person usually seeks health care.
Duran’s efforts were integral in producing Science Collaborative for Health Disparities and Artificial Intelligence Bias Reduction (ScHARe), a cloud-based platform for population science including SDoH and datasets designed to accelerate research in health disparities, health and health care delivery outcomes, and strategies to mitigate bias in artificial intelligence. (Read more about ScHARe in our past coverage.)
CDEs in clinical research
While the ODSS team is building NIH-wide consensus, looking at how individual institutes, centers, and offices across NIH are applying CDEs offers a glimpse into what a data-unified future across the agency might look like.
NCI’s Chuen-Yen Lau, associate research physician at the HIV Dynamics and Replication Program (HIV DRP), uses CDEs in a current protocol (NCT: 05419024) and collects clinical data from participants with HIV, including CDEs on age, gender, and ethnicity. According to Frank Maldarelli, an HIV DRP principal investigator, “CDEs will be critical for future analyses to [compare] data across cohorts and as new language processing models are developed.”
Another instructive example comes from the NIMH Division of AIDS Research, where applicants for HIV-related research funding (NOT-MH-23-105) are expected to collect a set of CDEs that include age, sex at birth, gender identity, HIV status, and assessments for anxiety or depression. Grant applicants are also encouraged to include measures that reflect SDoH to assess how those factors influence HIV-related outcomes and contribute to health disparities. They are referred to the PhenX Toolkit to select CDEs most appropriate for their project.
Across NIH, CDEs shared among different initiatives illustrate the utility of harmonized data. CDEs developed by NICHD reflecting maternal health are also used at the Researching COVID to Enhance Recovery (RECOVER) initiative, which funds research focused on the long-term effects of COVID-19. In addition to using COVID-specific CDEs, RECOVER integrates SDoH CDEs that capture data such as housing and employment status. Those elements were created by RADx Underserved Populations, an initiative that is helping to develop and implement COVID-19 testing in underserved communities. Additional RECOVER CDEs that capture demographic information, such as gender identity, were sourced from NIH’s All of Us Research Program.
Furthermore, several ICs such as NINDS, NIDA, NIA, and NIMH have catalogs of CDEs specific to their research portfolios.
Ready to use CDEs in your research?
- Learn more about NIH-endorsed CDEs and how to use them at the NIH CDE repository: https://cde.nlm.nih.gov/home.
- To find SDoH CDEs and data-collection protocols, search the PhenX SDoH Collection: https://www.nimhd.nih.gov/resources/phenx/.
- Learn about NCATS’ Biomedical Data Translator tool, a multiyear, iterative effort to develop a platform that integrates multiple types of existing data sources to aid researchers in discovering novel connections relevant to understanding pathophysiology: https://ncats.nih.gov/research/research-activities/translator.
- For inquiries about the NIH-wide minimal core CDEs, contact Belinda Seto at setob@mail.nih.gov.
Anneliese Norris, a scientist at NCI, is working on HIV dynamics and replication. In her spare time, she enjoys reading and building with LEGO blocks.
Arthi Ramkumar is a postbaccalaureate fellow in the Bacterial Pathogenesis and Antimicrobial Resistance Unit at NIAID, where she uses genetic engineering to study antibiotic resistance. In her spare time, she enjoys running and painting.
Michael Tabasko also contributed to this article.
This page was last updated on Tuesday, December 3, 2024