From ToxPipe to FAIRkit
NIH-Built AI Chatbots Are Helping Scientists Sift Through the Data
BY PAIGE JARREAU, NIA; and THE NIH CATALYST STAFF

NIH researchers are creating an array of AI-powered tools to address unique intramural challenges.
Artificial intelligence (AI) tools are taking root across NIH, reshaping how researchers access information, analyze data, and advance biomedical discovery. From generative chatbots that streamline scientific queries to machine learning models that help harmonize massive datasets, AI is proving to be a powerful partner in tackling complex questions in research areas spanning toxicology, dementia, and beyond. There are many, so let’s chat about ’em!
Chatbot creation for NIH biomedical research
Speakers at a June 11 NIH Library event that featured members from the NIH Generative AI Community of Practice showcased a range of AI-driven chatbot initiatives under development across the agency. Speakers and topics at a roundtable discussion, archived on the NIH Library YouTube channel, included:
- “Generative AI Chatbots in the NIH Landscape: Foundations, Opportunities, and Considerations” by Alicia Lillich, NIH Library
- “Chatbot for the Intramural Research Program, or ChIRP,” by Steevenson Nelson, OD
- “ToxPipe: Chatbots and Retrieval-Augmented Generation on Toxicological Data Streams” by Trey Saddler, NIEHS
- “CARDbiomedbench: Biomedical Benchmark of Chatbots, CARD.AI Arena, CARD.AI, FAIRkit” by Faraz Faghri, NIA
- “AI Chatbots: Opportunities and Considerations at NLM” by Dianne Babski, NLM
- “Using AI to Create a Travel Chatbot” by Fiona Vaughans, NCI
Lillich, an emerging technology specialist at the NIH Library, presented opportunities and ethical considerations in deploying generative AI chatbots within NIH’s information ecosystem. These tools, which use large language models (LLMs) to assist researchers, have the potential to improve literature discovery, scientific synthesis, and internal knowledge management, she noted.
Nelson shared ChIRP updates and reminded attendees that it was designed to respond to NIH intramural queries within a more secure environment than public LLMs offer. Read more about ChIRP in this recent NIH Catalyst article. ChIRP is being reimagined as a tool for all of NIH, encompassing both research and administrative tasks. The new and improved tool will be rolled out to the entire NIH community later this summer.
Saddler, a data scientist in the NIEHS Division of Translational Toxicology, highlighted ToxPipe, a chatbot-enabled platform that lets users explore toxicology databases through an intuitive interface powered by LibreChat. Saddler also demonstrated ChemBioTox, which uses autonomous AI agents to answer toxicology questions such as “What are the exposure levels of bisphenols?” Responses are generated through multistep reasoning, and the functionality can be evaluated via open-source tools that allow scientists to rate accuracy and refine results.
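Retrieval-augmented generation of the kind Saddler described follows a retrieve-then-prompt pattern: fetch the most relevant records from a trusted database, then hand them to the LLM as context. The sketch below is illustrative only; the toxicology snippets are invented, and a simple bag-of-words similarity stands in for the learned embeddings a real system such as ToxPipe would use.

```python
from collections import Counter
import math

# Hypothetical toxicology snippets standing in for a real database.
DOCS = [
    "Bisphenol A exposure levels in urine samples averaged 1.2 ng/mL.",
    "Acetaminophen hepatotoxicity is dose dependent above 4 g per day.",
    "Bisphenol S is a common substitute with similar estrogenic activity.",
]

def bow(text):
    """Bag-of-words vector; a production system would use learned embeddings."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=2):
    """Rank documents by similarity to the query and keep the top k."""
    q = bow(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, bow(d)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    """Compose the grounded prompt that would be sent to the LLM."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What are the exposure levels of bisphenols?"))
```

The key design point is that the model answers from retrieved records rather than from its training data alone, which makes responses easier to audit and rate for accuracy.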
Faghri, from NIA’s Center for Alzheimer’s and Related Dementias (CARD), presented several AI-driven platforms, including CARDBiomedBench, CARD.AI Arena, and FAIRkit. These tools, which are described in more detail below, “are using AI to better describe diseases, predict disease progression, and identify new drug targets,” said Faghri, a computer scientist in the advanced analytics expert group at CARD. “AI-powered tools are helping us solve problems that weren’t solvable before.”

CREDIT: NIA, CARD
Faraz Faghri
Advancing data harmonization
CARD’s advanced analytics expert group is applying AI to one of biomedical research’s greatest challenges—data harmonization. Different research groups collect different types of patient data, including genetic profiles, imaging, and environmental exposures. These datasets are often incompatible by default, which complicates, if not impedes, large-scale analyses. Manually standardizing so many data points is impractical, yet standardization is increasingly necessary as biomedical research speeds toward a future of open-access data and machine learning.
The DIVER (Data Inventory and Verification Environment for Research) platform, which uses OpenAI’s GPT models to automate the creation of common data elements (CDEs), may help. Faghri’s team applied DIVER to 31 dementia-related datasets and achieved interoperability scores of up to 60% when combining data from the Alzheimer’s Disease Neuroimaging Initiative and the Global Parkinson’s Genetic Program (PMID: 39484274). By automating what is typically a labor-intensive and error-prone process, DIVER enabled them to merge datasets and perform cross-study comparisons, which is a critical step toward identifying early biomarkers and validating therapeutic targets.
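The core harmonization task DIVER automates is mapping each dataset's variable names onto shared CDEs. The sketch below illustrates that idea only: the variable names and synonym table are hypothetical (not the real ADNI or GP2 schemas), and fuzzy string matching stands in for DIVER's GPT-based mapping step.

```python
import difflib

# Hypothetical variable names from two dementia datasets.
ADNI_VARS = ["AGE", "PTGENDER", "MMSE_TOTAL", "APOE4"]
GP2_VARS = ["age_at_baseline", "sex", "mmse_score", "apoe_e4_count"]

# Invented CDE synonym table; an LLM would build this mapping instead.
CDE_SYNONYMS = {
    "age": ["age", "age_at_baseline"],
    "sex": ["ptgender", "sex", "gender"],
    "mmse": ["mmse_total", "mmse_score"],
    "apoe4": ["apoe4", "apoe_e4_count"],
}

def map_to_cde(var):
    """Map one dataset variable to a common data element by fuzzy match."""
    v = var.lower()
    best, score = None, 0.0
    for cde, names in CDE_SYNONYMS.items():
        s = max(difflib.SequenceMatcher(None, v, n).ratio() for n in names)
        if s > score:
            best, score = cde, s
    return best if score > 0.5 else None

harmonized = {v: map_to_cde(v) for v in ADNI_VARS + GP2_VARS}

# Fraction of variables successfully mapped, a rough interoperability score.
coverage = sum(1 for c in harmonized.values() if c) / len(harmonized)
```

Once both datasets' variables resolve to the same CDE keys, records can be merged on those keys for cross-study analysis.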
Predicting missing data
Missing or incomplete data, especially in electronic health records (EHRs), is another persistent obstacle in biomedical research. Traditional data collection and analysis techniques often fall short in capturing the complexity of health care information consistently.
The CARD team developed a machine learning framework called MUSE (Multimodal Unsupervised Embedding), which helps to predict missing values in patient data. MUSE uses graph neural networks to analyze the relationships among patient data across multiple data types such as brain scans, cognitive scores, and biomarkers. Rather than addressing each data gap in isolation, MUSE models the entire patient ecosystem to generate more accurate predictions.
The model improved predictions of Alzheimer’s disease progression by more than 3% compared with standard approaches. “There’s value in retaining data even from patients with large missing segments,” Faghri said. “We’re trying to figure out the broader structure of the data system and see where people with missing data might fit in. Graph neural networks help us connect the dots.”
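The neighbor-aggregation idea behind graph-based imputation can be shown with a toy example. A graph neural network like MUSE learns how to weight neighbors and works across many data types at once; the sketch below simply averages observed neighbors for a single invented biomarker, with made-up patients and edges.

```python
# Toy patient graph: edges connect patients with similar profiles.
# One biomarker per patient; None marks a missing measurement.
values = {"p1": 2.0, "p2": 2.5, "p3": None, "p4": 3.5}
neighbors = {"p1": ["p2"], "p2": ["p1", "p3"], "p3": ["p2", "p4"], "p4": ["p3"]}

def impute(values, neighbors):
    """Fill each missing value with the mean of its observed neighbors."""
    out = dict(values)
    for node, v in values.items():
        if v is None:
            obs = [values[n] for n in neighbors[node] if values[n] is not None]
            out[node] = sum(obs) / len(obs) if obs else None
    return out

# p3 is imputed from its neighbors p2 and p4.
print(impute(values, neighbors))  # → {'p1': 2.0, 'p2': 2.5, 'p3': 3.0, 'p4': 3.5}
```

This is the "connect the dots" intuition Faghri described: a patient with missing data borrows information from structurally similar patients rather than being discarded.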
An AI web crawler for 508 compliance
Dianne Babski, director of the NLM User Services and Collection Division, presented on NLM’s efforts to pilot human-centered AI. Babski demonstrated the NLM Web Accessibility Assistant, created by Dan Wendling, which identifies accessibility issues on webpages to help ensure Section 508 compliance. The assistant recommends fixes and provides the code needed to make those changes. To date, it has flagged more than 67 unique error types across 9,000 website pages.
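The article does not describe the assistant's implementation, but one common Section 508 check, flagging images without alternative text, can be sketched with Python's standard-library HTML parser. This is an illustrative stand-in, not the NLM tool's actual code.

```python
from html.parser import HTMLParser

class AltTextChecker(HTMLParser):
    """Flag <img> tags lacking alt text, one common Section 508 issue.

    Note: this simple check also flags alt="", even though an empty alt
    is valid for purely decorative images.
    """
    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "img" and not a.get("alt"):
            self.issues.append(f"img missing alt text: {a.get('src', '?')}")

checker = AltTextChecker()
checker.feed('<img src="chart.png"><img src="logo.png" alt="NIH logo">')
print(checker.issues)  # → ['img missing alt text: chart.png']
```

A crawler would run checks like this across every page and aggregate the flagged issues by error type, as the NLM assistant's reported counts suggest.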
Babski and Nick Weber, acting director of CIT’s Office of Scientific Computing Services, co-chair the NIH Generative AI Community of Practice group. Hundreds of NIHers attend the group’s monthly meetings. Will we see you there?
The NIH Office of Science Policy is currently seeking input on responsible development of generative AI tools using controlled-access human genomic data. NIH encourages staff and stakeholders to comment on best practices for mitigating data leakage while promoting innovation. Comments are due by July 16, 2025, and a roundtable discussion will follow on July 17. Submit your feedback and learn more here: NIH Comment Form.
Additional resources shared at the event
- GitHub: Learn more about NIH GitHub by emailing GitHub@nih.gov.
- CARD tools and benchmarks on GitHub
- GitHub Copilot clone: https://continue.dev
- CARD.AI Arena: https://cardai-arena-809832168532.us-central1.run.app/
- PubTator Central (NLM): extracts information from PubMed abstracts and articles to create annotations of biomedical concepts for use with AI
- Prompt engineering tips and tricks: https://www.promptingguide.ai
- Blog post by CARD’s advanced analytics expert group: Can GPT-4.5, Claude 3.7, and Gemini 2.0 Keep Up with Biomedical Research?
Watch the entire June 11 event to learn more about the tools discussed above and much more.
This page was last updated on Thursday, July 10, 2025