NLM Scrubber: Paving the road to “Big Data” by securing patient privacy



Patients’ health data has the potential to transform how clinicians provide care and scientists conduct research—but ensuring patient privacy has been a major barrier. It is therefore critical that clinical records be effectively stripped of personally identifiable information (PII) before being shared.


IRP researchers led by Mehmet Kayaalp, M.D., Ph.D., developed a clinical text de-identification software tool called NLM Scrubber, which protects patient privacy better than any other freely available de-identification program.


With Dr. Kayaalp’s NLM Scrubber tool, the greater NIH research community will soon be able to access most data stored in electronic medical records (EMRs) without breaching patient privacy, an important step toward realizing the promise of “Big Data” in healthcare.


Kayaalp M, Browne AC, Callaghan FM, Dodd ZA, Divita G, Ozturk S, McDonald CJ. (2014). The pattern of name tokens in narrative clinical text and a comparison of five systems for redacting them. J Am Med Inform Assoc. 21(3), 423-31.

Kayaalp M, Browne AC, Dodd ZA, Sagan P, McDonald CJ. (2014). De-identification of address, date, and alphanumeric identifiers in narrative clinical reports. AMIA Annu Symp Proc. 767-76. eCollection 2014. [Epub ahead of print].

Browne AC, Kayaalp M, Dodd ZA, Sagan P, McDonald CJ. (2014). The challenges of creating a gold standard for de-identification research. AMIA Annu Symp Proc. 353-8. eCollection 2014. [Epub ahead of print].

Huser V, Kayaalp M, Dodd ZA, Cimino JJ. (2014). Piloting a deceased subject integrated data repository and protecting privacy of relatives. AMIA Annu Symp Proc. 719-28. eCollection 2014. [Epub ahead of print].

This page was last updated on Friday, June 16, 2023