Developing a Novel EHR-Based Surveillance Tool
MADMC is leveraging artificial intelligence to develop a surveillance tool that will help identify exposures to, or suspected cases of, emerging diseases. The tool will use Natural Language Processing (NLP) to extract meaningful insights from unstructured clinic notes in electronic health records (EHRs). NLP is a branch of artificial intelligence focused on enabling computers to understand, interpret, and generate human language. This EHR-based surveillance tool will be especially useful for emerging diseases and those with nuanced presentations, where symptoms or exposures may be buried within provider notes rather than documented in structured fields.
Our Work
In close collaboration with the Minnesota Electronic Health Records Consortium (MNEHRC), our team is building the infrastructure to operationalize NLP on EHR notes. We are taking a multi-stage approach, starting with structured data extraction and culminating in standardized, reproducible NLP outputs across ten health systems in Minnesota.
Using disease-specific cases of long COVID, our team is fine-tuning the NLP models with high-quality training data. This will allow for more nuanced interpretations of clinical language by identifying patterns beyond simple keyword matching. With proper training, the models will support automated classification of note content, improve the accuracy of symptom and condition identification, and reduce the burden of manual review across large volumes of notes. We anticipate that the tool will later be used to identify potential disease exposures among patients, such as agricultural exposures to highly pathogenic avian influenza (HPAI).
