Extracting Insights From Electronic Health Records Using Optimized Large Language Models / Kevin Xie
- Format:
- Book
- Thesis/Dissertation
- Author/Creator:
- Xie, Kevin, author.
- Language:
- English
- Subjects (All):
- Bioengineering.
- Bioinformatics.
- Computer science.
- 0202.
- 0715.
- 0800.
- 0984.
- 0769.
- Local Subjects:
- Bioengineering.
- Bioinformatics.
- Computer science.
- 0202.
- 0715.
- 0800.
- 0984.
- 0769.
- Physical Description:
- 1 electronic resource (185 pages)
- Contained In:
- Dissertations Abstracts International 86-07B
- Place of Publication:
- Ann Arbor : ProQuest Dissertations and Theses, 2024
- Language Note:
- English
- Summary:
- The Electronic Health Record (EHR) contains extensive patient clinical information, including demographic and socioeconomic information; laboratory, imaging, and diagnostic results; treatment plans; and comprehensive records of patient medical histories. This wealth of information makes the EHR especially suitable for retrospective studies by allowing clinicians and researchers to draw new conclusions, potentially at reduced cost, by looking backwards through time across information gathered during patient-healthcare interactions. However, the most valuable information is captured within unstructured free-text clinical notes, precluding simple data mining methods and instead favoring time-consuming and expensive manual chart review. To address this gap, I developed a Natural Language Processing (NLP) approach that uses modern techniques to drive large-scale retrospective clinical informatics research through the EHR. I demonstrate these techniques on epilepsy, a neurological disorder with complex phenotypes and heterogeneous patient populations. First, I created an NLP pipeline by fine-tuning Transformer language models to read, understand, and extract critical epilepsy outcome measures (seizure freedom, seizure frequency, and date of last seizure) from unstructured note text; this pipeline was found to rival trained humans in this task. I further tested the generalizability of these models in new clinical contexts. Using these models, I extracted seizure outcomes from the EHR in our health system. I used this data to closely study long-term seizure dynamics of patients with epilepsy, finding that the majority of them experienced periods of seizure freedom interspersed with epileptic episodes. I also used this data both to investigate demographic biases in Transformer models and to elucidate how seizure outcomes were influenced by demographic factors; I found a lack of evidence of model bias, and that female patients, patients on public insurance, and patients from lower-income ZIP codes fare substantially worse than their counterparts. Finally, I conducted a large-scale retrospective comparative effectiveness trial of anti-seizure medications using a rigorous causal inference and statistical framework. The results of this thesis demonstrate that NLP can unlock the information stored in the EHR to conduct clinical informatics research at scale.
- Notes:
- Source: Dissertations Abstracts International, Volume: 86-07, Section: B.
- Advisors: Litt, Brian. Committee members: Johnson, Kevin B.; Ellis, Colin A.; Roth, Dan.
- Ph.D. University of Pennsylvania 2024
- Local Notes:
- School code: 0175
- ISBN:
- 9798302184467
- Access Restriction:
- Restricted for use by site license