Extracting Insights From Electronic Health Records Using Optimized Large Language Models / Kevin Xie

Dissertations & Theses @ University of Pennsylvania Available online

Format:
Book
Thesis/Dissertation
Author/Creator:
Xie, Kevin, author.
Contributor:
University of Pennsylvania. Bioengineering., degree granting institution.
Language:
English
Subjects (All):
Bioengineering.
Bioinformatics.
Computer science.
0202.
0715.
0800.
0984.
0769.
Local Subjects:
Bioengineering.
Bioinformatics.
Computer science.
0202.
0715.
0800.
0984.
0769.
Physical Description:
1 electronic resource (185 pages)
Contained In:
Dissertations Abstracts International 86-07B
Place of Publication:
Ann Arbor : ProQuest Dissertations and Theses, 2024
Language Note:
English
Summary:
The Electronic Health Record (EHR) contains extensive patient clinical information, including demographic and socioeconomic information; laboratory, imaging, and diagnostic results; treatment plans; and comprehensive records of patient medical histories. This wealth of information makes the EHR especially suitable for retrospective studies by allowing clinicians and researchers to draw new conclusions, potentially at reduced cost, by looking backwards through time across information gathered during patient-healthcare interactions. However, the most valuable information is captured within unstructured free-text clinical notes, precluding simple data mining methods and instead favoring time-consuming and expensive manual chart review. To address this gap, I developed a Natural Language Processing (NLP) approach that uses modern techniques to drive large-scale retrospective clinical informatics research through the EHR. I demonstrate these techniques on epilepsy, a neurological disorder with complex phenotypes and heterogeneous patient populations. First, I created an NLP pipeline by fine-tuning Transformer language models to read, understand, and extract critical epilepsy outcome measures (seizure freedom, seizure frequency, and date of last seizure) from unstructured note text; this pipeline was found to rival trained humans in this task. I further tested the generalizability of these models in new clinical contexts. Using these models, I extracted seizure outcomes from the EHR in our health system. I used this data to closely study long-term seizure dynamics of patients with epilepsy, finding that the majority of them experienced periods of seizure freedom interspersed with epileptic episodes. I also used this data both to investigate demographic biases in Transformer models and to elucidate how seizure outcomes were influenced by demographic factors; I found a lack of evidence of model bias, and that female patients, patients on public insurance, and patients from lower-income ZIP codes fare substantially worse than their counterparts. Finally, I conducted a large-scale retrospective comparative effectiveness trial of anti-seizure medications using a rigorous causal inference and statistical framework. The results of this thesis demonstrate that NLP can unlock the information stored in the EHR to conduct clinical informatics research at scale.
Notes:
Source: Dissertations Abstracts International, Volume: 86-07, Section: B.
Advisors: Litt, Brian
Committee members: Johnson, Kevin B.; Ellis, Colin A.; Roth, Dan
Ph.D. University of Pennsylvania 2024
Local Notes:
School code: 0175
ISBN:
9798302184467
Access Restriction:
Restricted for use by site license
