Statistical methods for phenotyping with positive-only electronic health record data / Lingjiao Zhang.

Format:
Book
Thesis/Dissertation
Author/Creator:
Zhang, Lingjiao, author.
Contributor:
Chen, Jinbo, degree supervisor.
University of Pennsylvania. Department of Epidemiology and Biostatistics, degree granting institution.
Language:
English
Subjects (All):
Biostatistics.
Medicine.
Statistics.
Health care management.
Information science.
Epidemiology.
Epidemiology and Biostatistics--Penn dissertations.
Penn dissertations--Epidemiology and Biostatistics.
Genre:
Academic theses.
Physical Description:
1 online resource (92 pages)
Contained In:
Dissertations Abstracts International 83-03B.
Place of Publication:
[Philadelphia, Pennsylvania] : University of Pennsylvania ; Ann Arbor : ProQuest Dissertations & Theses, 2020.
System Details:
Mode of access: World Wide Web.
text file
Summary:
Electronic health record (EHR)-based phenotyping typically requires fully labeled cases and controls for model training and testing. Because of asymmetric clinical workflows, labeled cases are much easier to identify than labeled controls. Consequently, data consisting of a group of labeled cases and a large number of unlabeled patients, referred to as "positive-only" data, are often accessible with minimal labeling effort. This dissertation focuses on statistical methods for training and validating phenotyping models with such positive-only EHR data when the labeled cases can be regarded as a representative subset of all cases. In Project I, we developed an anchor-variable framework and proposed an accompanying maximum likelihood approach for training a logistic phenotyping model. In Project II, we developed a chi-squared test that assesses model calibration by comparing the model-free and model-based estimates of the number of cases among the unlabeled patients. We also proposed consistent estimators of predictive performance measures and studied their large-sample properties. These methods provide the methodological foundation for routine use of positive-only data in training and validating phenotyping models. In Project III, we extended the maximum likelihood estimation (MLE) method of Project I to accommodate high-dimensional predictors by enabling automated feature selection through a proxy phenotype that is available for all patients. We performed extensive simulation studies to assess the performance of the proposed methods and applied them to Penn Medicine EHR data to phenotype primary aldosteronism.
Notes:
Source: Dissertations Abstracts International, Volume: 83-03, Section: B.
Advisors: Chen, Jinbo; Committee members: Lee, Hongzhe; Landis, J. Richard; Hubbard, Rebecca A.; Herman, Daniel S.
Department: Epidemiology and Biostatistics.
Ph.D. University of Pennsylvania 2020.
Local Notes:
School code: 0175
ISBN:
9798535568256
Access Restriction:
Restricted for use by site license.
This item is not available from ProQuest Dissertations & Theses.
This item must not be sold to any third party vendors.