Statistical methods for phenotyping with positive-only electronic health record data / Lingjiao Zhang.
- Format:
- Book
- Thesis/Dissertation
- Author/Creator:
- Zhang, Lingjiao, author.
- Language:
- English
- Subjects (All):
- Biostatistics.
- Medicine.
- Statistics.
- Health care management.
- Information science.
- Epidemiology.
- Epidemiology and Biostatistics--Penn dissertations.
- Penn dissertations--Epidemiology and Biostatistics.
- Local Subjects:
- Biostatistics.
- Medicine.
- Statistics.
- Health care management.
- Information science.
- Epidemiology.
- Epidemiology and Biostatistics--Penn dissertations.
- Penn dissertations--Epidemiology and Biostatistics.
- Genre:
- Academic theses.
- Physical Description:
- 1 online resource (92 pages)
- Contained In:
- Dissertations Abstracts International 83-03B.
- Place of Publication:
- [Philadelphia, Pennsylvania] : University of Pennsylvania ; Ann Arbor : ProQuest Dissertations & Theses, 2020.
- Language Note:
- English
- System Details:
- Mode of access: World Wide Web.
- text file
- Summary:
- Electronic health record (EHR)-based phenotyping requires fully labeled cases and controls for model training and testing. Because clinical workflow is asymmetric, labeled cases can be identified much more easily than labeled controls. As a result, data consisting of a group of labeled cases and a large number of unlabeled patients, referred to as "positive-only" data, are frequently accessible with minimal labeling effort. This dissertation focuses on statistical methods for training and validating phenotyping models using such positive-only EHR data when the labeled cases can be regarded as a representative subset of all cases. In project I, we developed an anchor-variable framework and proposed an accompanying maximum likelihood approach to training a logistic phenotyping model. In project II, we developed a chi-squared test that assesses model calibration by comparing the model-free and model-based estimates of the number of cases among the unlabeled patients. We also proposed consistent estimators of predictive performance measures and studied their large-sample properties. These methods provide the methodological foundation for routinely using positive-only data to train and validate phenotyping models. In project III, we extended the maximum likelihood estimation (MLE) method of project I to accommodate high-dimensional predictors by enabling automated feature selection through a proxy phenotype that is available for all patients. We performed extensive simulation studies to assess the performance of the proposed methods and applied them to Penn Medicine EHR data to phenotype primary aldosteronism.
- Notes:
- Source: Dissertations Abstracts International, Volume: 83-03, Section: B.
- Advisors: Chen, Jinbo; Committee members: Lee, Hongzhe; Landis, J. Richard; Hubbard, Rebecca A.; Herman, Daniel S.
- Department: Epidemiology and Biostatistics.
- Ph.D. University of Pennsylvania 2020.
- Local Notes:
- School code: 0175
- ISBN:
- 9798535568256
- Access Restriction:
- Restricted for use by site license.
- This item is not available from ProQuest Dissertations & Theses.
- This item must not be sold to any third party vendors.