Community membership testing and missing value imputation : theory and methods / Yezheng Li.

Dissertations & Theses @ University of Pennsylvania Available online

Format:: Book; Thesis/Dissertation
Author/Creator:: Li, Yezheng, author.
Contributor:: Li, Hongzhe, degree supervisor.; University of Pennsylvania. Department of Applied Mathematics and Computational Science, degree granting institution.
Language:: English
Subjects (All):: Applied mathematics.; Artificial intelligence.; Genetics.; Applied mathematics and computational science--Penn dissertations.; Penn dissertations--Applied mathematics and computational science.
Local Subjects:: Applied mathematics.; Artificial intelligence.; Genetics.; Applied mathematics and computational science--Penn dissertations.; Penn dissertations--Applied mathematics and computational science.
Genre:: Academic theses.
Physical Description:: 1 online resource (141 pages)
Contained In:: Dissertations Abstracts International 82-07B.
Place of Publication:: [Philadelphia, Pennsylvania] : University of Pennsylvania ; Ann Arbor : ProQuest Dissertations & Theses, 2020.
Language Note:: English
System Details:: Mode of access: World Wide Web.; text file
Summary:: Modern machine learning methods have been widely applied in genomics and metagenomics data analysis. This dissertation focuses on unsupervised machine learning and discusses community membership testing, matrix completion, and generative adversarial nets, with applications to several problems in genomics. While the analysis of singular subspaces based on principal component analysis has a long history, the first chapter focuses on recent theory on the statistical distribution of singular subspaces in the setting of weighted stochastic block models. The theoretical results lead to the statistical distribution of a test statistic in a two-sample test of membership assignments. Chapter two deals with the problem of estimating bacterial composition from sparse count data, where a nuclear-norm penalized likelihood estimator based on a multinomial model is proposed to estimate the centered log-ratio (CLR) matrix. An efficient optimization algorithm using the generalized accelerated proximal gradient method is developed. In microbiome studies, the CLR transformation is most commonly applied for downstream statistical analysis after bacterial composition is estimated from the sequencing read counts. Due to limited sequencing depth and DNA dropouts, many rare bacterial taxa might not be captured in the final sequencing reads, which results in many zero counts. Naive composition estimation using count normalization leads to many zero proportions, which makes the CLR transformation infeasible. The proposed method estimates the CLR transformation directly, taking into account its low-rank property. Theoretical upper bounds are established, and simulation studies and a real data study demonstrate that the proposed estimator outperforms the naive estimators.
Notes:: Source: Dissertations Abstracts International, Volume: 82-07, Section: B.; Advisors: LI, Hongzhe; Committee members: Edgar Dobriban; Zongming Ma.; Department: Applied Mathematics and Computational Science.; Ph.D. University of Pennsylvania 2020.
Local Notes:: School code: 0175
ISBN:: 9798557052047
Access Restriction:: Restricted for use by site license.; This item must not be sold to any third party vendors.
