Statistical Methods for Modeling Complex Dependency Structures in Zero-Inflated Metagenomic Sequencing Data / Rebecca Ann Deek.
- Format:
- Book
- Thesis/Dissertation
- Author/Creator:
- Deek, Rebecca Ann, author.
- Language:
- English
- Subjects (All):
- Biostatistics.
- Statistics.
- Epidemiology.
- Epidemiology and Biostatistics--Penn dissertations.
- Penn dissertations--Epidemiology and Biostatistics.
- Local Subjects:
- Biostatistics.
- Statistics.
- Epidemiology.
- Epidemiology and Biostatistics--Penn dissertations.
- Penn dissertations--Epidemiology and Biostatistics.
- Physical Description:
- 1 online resource (115 pages)
- Distribution:
- Ann Arbor : ProQuest Dissertations & Theses, 2023
- Contained In:
- Dissertations Abstracts International 84-12B.
- Place of Publication:
- [Philadelphia, Pennsylvania] : University of Pennsylvania, 2022.
- Language Note:
- English
- Summary:
- Advances in high-throughput sequencing technologies have enabled large-scale metagenomic sequencing studies of microbial compositions. As such, there is a growing scientific interest in understanding the human microbiome, defined as all the microorganisms and their genes in, or on, the body. Of particular interest is its functional role in human-host health. Nevertheless, there remains a statistical and computational bottleneck in effectively analyzing data from 16S rRNA and metagenomic sequencing studies. This is due to the characteristic excessive zeros, sequencing depth constraints, and high dimensionality of such data. Motivated by numerous microbiome studies, this dissertation aims to narrow the gap by developing novel statistical methods specifically designed to capture the excessive zeros of the data. The specific aims are to develop statistical models, inference procedures, and computationally fast algorithms to (1) identify distinct microbial communities in a given data set, as well as each community's important bacterial taxa, and (2) build microbial covariation networks based upon the estimated covariation between a pair of zero-inflated variables. To this end, three methodological advances are proposed. First, a generative latent mixture model of microbial counts that distinguishes between structural and sampling zeros. Second, a mixture margin copula model and two-stage inference procedure for microbial covariation networks in cross-sectional studies. Third, an extension to random-effects mixture margin copula models, as well as a corresponding Monte Carlo EM algorithm and likelihood ratio test to build temporally conserved covariation networks from longitudinal data. Furthermore, the performance and utility of these methods are demonstrated using simulations and several publicly available microbiome data sets.
- Notes:
- Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
- Advisors: Li, Hongzhe; Committee members: Li, Mingyao; Huang, Jing; Collman, Ronald G.
- Department: Epidemiology and Biostatistics.
- Ph.D. University of Pennsylvania 2023.
- Local Notes:
- School code: 0175
- ISBN:
- 9798379758516
- Access Restriction:
- Restricted for use by site license.