Problems in high-dimensional statistics and applications in genomics, metabolomics and microbiomics / Rong Ma.
- Format:
- Book
- Thesis/Dissertation
- Author/Creator:
- Ma, Rong, author.
- Language:
- English
- Subjects (All):
- Biostatistics.
- Statistics.
- Genetics.
- Sparsity.
- Simulation.
- Growth models.
- Regression analysis.
- Hypothesis testing.
- Signal to noise ratio.
- Hypotheses.
- Autoimmune diseases.
- Generalized linear models.
- Inflammatory bowel disease.
- Taxonomy.
- Numerical analysis.
- Methods.
- Confidence intervals.
- Metabolites.
- Pediatrics.
- Bias.
- Epidemiology and Biostatistics--Penn dissertations.
- Penn dissertations--Epidemiology and Biostatistics.
- Genre:
- Academic theses.
- Physical Description:
- 1 online resource (142 pages)
- Contained In:
- Dissertations Abstracts International 83-03B.
- Place of Publication:
- [Philadelphia, Pennsylvania] : University of Pennsylvania ; Ann Arbor : ProQuest Dissertations & Theses, 2021.
- System Details:
- Mode of access: World Wide Web.
- text file
- Summary:
- With rapid technological advances in data collection and processing, large-scale and complex datasets are now widely available in diverse research fields such as genomics, metabolomics and microbiomics. The analysis of large datasets with complex structures poses significant challenges and calls for new theory and methodology. In this dissertation, we address several high-dimensional statistical problems and develop novel statistical theory and methods for analyzing datasets generated from such data-driven interdisciplinary research. In the first part of the dissertation (Chapter 1 and Chapter 2), motivated by the ubiquity of high-dimensional datasets with binary outcomes and the need for powerful methods to analyze them, we develop novel bias-correction techniques for inferring low-dimensional components or functionals of high-dimensional objects, and propose computationally efficient procedures for parameter estimation, global and simultaneous hypothesis testing, and confidence interval construction in high-dimensional logistic regression models. The theoretical properties of the proposed methods, including their minimax optimality, are carefully studied. We demonstrate empirically the effectiveness and stability of our methods in extracting useful information from noisy high-dimensional datasets. By applying our methods to a real metabolomic dataset, we unveil the associations between fecal metabolites and pediatric Crohn's disease, as well as the effects of dietary treatment on such associations (Chapter 1); by analyzing a real genetic dataset, we obtain novel insights into the shared genetic architecture among ten pediatric autoimmune diseases (Chapter 2).
In the second part of the dissertation (Chapter 3 and Chapter 4), motivated by important questions in large-scale human microbiome and metagenomic research, as well as other applications, we propose a novel permuted monotone matrix model and develop new principles, theory and methods for inferring the underlying model parameters. In particular, we focus on two interrelated problems, namely optimal permutation recovery from noisy observations (Chapter 3) and extreme value estimation in permuted low-rank monotone matrices (Chapter 4), and propose an efficient spectral approach to both. The proposed methods are rigorously justified by statistical theory, including their convergence rates and minimax optimality. Numerical experiments on simulated and synthetic microbiome metagenomic data show that the proposed methods outperform the alternatives. The methods are applied to two real datasets to compare the growth rates of gut bacteria between inflammatory bowel disease patients and/or normal controls.
- Notes:
- Source: Dissertations Abstracts International, Volume: 83-03, Section: B.
- Advisors: Cai, Tony; Li, Hongzhe. Committee members: Li, Mingyao; Barnett, Ian; Brown, Christopher.
- Department: Epidemiology and Biostatistics.
- Ph.D. University of Pennsylvania 2021.
- Local Notes:
- School code: 0175
- ISBN:
- 9798535569659
- Access Restriction:
- Restricted for use by site license.
- This item must not be sold to any third party vendors.