Statistical methods for compositional and tree-structured count data / Pixu Shi.
LIBRA R001 2016 .S5551
Available from offsite location
- Format:
- Book
- Manuscript
- Thesis/Dissertation
- Author/Creator:
- Shi, Pixu, author.
- Language:
- English
- Subjects (All):
- Penn dissertations--Epidemiology and biostatistics.
- Epidemiology and biostatistics--Penn dissertations.
- Local Subjects:
- Penn dissertations--Epidemiology and biostatistics.
- Epidemiology and biostatistics--Penn dissertations.
- Physical Description:
- ix, 83 leaves : color illustrations ; 29 cm
- Production:
- [Philadelphia, Pennsylvania] : University of Pennsylvania, 2016.
- Summary:
- In human microbiome studies, sequencing read data are often summarized as counts of bacterial taxa at various taxonomic levels. In this thesis, we develop statistical methods for analyzing such count data. We first consider regression analysis with bacterial counts normalized into compositions as covariates. In order to satisfy the subcompositional coherence of the resulting model, linear models with a set of linear constraints on the regression coefficients are introduced. A penalized estimation procedure for estimating the regression coefficients and for selecting variables under the linear constraints is developed. A method is also proposed to obtain de-biased estimates of the regression coefficients that are asymptotically unbiased and have a joint asymptotic multivariate normal distribution. This provides valid confidence intervals for the regression coefficients and can be used to obtain p-values. Simulation results show the validity of the confidence intervals and smaller variances of the de-biased estimates when the linear constraints are imposed. Applied to a gut microbiome data set, the proposed methods identify four bacterial genera that are associated with body mass index after adjusting for total fat and caloric intake. We then consider the problem of testing for a difference between two repeated microbiome measurements from the same subjects. Multiple microbiome measurements are often obtained from the same subject to assess the difference in microbial composition across body sites or time points. Existing models for analyzing such data are limited in modeling the covariance structure of the counts and in handling paired multinomial data. We propose a new probability distribution for paired multinomial count data, which allows a flexible covariance structure of the counts and can be used to model repeatedly measured multivariate counts. 
Based on this new distribution, a test statistic is developed to test for a difference in the compositions of paired multinomial count data. The proposed test can be applied to count data observed on taxonomic trees in order to test for differences in microbiome compositions and to identify subtrees with different subcompositions. Simulation results show that the proposed test has correct type I error rates and increased power compared with some commonly used methods. An analysis of an upper respiratory tract microbiome data set is used to illustrate the proposed methods.
- Notes:
- Ph. D. University of Pennsylvania 2016.
- Department: Epidemiology and Biostatistics.
- Supervisor: Hongzhe Li.
- Includes bibliographical references.
- OCLC:
- 960101033