

Statistical methods for human microbiome data analysis / Jun Chen.

LIBRA R001 2012 .C514

Available from offsite location. This item is stored in our repository but can be checked out.

Format:
Book
Manuscript
Thesis/Dissertation
Author/Creator:
Chen, Jun.
Contributor:
Li, Hongzhe, advisor.
Li, Mingyao, committee member.
Bushman, Frederic, committee member.
Wang, Lisan, committee member.
Zhang, Nancy, committee member.
University of Pennsylvania. Genomics and Computational Biology.
Language:
English
Subjects (All):
Penn dissertations--Genomics and computational biology.
Genomics and computational biology--Penn dissertations.
Physical Description:
xi, 107 pages : illustrations (some color) ; 29 cm
Production:
2012.
Summary:
The human microbiome is the totality of the microbes, their genetic elements and the interactions they have with surrounding environments throughout the human body. Studies have implicated the human microbiome in health and disease. Two central themes of human microbiome studies are to identify potential factors influencing microbiome composition, and to define the relationship between microbiome features and biological or clinical outcomes. With the development of next-generation sequencing technologies, human microbiome composition can be interrogated using high-throughput DNA sequencing. One strategy sequences the bacterial 16S ribosomal RNA gene for species identification. These 16S sequences are usually clustered into Operational Taxonomic Units (OTUs). Analysis of such OTU data raises several important statistical challenges, including accounting for the phylogenetic relationships among OTUs and modeling high-dimensional, overdispersed count data. This dissertation presents three statistical methods, developed specifically for 16S data analysis, that address these two themes. To test the association between overall microbiome composition and a covariate or an outcome, a testing procedure based on a generalized UniFrac distance was developed. The generalized UniFrac distance corrects the undue weight that the classic UniFrac distances place on either highly abundant or rare lineages, and was shown to be more powerful than the classic UniFrac distances. Under the framework of canonical correlation analysis (CCA), a structure-constrained sparse CCA was proposed to select OTUs and their correlated covariates. A phylogenetic structure-constrained penalty function was imposed to induce smoothness in the linear coefficients according to the OTU phylogenetic relationships. Structure-constrained sparse CCA performed much better than sparse CCA in selecting relevant OTUs.
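The summary does not state the generalized UniFrac formula. The sketch below assumes the commonly cited form, in which each branch i of length b_i is weighted by b_i(p_i^A + p_i^B)^alpha, where p_i^A and p_i^B are the proportions of each sample's reads descending from that branch. The function name `generalized_unifrac` and its arguments are hypothetical, and computing the per-branch proportions from an actual tree is left out.

```python
def generalized_unifrac(branch_lengths, p_a, p_b, alpha=0.5):
    """Generalized UniFrac distance between samples A and B (illustrative sketch).

    branch_lengths[i]: length b_i of branch i in the phylogenetic tree.
    p_a[i], p_b[i]: proportion of each sample's reads descending from branch i.
    alpha: controls the weight on abundant lineages; alpha = 1 reduces to the
    normalized weighted UniFrac, smaller values give abundant lineages less weight.
    """
    numerator = 0.0
    denominator = 0.0
    for b, pa, pb in zip(branch_lengths, p_a, p_b):
        total = pa + pb
        if total == 0:
            continue  # branch absent from both samples contributes nothing
        weight = b * total ** alpha
        numerator += weight * abs(pa - pb) / total
        denominator += weight
    return numerator / denominator


# Toy three-branch example (made-up proportions, not data from the dissertation).
print(generalized_unifrac([1, 1, 2], [0.5, 0.5, 1.0], [1.0, 0.0, 1.0], alpha=1.0))
```

Varying `alpha` between 0 and 1 moves the distance between the two extremes that the summary describes, which is the intuition behind correcting the classic UniFrac weighting.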
Finally, a sparse Dirichlet-multinomial regression (SDMR) model was developed to link microbiome composition to environmental covariates and to select the most important covariates and the OTUs they affect. SDMR accounts for the overdispersion of OTU counts and uses a sparse group ℓ1 penalty function to select covariates and OTUs simultaneously. These methods were illustrated using simulations as well as a real human gut microbiome data set from a study of dietary effects on gut microbiome composition.
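The summary names the model but not its exact parameterization. The sketch below assumes a log-linear link, alpha_ij = exp(intercept_j + sum_p x_ip * beta_pj), and a sparse group penalty that mixes the ℓ2 norm of each covariate's coefficients across all OTUs with an elementwise ℓ1 term. The function names `dm_loglik` and `sdmr_objective` and the mixing weight `mix` are hypothetical, and the code only evaluates the objective; it does not fit the model.

```python
import math


def dm_loglik(y, alpha):
    """Dirichlet-multinomial log-likelihood of one sample's OTU counts y."""
    n = sum(y)
    a_sum = sum(alpha)
    ll = math.lgamma(n + 1) + math.lgamma(a_sum) - math.lgamma(n + a_sum)
    for yj, aj in zip(y, alpha):
        ll += math.lgamma(yj + aj) - math.lgamma(aj) - math.lgamma(yj + 1)
    return ll


def sdmr_objective(intercepts, coefs, X, Y, lam, mix):
    """Penalized negative DM log-likelihood under an assumed log-linear link.

    intercepts[j]: baseline log-parameter for OTU j.
    coefs[p][j]: effect of covariate p on OTU j (one penalty group per covariate).
    X[i][p]: covariate p for sample i; Y[i][j]: count of OTU j in sample i.
    """
    nll = 0.0
    for x, y in zip(X, Y):
        alpha = [
            math.exp(intercepts[j] + sum(x[p] * coefs[p][j] for p in range(len(x))))
            for j in range(len(y))
        ]
        nll -= dm_loglik(y, alpha)
    # Group ℓ2 norm zeroes out whole covariates; ℓ1 term zeroes out single OTU effects.
    penalty = 0.0
    for row in coefs:
        penalty += (1 - mix) * math.sqrt(sum(b * b for b in row))
        penalty += mix * sum(abs(b) for b in row)
    return nll + lam * penalty
```

The Dirichlet-multinomial term is what allows extra-multinomial variation in the counts, while the two-part penalty mirrors the summary's goal of selecting covariates and the OTUs they affect at the same time.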
Notes:
Adviser: Hongzhe Li.
Thesis (Ph.D. in Genomics and Computational Biology) -- University of Pennsylvania, 2012.
Includes bibliographical references.
OCLC:
818412311

