Statistical methods for analysis of structured genomic data.

Dissertations & Theses @ University of Pennsylvania Available online

Format:: Book; Thesis/Dissertation
Author/Creator:: Chuai, Shaokun.
Contributor:: University of Pennsylvania.
Language:: English
Subjects (All):: Bioinformatics.; Biometry.; 0308.; 0715.
Local Subjects:: 0308.; 0715.
Physical Description:: 135 pages
Contained In:: Dissertation Abstracts International 73-06B.
System Details:: Mode of access: World Wide Web.; text file
Summary:: Partially motivated by the analysis of high dimensional genomic data, high dimensional statistics, especially high dimensional regression analysis, has been an active research area in recent decades. Besides high dimensionality, another important feature of genomic data is that they often have a certain structure, such as time course measurements and group or graphical structures. Incorporating such structural information into the analysis of numerical data raises interesting statistical challenges. This dissertation develops statistical methods for two problems motivated by genomic data analysis. The first problem concerns variable selection for high dimensional varying coefficients models, for which we develop a regularization method for variable selection and estimation. We use basis function expansion to model the time-dependent regression coefficient functions and a combination of a smoothness penalty and a group-level penalty to achieve both smooth function estimation and coefficient function selection. We apply the method to the analysis of microarray time course gene expression data in order to identify the transcription factors that regulate expression changes over time. Our results show that the varying coefficients model provides better power in identifying the relevant transcription factors than simple time-wise analysis. The second problem considers variable selection for graph-structured group variables, where we assume that the variables are grouped and also have a graphical structure. Examples include genes in a collection of pathways and single nucleotide polymorphisms (SNPs) in genes. We introduce a new penalty that combines the group Lasso with a graph-constrained smoothness penalty within groups, in order both to perform group-level variable selection and to impose some smoothness on the regression coefficients with respect to the graph structure. Simulation results show that the new method gives better variable selection and prediction when such group and graphical structure information exists. We apply this method to two real data sets: a glioblastoma gene expression data set, to identify several KEGG pathways that are potentially related to the survival time of glioblastoma patients; and a SNP data set, to identify genes that are associated with patient HDL level.
Notes:: Source: Dissertation Abstracts International, Volume: 73-06, Section: B, page: .; Advisers: Hongzhe Li; Wensheng Guo.; Thesis (Ph.D.)--University of Pennsylvania, 2011.
Local Notes:: School code: 0175.
ISBN:: 9781267200181
Access Restriction:: Restricted for use by site license.
