Topics in Generalized Correlation Analysis and Clustering / Sheng Gao.

Dissertations & Theses @ University of Pennsylvania Available online

Format:
Book
Thesis/Dissertation
Author/Creator:
Gao, Sheng, author.
Contributor:
University of Pennsylvania. Statistics, degree granting institution.
Language:
English
Subjects (All):
Statistics.
Bioinformatics.
Computer science.
Statistics--Penn dissertations.
Penn dissertations--Statistics.
Physical Description:
1 online resource (147 pages)
Distribution:
Ann Arbor : ProQuest Dissertations & Theses, 2023
Contained In:
Dissertations Abstracts International 84-12B.
Place of Publication:
[Philadelphia, Pennsylvania] : University of Pennsylvania, 2022.
Language Note:
English
Summary:
In the era of big data, generating large volumes of data from multiple sources on a shared group of subjects has become increasingly prevalent. The availability of abundant computational resources and advancements in data acquisition technology have made the integration of information from multimodal measurements essential. The objective of this integration is to develop efficient algorithms that facilitate a deeper understanding of shared subjects, despite variations in the contexts of multimodal information. This dissertation explores multimodal data analysis from the standpoints of algorithms, theories, and applications in various fields. It consists of three main components.

The first part of the dissertation studies the topic of sparse generalized correlation analysis (sparse GCA). We first formulate sparse GCA as generalized eigenvalue problems at both population and sample levels via a careful choice of normalization constraints. Subsequently, we present a computationally efficient algorithm for solving sparse GCA when there are potentially multiple generalized correlation tuples in data and the loading matrix has a small number of nonzero rows. We also establish the theoretical guarantees of the proposed algorithm and provide a corresponding information-theoretic lower bound for estimating GCA loading matrices.

In the second part of the dissertation, we delve deeper into the application of sparse GCA on multimodal datasets. We develop a modified algorithm for solving sparse GCA in a layerwise fashion when the row sparsity condition is violated. Utilizing a nested cross-validation procedure, we apply the layerwise sparse GCA to the Philadelphia Neurodevelopmental Cohort (PNC) study. This enables us to reveal the correlation structure of covariates across multiple datasets, encompassing neuroimaging, a wide range of clinical and cognitive phenotypes, and demographic information.
In the third part of the dissertation, we study the problem of cell type clustering with multimodal information. We introduce CellSNAP, a clustering pipeline that integrates feature expression, cellular neighborhood, and local tissue-level morphology information to generate a novel embedding that combines these three types of information. To showcase the effectiveness of CellSNAP, we apply it to the murine spleen dataset, which comprises multimodal measurements on single-cell data.
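
Illustrative note (not part of the dissertation abstract): the summary describes casting GCA as a generalized eigenvalue problem. A minimal sketch of what that can look like at the sample level is given below. It assumes one common formulation, S a = lambda * S0 a, where S is the joint sample covariance of all data sets and S0 keeps only its within-data-set diagonal blocks, with normalization A^T S0 A = I. The function name `gca_generalized_eig` and this choice of normalization are assumptions made for illustration. The sketch is dense and does not implement the sparse algorithm, the layerwise variant, or any of the theoretical results the dissertation presents.

```python
import numpy as np


def gca_generalized_eig(datasets, r):
    """Dense (non-sparse) sketch: top-r solutions of S a = lambda * S0 a.

    datasets: list of (n, p_i) arrays measured on the same n subjects.
    Returns loadings A (sum p_i, r), eigenvalues (r,), S, and S0.
    """
    # Center each data set and stack the columns side by side.
    centered = [X - X.mean(axis=0) for X in datasets]
    Z = np.hstack(centered)
    n = Z.shape[0]
    S = Z.T @ Z / n  # joint sample covariance across all data sets

    # S0 keeps only the within-data-set (diagonal) blocks of S.
    S0 = np.zeros_like(S)
    start = 0
    for X in centered:
        stop = start + X.shape[1]
        S0[start:stop, start:stop] = S[start:stop, start:stop]
        start = stop

    # Reduce to an ordinary symmetric eigenproblem using the Cholesky
    # factor S0 = L L^T (this requires S0 to be positive definite).
    L = np.linalg.cholesky(S0)
    L_inv = np.linalg.inv(L)
    M = L_inv @ S @ L_inv.T
    M = (M + M.T) / 2  # remove tiny asymmetries from floating point
    eigvals, V = np.linalg.eigh(M)

    # Keep the r largest eigenpairs and map back to the original scale,
    # so that the returned loadings satisfy A^T S0 A = I.
    order = np.argsort(eigvals)[::-1][:r]
    A = L_inv.T @ V[:, order]
    return A, eigvals[order], S, S0
```

With two data sets this reduces to a form of classical canonical correlation analysis. A sparse method would additionally restrict the loading matrix to a small number of nonzero rows, which this sketch does not attempt.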
Notes:
Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
Advisors: Ma, Zongming; Committee members: Su, Weijie; Chen, Yuxin.
Department: Statistics.
Ph.D. University of Pennsylvania 2023.
Local Notes:
School code: 0175
ISBN:
9798379755720
Access Restriction:
Restricted for use by site license.
