Topic Modeling: Optimal Estimation, Statistical Inference, and Beyond / Ruijia Wu.
Dissertations & Theses @ University of Pennsylvania Available online
- Format:
- Book
- Thesis/Dissertation
- Author/Creator:
- Wu, Ruijia, author.
- Language:
- English
- Subjects (All):
- Statistics.
- Statistics--Penn dissertations.
- Penn dissertations--Statistics.
- Local Subjects:
- Statistics.
- Statistics--Penn dissertations.
- Penn dissertations--Statistics.
- Physical Description:
- 1 online resource (162 pages)
- Distribution:
- Ann Arbor : ProQuest Dissertations & Theses, 2022
- Contained In:
- Dissertations Abstracts International 84-02B.
- Place of Publication:
- [Philadelphia, Pennsylvania] : University of Pennsylvania, 2022.
- Language Note:
- English
- Summary:
- With the development of computer technology and the internet, increasingly large amounts of textual data are generated and collected every day. It is a significant challenge to analyze and extract meaningful and actionable information from vast amounts of unstructured textual data. This thesis explores several problems in topic modeling and provides new algorithms with theoretical guarantees. The first part of this thesis aims to develop an optimality theory for unsupervised topic modeling under the probabilistic latent semantic indexing (pLSI) model. Novel and computationally fast algorithms for estimation and inference of both the word-topic matrix and the topic-document matrix are proposed, and their theoretical properties are investigated. Moreover, a refitting algorithm is proposed to establish asymptotic normality and construct valid confidence intervals for the individual entries of the word-topic and topic-document matrices. In the second part, we study supervised topic modeling, which jointly considers a collection of documents and their paired side information. To account for the compositional nature of the topic-document matrix, we adapt the log-contrast model and introduce a novel bias-adjusted algorithm to investigate the regression coefficients in the generalized linear model. In addition, a de-biasing procedure is proposed to establish an asymptotically unbiased and normally distributed estimator, and hence valid confidence intervals are constructed for the individual entries of the regression coefficients. In the third part, we investigate errors-in-variables models under the generalized linear model framework and propose an estimator for the case where the measurement error is small.
- Notes:
- Source: Dissertations Abstracts International, Volume: 84-02, Section: B.
- Advisors: Cai, T. Tony; Committee members: Li, Hongzhe; Small, Dylan S.; Su, Weijie.
- Department: Statistics.
- Ph.D. University of Pennsylvania 2022.
- Local Notes:
- School code: 0175
- ISBN:
- 9798837503467
- Access Restriction:
- Restricted for use by site license.