Multi-Level Methods for Estimating Community Language from Social Media with User and Community Sociodemographics / Salvatore Giorgi.

Dissertations & Theses @ University of Pennsylvania Available online

Format:
Book
Thesis/Dissertation
Author/Creator:
Giorgi, Salvatore, author.
Contributor:
University of Pennsylvania. Computer and Information Science, degree granting institution.
Language:
English
Subjects (All):
Computer science.
Web studies.
Information science.
Computer and Information Science--Penn dissertations.
Penn dissertations--Computer and Information Science.
Local Subjects:
Computer science.
Web studies.
Information science.
Computer and Information Science--Penn dissertations.
Penn dissertations--Computer and Information Science.
Physical Description:
1 online resource (145 pages)
Distribution:
Ann Arbor : ProQuest Dissertations & Theses, 2023
Contained In:
Dissertations Abstracts International 85-08A.
Place of Publication:
[Philadelphia, Pennsylvania] : University of Pennsylvania, 2022.
Language Note:
English
Summary:
Nowcasting based on social media text promises to provide unobtrusive near real-time predictions of community-level outcomes ranging from subjective well-being and physical health to personality and opioid use. Early methods for predicting outcomes from community-level language, e.g., Twitter, tended to (1) focus on keyword-driven analyses, where manually selected sets of words were examined for their ability to predict real-world outcomes (e.g., the community's use of the word "opioids" on Twitter to predict opioid poisoning mortality), and (2) lack a person-centered focus, largely ignoring the fact that communities are groups of individuals who may share common attributes. Furthermore, the focus is typically on prediction, where complex models are built to predict some community attribute from language instead of directly focusing on building and validating better language estimates. In this thesis, I develop and evaluate methods to estimate the language of spatial units (e.g., U.S. counties) that contextualize people within their communities and leverage the multi-level, bi-directional relationships between people and their environments. Using corpora including billions of tweets from millions of geolocated Twitter users, I (1) construct community-level features from person-level linguistic features, (2) build tunable restratification methods to remove selection biases, (3) use deep hierarchical modeling to explore relationships between people and their environments, and (4) produce state-of-the-art accuracies across community-level prediction tasks in public health, geographic psychology, and substance use. These person-centered spatial language estimates are psychometrically valid, more representative of the socio-demographic makeup of their communities, generalizable across spatial units (e.g., prefectures in Japan and U.K. local authority districts), and robust to spatial dependencies.
This thesis lays the foundation for using large public corpora for population-level tasks and opens up the possibility of real-time public health monitoring.
Notes:
Source: Dissertations Abstracts International, Volume: 85-08, Section: A.
Advisors: Ungar, Lyle H.; Committee members: Callison-Burch, Chris; Gardner, Jacob R.; Yatskar, Mark; Schwartz, H. Andrew.
Department: Computer and Information Science.
Ph.D. University of Pennsylvania 2023.
Local Notes:
School code: 0175
ISBN:
9798381510584
Access Restriction:
Restricted for use by site license.

