Human-Centered AI in Computational Social Science: Evaluating Automated Annotation with Large Language Models / Nicholas James Pangakis

Dissertations & Theses @ University of Pennsylvania Available online

Format:: Book; Thesis/Dissertation
Author/Creator:: Pangakis, Nicholas James, author.
Contributor:: University of Pennsylvania. Political Science., degree granting institution.
Language:: English
Subjects (All):: Political science.; Computer science.; 0615.; 0984.; 0800.
Local Subjects:: Political science.; Computer science.; 0615.; 0984.; 0800.
Physical Description:: 1 electronic resource (147 pages)
Contained In:: Dissertations Abstracts International 86-12B
Place of Publication:: Ann Arbor : ProQuest Dissertations and Theses, 2025
Language Note:: English
Summary:: Computational social scientists are increasingly incorporating text as data into their research. A typical framework for working with large text data sets involves hiring human annotators to read a subset of the text samples and then building a statistical model to annotate the remainder of the text corpus. Due to their effectiveness at quantifying natural language, their ease of application, and their relatively low cost, artificial intelligence tools, like generative large language models (LLMs), may be used to automate these manual annotation procedures. This process, which I call "automated annotation," can dramatically improve research designs that involve text as data. For example, I demonstrate that automated annotation procedures can cost 11.6% as much as standard annotation approaches and take 18.8% of the time. Although automated annotation has remarkable potential in social science, there are serious concerns about misuse and uncritical application. If practitioners use automated annotation without validation, for instance, they risk unknown bias and other inaccuracies in downstream applications. Thus, my dissertation aims to test strategies to develop effective and responsible automated annotation procedures. Specifically, I argue for a human-centered automated annotation framework, which gives human annotations a central role at each stage of the workflow. Across three studies, I develop and implement various automated annotation techniques that all remain grounded in human reasoning. My empirical investigations cover a wide range of topics, from testing automated annotation strategies with generative LLMs to developing a multi-stage, human-in-the-loop annotation pipeline. As a whole, my findings underscore the potential of leveraging AI tools to enhance text-as-data methodologies and to help researchers explore important substantive questions. With proper validation techniques, generative LLMs can approximate human reasoning at a rapid pace and low cost.
Notes:: Source: Dissertations Abstracts International, Volume: 86-12, Section: B. Advisors: Hopkins, Daniel. Committee members: Gillion, Daniel; Lelkes, Yphtach. Ph.D. University of Pennsylvania 2025.
Local Notes:: School code: 0175
ISBN:: 9798280760547
Access Restriction:: Restricted for use by site license
