Low-resource named entity recognition / Stephen Mayhew.
Dissertations & Theses @ University of Pennsylvania (available online)
- Format:
- Book
- Thesis/Dissertation
- Author/Creator:
- Mayhew, Stephen, author.
- Language:
- English
- Subjects (All):
- Artificial intelligence.
- Computer and Information Science--Penn dissertations.
- Penn dissertations--Computer and Information Science.
- Local Subjects:
- Artificial intelligence.
- Computer and Information Science--Penn dissertations.
- Penn dissertations--Computer and Information Science.
- Genre:
- Academic theses.
- Physical Description:
- 1 online resource (159 pages)
- Contained In:
- Dissertation Abstracts International 81-10B.
- Place of Publication:
- [Philadelphia, Pennsylvania] : University of Pennsylvania ; Ann Arbor : ProQuest Dissertations & Theses, 2019.
- Language Note:
- English
- System Details:
- Mode of access: World Wide Web.
- text file
- Summary:
- Most of the success in natural language processing (NLP) in the last 20 years has come from statistical machine learning methods that discover complex patterns in text and make predictions. These methods traditionally require supervised data, nearly always created by humans, as a gold standard for the task. But as we look to extend these successes to other languages, we are faced with the daunting task of starting from scratch. The years of effort that went into creating annotations for English and a select few popular languages must be relived for each new language. This unrealistic requirement means that, as we seek to perform old tasks in new languages, we must use existing resources or rapidly develop new ones. In particular, we study the problem of Named Entity Recognition (NER) in low-resource languages. The task of NER is to find and classify names in text, and the low-resource qualifier signifies that we build these models without access to training data. This thesis discusses the use of incidental signals for developing NER systems, such as character sequences indicative of named entities, or partially annotated text, such as might come from non-speaker annotations. It describes new methods for cross-lingual NER that exploit resources such as Wikipedia and bilingual lexicons. The penultimate chapter applies several prominent techniques to a broad array of test languages, giving valuable insights into what has been accomplished and what is left to do. The final chapter distils knowledge from several years of experience building low-resource NER systems into a practical guide.
- Notes:
- Source: Dissertation Abstracts International, Volume: 81-10, Section: B.
- Advisor: Dan Roth; Committee members: Mitch Marcus, Chris Callison-Burch, Benjamin Van Durme, Mark Liberman.
- Department: Computer and Information Science.
- Ph.D. University of Pennsylvania 2019.
- Local Notes:
- School code: 0175
- ISBN:
- 9798607316617
- Access Restriction:
- Restricted for use by site license.
- This item must not be sold to any third party vendors.