Improving text classification with boolean retrieval for rare categories : a case study identifying firearm violence conversations in the Crisis Text Line database / Robert F. Chew.
- Format:
- Book
- Author/Creator:
- Chew, Robert F., author.
- Series:
- RTI Press methods report.
- Language:
- English
- Subjects (All):
- Data mining.
- Information storage and retrieval systems.
- Physical Description:
- 1 online resource.
- Other Title:
- Improving text classification with boolean retrieval for rare categories
- Place of Publication:
- Research Triangle Park, NC : RTI Press, 2023.
- Summary:
- Advancements in machine learning and natural language processing have made text classification increasingly attractive for information retrieval. However, developing text classifiers is challenging when no prior labeled data are available for a rare category of interest. Finding instances of the rare class using a uniform random sample can be inefficient and costly due to the rare category's low base rate. This work presents an approach that combines the strengths of text classification and Boolean retrieval to help learn rare concepts of interest. As a motivating example, we use the task of finding conversations that reference firearm injury or violence in the Crisis Text Line database. Identifying rare categories, like firearm injury or violence, can improve crisis lines' abilities to support people with firearm-related crises or provide appropriate resources. Our approach outperforms a set of iteratively refined Boolean queries and results in a recall of 0.91 on a test set generated from a process independent of our study. Our results suggest that text classification with Boolean retrieval initialization can be effective for finding rare categories of interest and improve on the precision of using Boolean retrieval alone.
- Notes:
- Description based on publisher supplied metadata and other sources.