Text mining : a guidebook for the social sciences / Gabe Ignatow, University of North Texas, Rada Mihalcea, University of Michigan.
LIBRA H61.3 .I395 2017
Available from offsite location
- Format:
- Book
- Author/Creator:
- Ignatow, Gabe, author.
- Mihalcea, Rada, author.
- Language:
- English
- Subjects (All):
- Social sciences--Research--Methodology.
- Social sciences.
- Discourse analysis--Data processing.
- Discourse analysis.
- Communication--Network analysis.
- Communication.
- Natural language processing (Computer science).
- Data mining.
- Physical Description:
- xvi, 188 pages : illustrations ; 23 cm
- Place of Publication:
- Los Angeles : SAGE, [2017]
- Summary:
- Online communities generate massive volumes of natural language data, and the social sciences continue to learn how to best make use of this new information and the technology available for analyzing it. Text Mining brings together a broad range of contemporary qualitative and quantitative methods to provide strategic and practical guidance on analyzing large text collections. This accessible book, written by a sociologist and a computer scientist, surveys the fast-changing landscape of data sources, programming languages, software packages, and methods of analysis available today. Suitable for novice and experienced researchers alike, this book helps readers use text mining techniques more efficiently and productively. Book jacket.
- Contents:
- Part I Digital Texts, Digital Social Science 1
- 1 Social Science and the Digital Text Revolution 2
- History of Text Analysis 3
- Risks and Rewards of Text Mining for the Social Sciences 5
- Social Data From Digital Environments 6
- Theory and Metatheory 10
- Ethics of Text Mining 12
- Participant Consent, Privacy, and Anonymity 12
- Prompted and Unprompted Data 13
- Organization of This Volume 13
- 2 Research Design Strategies 16
- Levels of Analysis 18
- The Textual Level 18
- The Contextual Level 18
- The Sociological Level 18
- Strategies for Document Selection and Sampling 19
- Case Selection 19
- Text Sampling 20
- Types of Inferential Logic 22
- Inductive Logic 23
- Deductive Logic 24
- Abductive Logic 25
- Approaches to Research Design 27
- Analysis of Discourse Positions 27
- Conversation Analysis 28
- Critical Discourse Analysis 28
- Content Analysis 29
- Foucauldian Intertextuality 30
- Analysis of Texts as Social Information 31
- Part II Text Mining Fundamentals 33
- 3 Web Crawling and Scraping 34
- Web Statistics 36
- Web Crawling 37
- Process Steps in Crawling 37
- Traversal Strategies 38
- Crawler Politeness 38
- Web Scraping 39
- Software for Web Crawling and Scraping 41
- 4 Lexical Resources 42
- WordNet 43
- WordNet-Affect 45
- Roget's Thesaurus 46
- Linguistic Inquiry and Word Count 46
- General Inquirer 48
- Wikipedia 48
- Wiktionary 51
- Downloadable Lexical Resources and Application Program Interfaces 51
- 5 Basic Text Processing 52
- Tokenization 54
- Stop Word Removal 55
- Stemming and Lemmatization 55
- Text Statistics 56
- Language Models 59
- Other Text Processing 60
- Part of Speech Tagging 60
- Collocation Identification 60
- Syntactic Parsing 61
- Named Entity Tagging 61
- Word Sense Disambiguation 61
- Software for Text Processing 61
- 6 Supervised Learning 62
- Feature Representation and Weighting 65
- Feature Weighting 65
- Supervised Learning Algorithms 66
- Decision Trees 67
- Instance-Based Learning 68
- Support Vector Machines 69
- Evaluation of Supervised Learning 71
- Software for Supervised Learning 71
- Part III Text Analysis Methods from the Humanities and Social Sciences 73
- 7 Thematic Analysis, Qualitative Data Analysis Software, and Visualization 74
- Thematic Analysis 75
- Qualitative Data Analysis Software 77
- Visualization Tools 83
- Word Clouds 84
- Word Trees and Phrase Nets 84
- Matrices and Maps 85
- Key Word in Context 86
- Software for Thematic Analysis, Qualitative Data Analysis and Visualization 86
- 8 Narrative Analysis 88
- Conceptual Foundations 90
- Structural Approaches to Narrative 90
- Functionalist Approaches to Narrative 91
- Sociological Approaches to Narrative 92
- Mixed Methods of Narrative Analysis 92
- Automated Methods of Narrative Analysis 93
- Future Directions 93
- Software for Narrative Analysis 94
- 9 Metaphor Analysis 96
- Theoretical Foundations 98
- Qualitative Metaphor Analysis 99
- Anthropology 99
- Educational Research 99
- Political Science 100
- Psychology 100
- Sociology 101
- Mixed Methods of Metaphor Analysis 101
- Management Research 101
- Psychology 102
- Sociology 102
- Automated Metaphor Identification Methods 103
- Software for Metaphor Analysis 103
- Part IV Text Mining Methods from Computer Science 105
- 10 Word and Text Relatedness 106
- Theoretical Foundations 107
- Corpus-Based and Knowledge-Based Measures of Relatedness 108
- Corpus-Based Measures of Word Relatedness 108
- Knowledge-Based Measures of Word Relatedness 110
- Measures of Text Relatedness 112
- Software and Data Sets for Word and Text Relatedness 114
- 11 Text Classification 116
- A Brief History of Text Classification 118
- Applications of Text Classification 119
- Topic Classification 119
- E-Mail Spam Detection 120
- Sentiment Analysis/Opinion Mining 120
- Gender Classification 120
- Deception Detection 122
- Other Applications 122
- Representing Texts for Supervised Text Classification 122
- Feature Weighting and Selection 123
- Text Classification Algorithms 124
- Naive Bayes 124
- Rocchio Classifier 125
- Bootstrapping in Text Classification 126
- Evaluation of Text Classification 127
- Software and Data Sets for Text Classification 127
- 12 Information Extraction 130
- Entity Extraction 132
- Relation Extraction 133
- Web Information Extraction 134
- Template Filling 135
- Software and Data Sets for Information Extraction and Text Mining 135
- 13 Information Retrieval 136
- Theoretical Foundations 138
- Components of an Information Retrieval System 138
- Information Retrieval Models 140
- The Vector Space Model 142
- Evaluation of Information Retrieval Models 144
- Web-Based Information Retrieval 145
- Software and Data Sets for Information Retrieval 147
- 14 Sentiment Analysis 148
- Theoretical Foundations 150
- Lexicons 151
- Corpora 152
- Tools 153
- Software and Data Sets for Sentiment Analysis 154
- 15 Topic Models 156
- Digital Humanities 160
- Political Science 160
- Sociology 161
- Software for Topic Modeling 161
- Part V Conclusions 163
- 16 Text Mining, Text Analysis, and the Future of Social Science 164
- Social and Computer Science Collaboration 166.
- Notes:
- Includes bibliographical references (pages 168-182) and index.
- Local Notes:
- Acquired for the Penn Libraries with assistance from the Esther F. Kantrowitz & Lionel Kantrowitz Collection Endowment Fund.
- ISBN:
- 9781483369341
- 148336934X
- OCLC:
- 933765455
- Publisher Number:
- 99968910078