Challenges in Corpus Linguistics : Rethinking Corpus Compilation and Analysis / edited by Mark Kaunisto and Marco Schilk.

Format:
Book
Contributor:
Kaunisto, Mark, editor.
Schilk, Marco, editor.
Series:
Studies in corpus linguistics ; Volume 118.
Language:
English
Subjects (All):
English language--Grammar, Comparative.
English language.
Physical Description:
1 online resource (182 pages)
Edition:
First edition.
Place of Publication:
Amsterdam, Netherlands : John Benjamins Publishing Company, [2024]
Summary:
This book discusses the challenges faced in different areas of corpus linguistics, namely the compilation, annotation, and analysis of linguistic corpora.
Contents:
Intro
Table of contents
Acknowledgements
From fallacies and pitfalls to solutions and future directions
References
Engaging with bad (meta)data in historical corpus linguistics
1. Introduction
2. POS annotation in diachronic datasets
2.1 Accounting for category change
2.2 Theoretical choices in the design of the annotation scheme
2.3 Annotation tailored to specific research questions
3. Large corpora
3.1 Inaccuracies in text sampling
3.2 Changes in the balance of subgenres
4. Historical databases
4.1 Issues with balance and metadata
4.2 OCR errors
4.2.1 Hapax legomena
4.2.2 Historical lexis
5. Discussion and conclusion
Funding
Named entities as potentially problematic items in corpora
2. Background
2.1 The concepts of proper nouns and proper names
2.2 Annotation of named entities
3. Case studies
3.1 Common nouns used as (parts of) proper nouns
3.2 Near-synonymous adjectives in named entities
4. Discussion and conclusion
Challenges in the compilation, annotation, and analysis of learner corpus data
1. Introduction and general remarks
2. Challenges and how to respond to them
2.1 Multilingual practices and metalinguistic language use
Response
2.2 Task effects
2.3 "Discourse of deficit" and learner corpus annotation
3. Summary and conclusion
Early newspapers as data for corpus linguistics (and Digital Humanities)
2. Digital text analysis in the humanities
2.1 Digital Humanities
2.2 Corpus linguistics
2.3 Towards a useful synergy
3. Historical newspaper prose and the British Library Newspapers database
3.1 Problems with available search tools
3.2 Sampling, balance, and representativeness
3.3 Registers and subregisters
3.4 Optical Character Recognition (OCR)
4. Discussion
Open Corpus Linguistics - or How to overcome common problems in dealing with corpus data by adopting open research practices
2. Revisiting Rissanen's problems
3. Open Corpus Linguistics
4. Conclusion
Text length and short texts
2.1 Text length, corpora, and social media
2.2 The importance of text length
3. Solutions and workarounds
3.1 Manipulation of the data
3.1.1 Exclusion
3.1.2 Combining
3.1.3 Chunking
3.2 Computational and statistical approaches
3.2.1 Lengthwise analysis
3.2.2 Multiple Correspondence Analysis
3.2.3 Resampling methods
3.3 A related problem
Corpus genre categories
2. Looking up from the pit
3. Text genre categorization in literature
4. Text genre categorization in linguistics
5. The genre category pitfall
6. Conclusion
Modeling fine-grained sociolinguistic variation
2. Theoretical and methodological background
2.1 Semantic shifts in Quebec English
2.2 Twitter-based corpora for language variation
2.3 Vector space models for lexical semantic variation
3. Data and method
3.1 A corpus of tweets
3.2 A set of semantic shifts in Quebec English
3.3 Neural word embeddings
3.4 Clustering and annotating the uses of a lexical item
4. Results
4.1 An overview of regionally specific clusters
4.2 Types of variation captured by the analysis
4.2.1 True positives
A clear-cut distinction
A subtler distinction
4.2.2 False positives
Cultural effects
Proper names
French homographs in codeswitched tweets
Structural patterns affecting model performance
4.3 Deploying coarsely annotated data for linguistic description
Subject index.
Notes:
Description based on publisher supplied metadata and other sources.
Description based on print version record.
Includes bibliographical references.
ISBN:
9789027246530
902724653X
OCLC:
1455385919
