Statistical universals of language : mathematical chance vs. human choice / Kumiko Tanaka-Ishii.
Springer Nature - Springer Mathematics and Statistics eBooks 2021 English International Available online
- Format:
- Book
- Author/Creator:
- Tanaka-Ishii, Kumiko, author.
- Series:
- Mathematics in Mind
- Language:
- English
- Subjects (All):
- Mathematical linguistics.
- Computational linguistics.
- Physical Description:
- 1 online resource (226 pages) : illustrations
- Edition:
- 1st ed.
- Place of Publication:
- Cham, Switzerland : Springer, [2021]
- Summary:
- This book explores the universal mathematical properties underlying big language data and possible reasons why such properties exist, revealing how we may be unconsciously mathematical in our language use.
- Contents:
- Intro
- Contents
- Part I Language as a Complex System
- 1 Introduction
- 1.1 Aims
- 1.2 Structure of This Book
- 1.3 Position of This Book
- 1.3.1 Statistical Universals as Computational Properties of Natural Language
- 1.3.2 A Holistic Approach to Language via Complex Systems Theory
- 1.4 Prospectus
- 2 Universals
- 2.1 Language Universals
- 2.2 Layers of Universals
- 2.3 Universal, Stylized Hypothesis, and Law
- 3 Language as a Complex System
- 3.1 Sequence and Corpus
- 3.1.1 Definition of Corpus
- 3.1.2 On Meaning
- 3.1.3 On Infinity
- 3.1.4 On Randomness
- 3.2 Power Functions
- 3.3 Scale-Free Property: Statistical Self-Similarity
- 3.4 Complex Systems
- 3.5 Two Basic Random Processes
- Part II Property of Population
- 4 Relation Between Rank and Frequency
- 4.1 Zipf's Law
- 4.2 Scale-Free Property and Hapax Legomena
- 4.3 Monkey Text
- 4.4 Power Law of n-grams
- 4.5 Relative Rank-Frequency Distribution
- 5 Bias in Rank-Frequency Relation
- 5.1 Literary Texts
- 5.2 Speech, Music, Programs, and More
- 5.3 Deviations from Power Law
- 5.3.1 Scale
- 5.3.2 Speaker Maturity
- 5.3.3 Characters vs. Words
- 5.4 Nature of Deviations
- 6 Related Statistical Universals
- 6.1 Density Function
- 6.2 Vocabulary Growth
- Part III Property of Sequences
- 7 Returns
- 7.1 Word Returns
- 7.2 Distribution of Return Interval Lengths
- 7.3 Exceedance Probability
- 7.4 Bias Underlying Return Intervals
- 7.5 Rare Words as a Set
- 7.6 Behavior of Rare Words
- 8 Long-Range Correlation
- 8.1 Long-Range Correlation Analysis
- 8.2 Mutual Information
- 8.3 Autocorrelation Function
- 8.4 Correlation of Word Intervals
- 8.5 Nonstationarity of Language
- 8.6 Weak Long-Range Correlation
- 9 Fluctuation
- 9.1 Fluctuation Analysis
- 9.2 Taylor Analysis
- 9.3 Differences Between the Two Fluctuation Analyses
- 9.4 Dimensions of Linguistic Fluctuation
- 9.5 Relations Among Methods
- 10 Complexity
- 10.1 Complexity of Sequence
- 10.2 Entropy Rate
- 10.3 Hilberg's Ansatz
- 10.4 Computing Entropy Rate of Human Language
- 10.5 Reconsidering the Question of Entropy Rate
- Part IV Relation to Linguistic Elements and Structure
- 11 Articulation of Elements
- 11.1 Harris's Hypothesis
- 11.2 Information-Theoretic Reformulation
- 11.3 Accuracy of Articulation by Harris's Scheme
- 12 Word Meaning and Value
- 12.1 Meaning as Use and Distributional Semantics
- 12.2 Weber-Fechner Law
- 12.3 Word Frequency and Familiarity
- 12.4 Vector Representation of Words
- 12.5 Compositionality of Meaning
- 12.6 Statistical Universals and Meaning
- 13 Size and Frequency
- 13.1 Zipf Abbreviation of Words
- 13.2 Compound Length and Frequency
- 14 Grammatical Structure and Long Memory
- 14.1 Simple Grammatical Framework
- 14.2 Phrase Structure Grammar
- 14.3 Long-Range Dependence in Sentences
- 14.4 Grammatical Structure and Long-Range Correlation
- 14.5 Nature of Long Memory Underlying Language
- Part V Mathematical Models
- 15 Theories Behind Zipf's Law
- 15.1 Communication Optimization
- 15.2 A Limit Theorem
- 15.3 Significance of Statistical Universals
- 16 Mathematical Generative Models
- 16.1 Criteria for Statistical Universals
- 16.2 Independent and Identically Distributed Sequences
- 16.3 Simon Model and Variants
- 16.4 Random Walk Models
- 17 Language Models
- 17.1 Language Models and Statistical Universals
- 17.2 Building Language Models
- 17.3 N-Gram Models
- 17.4 Grammatical Models
- 17.5 Neural Models
- 17.6 Future Directions for Generative Models
- Part VI Ending Remarks
- 18 Conclusion
- 19 Acknowledgments
- Part VII Appendix
- 20 Glossary and Notations
- 20.1 Glossary
- 20.2 Mathematical Notation
- 20.3 Other Conventions
- 21 Mathematical Details
- 21.1 Fitting Functions
- 21.2 Proof that Monkey Typing Follows a Power Law
- 21.3 Relation Between η and ζ
- 21.4 Relation Between η and ξ
- 21.5 Proof That Interval Lengths of I.I.D. Process Follow Exponential Distribution
- 21.6 Proof of α=0.5 and ν=1.0 for I.I.D. Process
- 21.7 Summary of Shannon's Method to Estimate Entropy Rate
- 21.8 Relation of h, Perplexity, and Cross Entropy
- 21.9 Type Counts, Shannon Entropy, and Yule's K, via Generalized Entropy
- 21.10 Upper Bound of Compositional Distance
- 21.11 Rough Summary of Mandelbrot's Communication Optimization Rationale to Deduce a Power Law
- 21.12 Rough Definition of Central Limit Theorem
- 21.13 Definition of Simon Model
- 22 Data
- 22.1 Literary Texts
- 22.2 Large Corpora
- 22.3 Other Kinds of Data Related to Language
- 22.4 Corpora for Scripts
- References
- Index.
- Notes:
- Includes bibliographical references and index.
- Description based on print version record.
- Description based on publisher supplied metadata and other sources.
- ISBN:
- 3-030-59377-0
- OCLC:
- 1245672569