Python 3 text processing with NLTK 3 cookbook : over 80 practical recipes on natural language processing techniques using Python's NLTK 3.0 / Jacob Perkins ; cover image by Faiz Fattohi.
- Format:
- Book
- Author/Creator:
- Perkins, Jacob, author.
- Language:
- English
- Subjects (All):
- Python (Computer program language).
- Natural language processing (Computer science)--Research.
- Natural language processing (Computer science).
- Physical Description:
- 1 online resource (304 p.)
- Edition:
- Second edition.
- Place of Publication:
- Birmingham, England : Packt Publishing Ltd, 2014.
- Language Note:
- English
- System Details:
- text file
- Summary:
- Over 80 practical recipes on natural language processing techniques using Python's NLTK 3.0.
- In Detail: This book will show you the essential techniques of text and language processing. Starting with tokenization, stemming, and the WordNet dictionary, you'll progress to part-of-speech tagging, phrase chunking, and named entity recognition. You'll learn how various text corpora are organized, as well as how to create your own custom corpus. Then, you'll move on to text classification with a focus on sentiment analysis. And because NLP can be computationally expensive on large bodies of text, you'll try a few methods for distributed text processing. Finally, you'll be introduced to a number of other small but complementary Python libraries for text analysis, cleaning, and parsing. This cookbook provides simple, straightforward examples so you can quickly learn text processing with Python and NLTK.
- What You Will Learn: Tokenize text into sentences, and sentences into words; look up words in the WordNet dictionary; apply spelling correction and word replacement; access the built-in text corpora and create your own custom corpus; tag words with parts of speech; chunk phrases and recognize named entities; grammatically transform phrases and chunks; classify text and perform sentiment analysis.
- Contents:
- Intro
- Python 3 Text Processing with NLTK 3 Cookbook
- Table of Contents
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Support files, eBooks, discount offers, and more
- Why Subscribe?
- Free Access for Packt account holders
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Errata
- Piracy
- Questions
- 1. Tokenizing Text and WordNet Basics
- Introduction
- Tokenizing text into sentences
- Getting ready
- How to do it...
- How it works...
- There's more...
- Tokenizing sentences in other languages
- See also
- Tokenizing sentences into words
- Separating contractions
- PunktWordTokenizer
- WordPunctTokenizer
- Tokenizing sentences using regular expressions
- Simple whitespace tokenizer
- Training a sentence tokenizer
- Filtering stopwords in a tokenized sentence
- Looking up Synsets for a word in WordNet
- Working with hypernyms
- Part of speech (POS)
- Looking up lemmas and synonyms in WordNet
- All possible synonyms
- Antonyms
- Calculating WordNet Synset similarity
- Comparing verbs
- Path and Leacock Chodorow (LCH) similarity
- Discovering word collocations
- Getting ready
- Scoring functions
- Scoring ngrams
- 2. Replacing and Correcting Words
- Stemming words
- The LancasterStemmer class
- The RegexpStemmer class
- The SnowballStemmer class
- Lemmatizing words with WordNet
- Combining stemming with lemmatization
- Replacing words matching regular expressions
- Replacement before tokenization
- Removing repeating characters
- Spelling correction with Enchant
- The en_GB dictionary
- Personal word lists
- Replacing synonyms
- CSV synonym replacement
- YAML synonym replacement
- Replacing negations with antonyms
- 3. Creating Custom Corpora
- Setting up a custom corpus
- Loading a YAML file
- Creating a wordlist corpus
- Names wordlist corpus
- English words corpus
- Creating a part-of-speech tagged word corpus
- Customizing the word tokenizer
- Customizing the sentence tokenizer
- Customizing the paragraph block reader
- Customizing the tag separator
- Converting tags to a universal tagset
- Creating a chunked phrase corpus
- Tree leaves
- Treebank chunk corpus
- CoNLL2000 corpus
- Creating a categorized text corpus
- Category file
- Categorized tagged corpus reader
- Categorized corpora
- Creating a categorized chunk corpus reader
- Categorized CoNLL chunk corpus reader
- Lazy corpus loading
- Creating a custom corpus view
- Block reader functions
- Pickle corpus view
- Concatenated corpus view
- Creating a MongoDB-backed corpus reader
- Corpus editing with file locking
- 4. Part-of-speech Tagging
- Default tagging
- Evaluating accuracy
- Tagging sentences
- Untagging a tagged sentence
- Training a unigram part-of-speech tagger
- Overriding the context model
- Minimum frequency cutoff
- Combining taggers with backoff tagging
- Saving and loading a trained tagger with pickle
- Training and combining ngram taggers
- Quadgram tagger
- Creating a model of likely word tags
- How it works...
- There's more...
- Tagging with regular expressions
- Affix tagging
- Working with min_stem_length
- Training a Brill tagger
- Tracing
- Training the TnT tagger
- Controlling the beam search
- Significance of capitalization
- Using WordNet for tagging
- Tagging proper names
- Classifier-based tagging
- Detecting features with a custom feature detector
- Setting a cutoff probability
- Using a pre-trained classifier
- Training a tagger with NLTK-Trainer
- Saving a pickled tagger
- Training on a custom corpus
- Training with universal tags
- Analyzing a tagger against a tagged corpus
- Analyzing a tagged corpus
- 5. Extracting Chunks
- Chunking and chinking with regular expressions
- Parsing different chunk types
- Parsing alternative patterns
- Chunk rule with context
- Merging and splitting chunks with regular expressions
- Specifying rule descriptions
- Expanding and removing chunks with regular expressions
- Partial parsing with regular expressions
- The ChunkScore metrics
- Looping and tracing chunk rules
- Training a tagger-based chunker
- Using different taggers
- Classification-based chunking
- Using a different classifier builder
- Extracting named entities
- Binary named entity extraction
- Extracting proper noun chunks
- Extracting location chunks
- Training a named entity chunker
- Training a chunker with NLTK-Trainer
- Saving a pickled chunker
- Training on parse trees
- Analyzing a chunker against a chunked corpus
- Analyzing a chunked corpus
- 6. Transforming Chunks and Trees
- Filtering insignificant words from a sentence
- Correcting verb forms
- Swapping verb phrases
- Swapping noun cardinals
- Swapping infinitive phrases
- Singularizing plural nouns
- Chaining chunk transformations
- Converting a chunk tree to text
- There's more...
- Notes:
- "Quick answers to common problems"--Cover.
- Includes index.
- Description based on online resource; title from PDF title page (ebrary, viewed September 2, 2014).
- ISBN:
- 9781782167860
- 1782167862
- OCLC:
- 891381366