Applied Natural Language Processing in the Enterprise / Patel, Ankur.
- Format:
- Book
- Author/Creator:
- Patel, Ankur, author.
- Arasanipalai, Ajay Uppili, author.
- Language:
- English
- Subjects (All):
- Natural language processing (Computer science).
- Machine learning.
- Physical Description:
- 1 online resource (350 pages)
- Edition:
- 1st edition
- Place of Publication:
- O'Reilly Media, Inc., 2021.
- System Details:
- text file
- Summary:
- NLP is one of the hottest topics in AI today. Having lagged for years behind other deep learning fields such as computer vision, NLP only recently gained mainstream popularity. Google, Facebook, and OpenAI have open-sourced large pretrained language models, but many organizations today still struggle with building and adopting NLP applications. This hands-on guide helps you learn the process quickly. If you have a basic to intermediate understanding of machine learning and programming experience with Python, you’ll learn how to build and deploy real-world NLP applications in your organization. Authors Ankur Patel and Ajay Uppili Arasanipalai walk you through the process without bogging you down in theory.
- Understand how state-of-the-art NLP models work
- Learn the tools of the trade, including frameworks popular today
- Perform NLP tasks such as text classification, semantic search, and reading comprehension
- Solve problems using new models like transformers and techniques such as transfer learning
- Build NLP models from scratch with performance comparable or superior to out-of-the-box systems
- Deploy your models to production and maintain their performance
- Implement a suite of NLP algorithms using Python and PyTorch
- Contents:
- Intro
- Copyright
- Table of Contents
- Preface
- What Is Natural Language Processing?
- Why Should I Read This Book?
- What Do I Need to Know Already?
- What Is This Book All About?
- How Is This Book Organized?
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- Ajay
- Ankur
- Part I. Scratching the Surface
- Chapter 1. Introduction to NLP
- What Is NLP?
- Popular Applications
- History
- Inflection Points
- A Final Word
- Basic NLP
- Defining NLP Tasks
- Set Up the Programming Environment
- spaCy, fast.ai, and Hugging Face
- Perform NLP Tasks Using spaCy
- Conclusion
- Chapter 2. Transformers and Transfer Learning
- Training with fastai
- Using the fastai Library
- ULMFiT for Transfer Learning
- Fine-Tuning a Language Model on IMDb
- Training a Text Classifier
- Inference with Hugging Face
- Loading Models
- Generating Predictions
- Chapter 3. NLP Tasks and Applications
- Pretrained Language Models
- Transfer Learning and Fine-Tuning
- NLP Tasks
- Natural Language Dataset
- Explore the AG Dataset
- NLP Task #1: Named Entity Recognition
- Perform Inference Using the Original spaCy Model
- Custom NER
- Annotate via Prodigy: NER
- Train the Custom NER Model Using spaCy
- Custom NER Model Versus Original NER Model
- NLP Task #2: Text Classification
- Annotate via Prodigy: Text Classification
- Train Text Classification Models Using spaCy
- Part II. The Cogs in the Machine
- Chapter 4. Tokenization
- A Minimal Tokenizer
- Hugging Face Tokenizers
- Subword Tokenization
- Building Your Own Tokenizer
- Chapter 5. Embeddings: How Machines "Understand" Words
- Understanding Versus Reading Text
- Word Vectors
- Word2Vec
- Embeddings in the Age of Transfer Learning
- Embeddings in Practice
- Preprocessing
- Model
- Training
- Validation
- Embedding Things That Aren't Words
- Making Vectorized Music
- Some General Tips for Making Custom Embeddings
- Chapter 6. Recurrent Neural Networks and Other Sequence Models
- Recurrent Neural Networks
- RNNs in PyTorch from Scratch
- Bidirectional RNN
- Sequence to Sequence Using RNNs
- Long Short-Term Memory
- Gated Recurrent Units
- Chapter 7. Transformers
- Building a Transformer from Scratch
- Attention Mechanisms
- Dot Product Attention
- Scaled Dot Product Attention
- Multi-Head Self-Attention
- Adaptive Attention Span
- Persistent Memory/All-Attention
- Product-Key Memory
- Transformers for Computer Vision
- Chapter 8. BERTology: Putting It All Together
- ImageNet
- The Power of Pretrained Models
- The Path to NLP's ImageNet Moment
- Pretrained Word Embeddings
- The Limitations of One-Hot Encoding
- GloVe
- fastText
- Context-Aware Pretrained Word Embeddings
- Sequential Models
- Sequential Data and the Importance of Sequential Models
- RNNs
- Vanilla RNNs
- LSTM Networks
- GRUs
- Transformers
- Transformer-XL
- NLP's ImageNet Moment
- Universal Language Model Fine-Tuning
- ELMo
- BERT
- BERTology
- GPT-1, GPT-2, GPT-3
- Part III. Outside the Wall
- Chapter 9. Tools of the Trade
- Deep Learning Frameworks
- PyTorch
- TensorFlow
- Jax
- Julia
- Visualization and Experiment Tracking
- TensorBoard
- Weights & Biases
- Neptune
- Comet
- MLflow
- AutoML
- H2O.ai
- Dataiku
- DataRobot
- ML Infrastructure and Compute
- Paperspace
- FloydHub
- Google Colab
- Kaggle Kernels
- Lambda GPU Cloud
- Edge/On-Device Inference
- ONNX
- Core ML
- Edge Accelerators
- Cloud Inference and Machine Learning as a Service
- AWS
- Microsoft Azure
- Google Cloud Platform
- Continuous Integration and Delivery
- Chapter 10. Visualization
- Our First Streamlit App
- Build the Streamlit App
- Deploy the Streamlit App
- Explore the Streamlit Web App
- Build and Deploy a Streamlit App for Custom NER
- Build and Deploy a Streamlit App for Text Classification on AG News Dataset
- Build and Deploy a Streamlit App for Text Classification on Custom Text
- Chapter 11. Productionization
- Data Scientists, Engineers, and Analysts
- Prototyping, Deployment, and Maintenance
- Notebooks and Scripts
- Databricks: Your Unified Data Analytics Platform
- Support for Big Data
- Support for Multiple Programming Languages
- Support for ML Frameworks
- Support for Model Repository, Access Control, Data Lineage, and Versioning
- Databricks Setup
- Set Up Access to S3 Bucket
- Set Up Libraries
- Create Cluster
- Create Notebook
- Enable Init Script and Restart Cluster
- Run Speed Test: Inference on NER Using spaCy
- Machine Learning Jobs
- Production Pipeline Notebook
- Scheduled Machine Learning Jobs
- Event-Driven Machine Learning Pipeline
- Log and Register Model
- MLflow Model Serving
- Alternatives to Databricks
- Amazon SageMaker
- Saturn Cloud
- Chapter 12. Conclusion
- Ten Final Lessons
- Lesson 1: Start with Simple Approaches First
- Lesson 2: Leverage the Community
- Lesson 3: Do Not Create from Scratch, When Possible
- Lesson 4: Intuition and Experience Trounces Theory
- Lesson 5: Fight Decision Fatigue
- Lesson 6: Data Is King
- Lesson 7: Lean on Humans
- Lesson 8: Pair Yourself with Really Great Engineers
- Lesson 9: Ensemble
- Lesson 10: Have Fun
- Final Word
- Appendix A. Scaling
- Multi-GPU Training
- Distributed Training
- What Makes Deep Training Fast?
- Appendix B. CUDA
- Threads and Thread Blocks
- Writing CUDA Kernels
- CUDA in Practice
- Index
- About the Authors
- Colophon
- Notes:
- Online resource; Title from title page (viewed April 25, 2021)
- Description based on publisher supplied metadata and other sources.
- ISBN:
- 1-4920-6252-9
- 1-4920-6254-5
- 1-4920-6256-1
- OCLC:
- 1192526432