Transformers for Natural Language Processing and Computer Vision : Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3 / Denis Rothman.

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
Rothman, Denis, author.
Series:
Expert insight.
Language:
English
Subjects (All):
ChatGPT.
Artificial intelligence--Data processing.
Artificial intelligence.
Natural language processing (Computer science).
Cloud computing.
Physical Description:
1 online resource (729 pages)
Edition:
Third edition.
Place of Publication:
Birmingham, England : Packt Publishing Ltd., [2024]
Biography/History:
Rothman Denis: Denis Rothman graduated from Sorbonne University and Paris-Diderot University, designing one of the very first patented word2matrix embeddings and patented AI conversational agents. He began his career authoring one of the first AI cognitive Natural Language Processing (NLP) chatbots, applied as an automated language teacher for Moët et Chandon and other companies. He authored an AI resource optimizer for IBM and apparel producers. He then authored an Advanced Planning and Scheduling (APS) solution used worldwide.
Summary:
Transformers for Natural Language Processing and Computer Vision, Third Edition, explores Large Language Model (LLM) architectures, applications, and various platforms (Hugging Face, OpenAI, and Google Vertex AI) used for Natural Language Processing (NLP) and Computer Vision (CV). The book guides you through different transformer architectures to the latest Foundation Models and Generative AI. You’ll pretrain and fine-tune LLMs and work through different use cases, from summarization to implementing question-answering systems with embedding-based search techniques. You will also learn the risks of LLMs, from hallucinations and memorization to privacy, and how to mitigate such risks using moderation models with rule and knowledge bases. You’ll implement Retrieval Augmented Generation (RAG) with LLMs to improve the accuracy of your models and gain greater control over LLM outputs. Dive into generative vision transformers and multimodal model architectures and build applications, such as image and video-to-text classifiers. Go further by combining different models and platforms and learning about AI agent replication. This book provides you with an understanding of transformer architectures, pretraining, fine-tuning, LLM use cases, and best practices.
Contents:
Cover
Copyright
Contributors
Table of Contents
Preface
Chapter 1: What Are Transformers?
How constant time complexity O(1) changed our lives forever
O(1) attention conquers O(n) recurrent methods
Attention layer
Recurrent layer
The magic of the computational time complexity of an attention layer
Computational time complexity with a CPU
Computational time complexity with a GPU
Computational time complexity with a TPU
TPU-LLM
A brief journey from recurrent to attention
A brief history
From one token to an AI revolution
From one token to everything
Foundation Models
From general purpose to specific tasks
The role of AI professionals
The future of AI professionals
What resources should we use?
Decision-making guidelines
The rise of transformer seamless APIs and assistants
Choosing ready-to-use API-driven libraries
Choosing a cloud platform and transformer model
Summary
Questions
References
Further reading
Chapter 2: Getting Started with the Architecture of the Transformer Model
The rise of the Transformer: Attention Is All You Need
The encoder stack
Input embedding
Positional encoding
Sublayer 1: Multi-head attention
Sublayer 2: Feedforward network
The decoder stack
Output embedding and position encoding
The attention layers
The FFN sublayer, the post-LN, and the linear layer
Training and performance
Hugging Face transformer models
Chapter 3: Emergent vs Downstream Tasks: The Unseen Depths of Transformers
The paradigm shift: What is an NLP task?
Inside the head of the attention sublayer of a transformer
Exploring emergence with ChatGPT
Investigating the potential of downstream tasks
Evaluating models with metrics
Accuracy score
F1-score
MCC
Human evaluation
Benchmark tasks and datasets
Defining the SuperGLUE benchmark tasks
Running downstream tasks
The Corpus of Linguistic Acceptability (CoLA)
Stanford Sentiment TreeBank (SST-2)
Microsoft Research Paraphrase Corpus (MRPC)
Winograd schemas
Chapter 4: Advancements in Translations with Google Trax, Google Translate, and Gemini
Defining machine translation
Human transductions and translations
Machine transductions and translations
Evaluating machine translations
Preprocessing a WMT dataset
Preprocessing the raw data
Finalizing the preprocessing of the datasets
Evaluating machine translations with BLEU
Geometric evaluations
Applying a smoothing technique
Translations with Google Trax
Installing Trax
Creating the Original Transformer model
Initializing the model using pretrained weights
Tokenizing a sentence
Decoding from the Transformer
De-tokenizing and displaying the translation
Translation with Google Translate
Translation with a Google Translate AJAX API Wrapper
Implementing googletrans
Translation with Gemini
Gemini's potential
Chapter 5: Diving into Fine-Tuning through BERT
The architecture of BERT
Preparing the pretraining input environment
Pretraining and fine-tuning a BERT model
Fine-tuning BERT
Defining a goal
Hardware constraints
Installing Hugging Face Transformers
Importing the modules
Specifying CUDA as the device for torch
Loading the CoLA dataset
Creating sentences, label lists, and adding BERT tokens
Activating the BERT tokenizer
Processing the data
Creating attention masks
Splitting the data into training and validation sets
Converting all the data into torch tensors
Selecting a batch size and creating an iterator
BERT model configuration
Loading the Hugging Face BERT uncased base model
Optimizer grouped parameters
The hyperparameters for the training loop
The training loop
Training evaluation
Predicting and evaluating using the holdout dataset
Exploring the prediction process
Evaluating using the Matthews correlation coefficient
Matthews correlation coefficient evaluation for the whole dataset
Building a Python interface to interact with the model
Saving the model
Creating an interface for the trained model
Interacting with the model
Chapter 6: Pretraining a Transformer from Scratch through RoBERTa
Training a tokenizer and pretraining a transformer
Building KantaiBERT from scratch
Step 1: Loading the dataset
Step 2: Installing Hugging Face transformers
Step 3: Training a tokenizer
Step 4: Saving the files to disk
Step 5: Loading the trained tokenizer files
Step 6: Checking resource constraints: GPU and CUDA
Step 7: Defining the configuration of the model
Step 8: Reloading the tokenizer in transformers
Step 9: Initializing a model from scratch
Exploring the parameters
Step 10: Building the dataset
Step 11: Defining a data collator
Step 12: Initializing the trainer
Step 13: Pretraining the model
Step 14: Saving the final model (+tokenizer + config) to disk
Step 15: Language modeling with FillMaskPipeline
Pretraining a Generative AI customer support model on X data
Step 1: Downloading the dataset
Step 3: Loading and filtering the data
Step 4: Checking Resource Constraints: GPU and CUDA
Step 5: Defining the configuration of the model
Step 6: Creating and processing the dataset
Step 7: Initializing the trainer
Step 8: Pretraining the model
Step 9: Saving the model
Step 10: User interface to chat with the Generative AI agent
Further pretraining
Limitations
Next steps
Chapter 7: The Generative AI Revolution with ChatGPT
GPTs as GPTs
Improvement
Diffusion
New application sectors
Self-service assistants
Development assistants
Pervasiveness
The architecture of OpenAI GPT transformer models
The rise of billion-parameter transformer models
The increasing size of transformer models
Context size and maximum path length
From fine-tuning to zero-shot models
Stacking decoder layers
GPT models
OpenAI models as assistants
ChatGPT provides source code
GitHub Copilot code assistant
General-purpose prompt examples
Getting started with ChatGPT - GPT-4 as an assistant
1. GPT-4 helps to explain how to write source code
2. GPT-4 creates a function to show the YouTube presentation of GPT-4 by Greg Brockman on March 14, 2023
3. GPT-4 creates an application for WikiArt to display images
4. GPT-4 creates an application to display IMDb reviews
5. GPT-4 creates an application to display a newsfeed
6. GPT-4 creates a k-means clustering (KMC) algorithm
Getting started with the GPT-4 API
Running our first NLP task with GPT-4
Step 1: Installing OpenAI and Step 2: Entering the API key
Step 3: Running an NLP task with GPT-4
Key hyperparameters
Running multiple NLP tasks
Retrieval Augmented Generation (RAG) with GPT-4
Installation
Document retrieval
Augmented retrieval generation
Chapter 8: Fine-Tuning OpenAI GPT Models
Risk management
Fine-tuning a GPT model for completion (generative)
1. Preparing the dataset
1.1. Preparing the data in JSON
1.2. Converting the data to JSONL
2. Fine-tuning an original model
3. Running the fine-tuned GPT model
4. Managing fine-tuned jobs and models
Before leaving
Chapter 9: Shattering the Black Box with Interpretable Tools
Transformer visualization with BertViz
Running BertViz
Step 1: Installing BertViz and importing the modules
Step 2: Load the models and retrieve attention
Step 3: Head view
Step 4: Processing and displaying attention heads
Step 5: Model view
Step 6: Displaying the output probabilities of attention heads
Streaming the output of the attention heads
Visualizing word relationships using attention scores with pandas
exBERT
Interpreting Hugging Face transformers with SHAP
Introducing SHAP
Explaining Hugging Face outputs with SHAP
Transformer visualization via dictionary learning
Transformer factors
Introducing LIME
The visualization interface
Other interpretable AI tools
LIT
PCA
Running LIT
OpenAI LLMs explain neurons in transformers
Limitations and human control
Chapter 10: Investigating the Role of Tokenizers in Shaping Transformer Models
Matching datasets and tokenizers
Best practices
Step 1: Preprocessing
Step 2: Quality control
Step 3: Continuous human quality control
Word2Vec tokenization
Case 0: Words in the dataset and the dictionary
Case 1: Words not in the dataset or the dictionary
Case 2: Noisy relationships
Case 3: Words in a text but not in the dictionary
Case 4: Rare words
Case 5: Replacing rare words
Exploring sentence and WordPiece tokenizers to understand the efficiency of subword tokenizers for transformers.
Notes:
Includes bibliographical references and index.
Description based on publisher supplied metadata and other sources.
Description based on print version record.
ISBN:
9781805123743
1805123742
OCLC:
1424949941
