Using Stable Diffusion with Python : Leverage Python to Control and Automate High-Quality AI Image Generation Using Stable Diffusion / Andrew Zhu (Shudong Zhu) and Matthew Fisher.
- Format:
- Book
- Author/Creator:
- Zhu, Andrew (Shudong Zhu), author.
- Fisher, Matthew, author.
- Language:
- English
- Subjects (All):
- Python (Computer program language).
- Artificial intelligence.
- Physical Description:
- 1 online resource (352 pages)
- Edition:
- First edition.
- Place of Publication:
- Birmingham, England : Packt Publishing, [2024]
- Biography/History:
- Zhu (Shudong Zhu) Andrew: Andrew Zhu is an experienced Microsoft Applied Data Scientist with over 15 years of experience in the tech field. He is a highly regarded writer known for his ability to explain complex concepts in machine learning and AI in an engaging and informative manner. Andrew frequently contributes articles to Towards Data Science and other prominent tech publishers. He has authored the book "Microsoft Workflow Foundation 4.0 Cookbook," which has received a 4.5-star review. Andrew has a strong command of programming languages such as C/C++, Java, C#, and JavaScript, with his current focus primarily on Python. With a passion for AI and automation, Andrew resides in WA, US, with his family, which includes two boys.
- Summary:
- Master AI image generation by leveraging GenAI tools and techniques such as diffusers, LoRA, textual inversion, ControlNet, and prompt design.
- Key Features: Master the art of generating stunning AI artwork with the help of expert guidance and ready-to-run Python code; get instant access to emerging extensions and open-source models; leverage the power of community-shared models and LoRA to produce high-quality images that captivate audiences. Purchase of the print or Kindle book includes a free PDF eBook.
- Book Description: Stable Diffusion is a game-changing AI tool for image generation, enabling you to create stunning artwork with code. However, mastering it requires an understanding of the underlying concepts and techniques. This book guides you through unlocking the full potential of Stable Diffusion with Python. Starting with an introduction to Stable Diffusion, you'll explore the theory behind diffusion models, set up your environment, and generate your first image using diffusers. You'll learn how to optimize performance, leverage custom models, and integrate community-shared resources like LoRAs, textual inversion, and ControlNet to enhance your creations. After covering techniques such as face restoration, image upscaling, and image restoration, you'll focus on unlocking prompt limitations, scheduled prompt parsing, and weighted prompts to create a fully customized and industry-level Stable Diffusion application. This book also delves into real-world applications in medical imaging, remote sensing, and photo enhancement. Finally, you'll gain insights into extracting generation data, ensuring data persistence, and leveraging AI models like BLIP for image description extraction. By the end of this book, you'll be able to use Python to generate and edit images and leverage solutions to build Stable Diffusion apps for your business and users.
- What you will learn: Explore core concepts and applications of Stable Diffusion and set up your environment for success; refine performance, manage VRAM usage, and leverage community-driven resources like LoRAs and textual inversion; harness the power of ControlNet, IP-Adapter, and other methodologies to generate images with unprecedented control and quality; explore developments in Stable Diffusion such as video generation using AnimateDiff; write effective prompts and leverage LLMs to automate the process; discover how to train a Stable Diffusion LoRA from scratch.
- Who this book is for: If you're looking to gain control over AI image generation, particularly through the diffusion model, this book is for you. Moreover, data scientists, ML engineers, researchers, and Python application developers seeking to create AI image generation applications based on the Stable Diffusion framework can benefit from the insights provided in the book.
- Contents:
- Intro
- Title Page
- Copyright and Credits
- Dedication
- Foreword
- Contributors
- Table of Contents
- Preface
- Part 1 - A Whirlwind of Stable Diffusion
- Chapter 1: Introducing Stable Diffusion
- Evolution of the Diffusion model
- Before Transformer and Attention
- Transformer transforms machine learning
- CLIP from OpenAI makes a big difference
- Generate images
- DALL-E 2 and Stable Diffusion
- Why Stable Diffusion
- Which Stable Diffusion to use
- Why this book
- References
- Chapter 2: Setting Up the Environment for Stable Diffusion
- Hardware requirements to run Stable Diffusion
- GPU
- System memory
- Storage
- Software requirements
- CUDA installation
- Installing Python for Windows, Linux, and macOS
- Installing PyTorch
- Running a Stable Diffusion pipeline
- Using Google Colaboratory
- Using Google Colab to run a Stable Diffusion pipeline
- Summary
- Chapter 3: Generating Images Using Stable Diffusion
- Logging in to Hugging Face
- Generating an image
- Generation seed
- Sampling scheduler
- Changing a model
- Guidance scale
- Chapter 4: Understanding the Theory Behind Diffusion Models
- Understanding the image-to-noise process
- A more efficient forward diffusion process
- The noise-to-image training process
- The noise-to-image sampling process
- Understanding Classifier Guidance denoising
- Chapter 5: Understanding How Stable Diffusion Works
- Stable Diffusion in latent space
- Generating latent vectors using diffusers
- Generating text embeddings using CLIP
- Initializing time step embeddings
- Initializing the Stable Diffusion UNet
- Implementing a text-to-image Stable Diffusion inference pipeline
- Implementing a text-guided image-to-image Stable Diffusion inference pipeline
- References
- Additional reading
- Chapter 6: Using Stable Diffusion Models
- Technical requirements
- Loading the Diffusers model
- Loading model checkpoints from safetensors and ckpt files
- Using ckpt and safetensors files with Diffusers
- Turning off the model safety checker
- Converting the checkpoint model file to the Diffusers format
- Using Stable Diffusion XL
- Part 2 - Improving Diffusers with Custom Features
- Chapter 7: Optimizing Performance and VRAM Usage
- Setting the baseline
- Optimization solution 1 - using the float16 or bfloat16 data type
- Optimization solution 2 - enabling VAE tiling
- Optimization solution 3 - enabling Xformers or using PyTorch 2.0
- Optimization solution 4 - enabling sequential CPU offload
- Optimization solution 5 - enabling model CPU offload
- Optimization solution 6 - Token Merging (ToMe)
- Chapter 8: Using Community-Shared LoRAs
- How does LoRA work?
- Using LoRA with Diffusers
- Applying a LoRA weight during loading
- Diving into the internal structure of LoRA
- Finding the A and B weight matrix from the LoRA file
- Finding the corresponding checkpoint model layer name
- Updating the checkpoint model weights
- Making a function to load LoRA
- Why LoRA works
- Chapter 9: Using Textual Inversion
- Diffusers inference using TI
- How TI works
- Building a custom TI loader
- TI in the pt file format
- TI in bin file format
- Detailed steps to build a TI loader
- Putting all of the code together
- Chapter 10: Overcoming 77-Token Limitations and Enabling Prompt Weighting
- Understanding the 77-token limitation
- Overcoming the 77-tokens limitation
- Putting all the code together into a function
- Enabling long prompts with weighting
- Verifying the work
- Overcoming the 77-token limitation using community pipelines
- Chapter 11: Image Restore and Super-Resolution
- Understanding the terminologies
- Upscaling images using Img2img diffusion
- One-step super-resolution
- Multiple-step super-resolution
- A super-resolution result comparison
- Img-to-Img limitations
- ControlNet Tile image upscaling
- Steps to use ControlNet Tile to upscale an image
- The ControlNet Tile upscaling result
- Additional ControlNet Tile upscaling samples
- Chapter 12: Scheduled Prompt Parsing
- Using the Compel package
- Building a custom scheduled prompt pipeline
- A scheduled prompt parser
- Filling in the missing steps
- A Stable Diffusion pipeline supporting scheduled prompts
- Part 3 - Advanced Topics
- Chapter 13: Generating Images with ControlNet
- What is ControlNet and how is it different?
- Usage of ControlNet
- Using multiple ControlNets in one pipeline
- How ControlNet works
- Further usage
- More ControlNets with SD
- SDXL ControlNets
- Chapter 14: Generating Video Using Stable Diffusion
- The principles of text-to-video generation
- Practical applications of AnimateDiff
- Utilizing Motion LoRA to control animation motion
- Chapter 15: Generating Image Descriptions Using BLIP-2 and LLaVA
- BLIP-2 - Bootstrapping Language-Image Pre-training
- How BLIP-2 works
- Using BLIP-2 to generate descriptions
- LLaVA - Large Language and Vision Assistant
- How LLaVA works
- Installing LLaVA
- Using LLaVA to generate image descriptions
- Chapter 16: Exploring Stable Diffusion XL
- What's new in SDXL?
- The VAE of the SDXL
- The UNet of SDXL
- Two text encoders in SDXL
- The two-stage design
- Using SDXL
- Use SDXL community models
- Using SDXL image-to-image to enhance an image
- Using SDXL LoRA models
- Using SDXL with an unlimited prompt
- Chapter 17: Building Optimized Prompts for Stable Diffusion
- What makes a good prompt?
- Be clear and specific
- Be descriptive
- Using consistent terminology
- Reference artworks and styles
- Incorporate negative prompts
- Iterate and refine
- Using LLMs to generate better prompts
- Part 4 - Building Stable Diffusion into an Application
- Chapter 18: Applications - Object Editing and Style Transferring
- Editing images using Stable Diffusion
- Replacing image background content
- Removing the image background
- Object and style transferring
- Loading up a Stable Diffusion pipeline with IP-Adapter
- Transferring style
- Chapter 19: Generation Data Persistence
- Exploring and understanding the PNG file structure
- Saving extra text data in a PNG image file
- PNG extra data storage limitation
- Chapter 20: Creating Interactive User Interfaces
- Introducing Gradio
- Getting started with Gradio
- Gradio fundamentals
- Gradio Blocks
- Inputs and outputs
- Building a progress bar
- Building a Stable Diffusion text-to-image pipeline with Gradio
- Chapter 21: Diffusion Model Transfer Learning
- Training a neural network model with PyTorch
- Preparing the training data
- Preparing for training
- Training a model
- Training a model with Hugging Face's Accelerate
- Applying Hugging Face's Accelerate
- Putting code together
- Training a model with multiple GPUs using Accelerate
- Training a Stable Diffusion V1.5 LoRA
- Defining training hyperparameters
- Preparing the Stable Diffusion components
- Loading the training data
- Defining the training components
- Kicking off the training
- Verifying the result
- Chapter 22: Exploring Beyond Stable Diffusion
- What sets this AI wave apart
- The enduring value of mathematics and programming
- Staying current with AI innovations
- Cultivating responsible, ethical, private, and secure AI
- Our evolving relationship with AI
- Index
- Other Books You May Enjoy.
- Notes:
- Includes bibliographical references and index.
- Description based on publisher supplied metadata and other sources.
- Description based on print version record.
- ISBN:
- 9781835084311
- 1835084311
- OCLC:
- 1435803070