Synthetic Data for Machine Learning : Revolutionize Your Approach to Machine Learning with This Comprehensive Conceptual Guide.

EBSCOhost Academic eBook Collection (North America) Available online

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
Kerim, Abdulrahman.
Language:
English
Subjects (All):
Machine learning.
Computer vision.
Physical Description:
1 online resource (209 pages)
Edition:
1st ed.
Place of Publication:
Birmingham : Packt Publishing, Limited, 2023.
Summary:
Conquer data hurdles, supercharge your ML journey, and become a leader in your field with synthetic data generation techniques, best practices, and case studies.
Key Features:
Avoid common data issues by identifying and solving them using synthetic data-based solutions
Master synthetic data generation approaches to prepare for the future of machine learning
Enhance performance, reduce budget, and stand out from competitors using synthetic data
Purchase of the print or Kindle book includes a free PDF eBook
Book Description:
The machine learning (ML) revolution has made our world unimaginable without its products and services. However, training ML models requires vast datasets, which entails a process plagued by high costs, errors, and privacy concerns associated with collecting and annotating real data. Synthetic data emerges as a promising solution to all these challenges. This book is designed to bridge theory and practice of using synthetic data, offering invaluable support for your ML journey.
Synthetic Data for Machine Learning empowers you to tackle real data issues, enhance your ML models' performance, and gain a deep understanding of synthetic data generation. You'll explore the strengths and weaknesses of various approaches, gaining practical knowledge with hands-on examples of modern methods, including Generative Adversarial Networks (GANs) and diffusion models. Additionally, you'll uncover the secrets and best practices to harness the full potential of synthetic data.
By the end of this book, you'll have mastered synthetic data and positioned yourself as a market leader, ready for more advanced, cost-effective, and higher-quality data sources, setting you ahead of your peers in the next generation of ML.
What you will learn:
Understand real data problems, limitations, drawbacks, and pitfalls
Harness the potential of synthetic data for data-hungry ML models
Discover state-of-the-art synthetic data generation approaches and solutions
Uncover synthetic data potential by working on diverse case studies
Understand synthetic data challenges and emerging research topics
Apply synthetic data to your ML projects successfully
Who this book is for:
If you are a machine learning (ML) practitioner or researcher who wants to overcome data problems, this book is for you. Basic knowledge of ML and Python programming is required. The book is one of the pioneer works on the subject, providing leading-edge support for ML engineers, researchers, companies, and decision makers.
Contents:
Cover
Title Page
Copyright and Credits
Dedications
Contributors
Table of Contents
Part 1: Real Data Issues, Limitations, and Challenges
Chapter 1: Machine Learning and the Need for Data
Technical requirements
Artificial intelligence, machine learning, and deep learning
Artificial intelligence (AI)
Machine learning (ML)
Deep learning (DL)
Why are ML and DL so powerful?
Feature engineering
Transfer across tasks
Training ML models
Collecting and annotating data
Designing and training an ML model
Validating and testing an ML model
Iterations in the ML development process
Summary
Chapter 2: Annotating Real Data
Annotating data for ML
Learning from data
Training your ML model
Testing your ML model
Issues with the annotation process
The annotation process is expensive
The annotation process is error-prone
The annotation process is biased
Optical flow and depth estimation
Ground truth generation for computer vision
Optical flow estimation
Depth estimation
Chapter 3: Privacy Issues in Real Data
Why is privacy an issue in ML?
ML task
Dataset size
Regulations
What exactly is the privacy problem in ML?
Copyright and intellectual property infringement
Privacy and reproducibility of experiments
Privacy issues and bias
Privacy-preserving ML
Approaches for privacy-preserving datasets
Approaches for privacy-preserving ML
Real data challenges and issues
Part 2: An Overview of Synthetic Data for Machine Learning
Chapter 4: An Introduction to Synthetic Data
What is synthetic data?
Synthetic and real data
Data-centric and architecture-centric approaches in ML
History of synthetic data
Random number generators
Generative Adversarial Networks (GANs)
Synthetic data for privacy issues
Synthetic data in computer vision
Synthetic data and ethical considerations
Synthetic data types
Data augmentation
Geometric transformations
Noise injection
Text replacement, deletion, and injection
Chapter 5: Synthetic Data as a Solution
The main advantages of synthetic data
Unbiased
Diverse
Controllable
Scalable
Automatic data labeling
Annotation quality
Low cost
Solving privacy issues with synthetic data
Using synthetic data to solve time and efficiency issues
Synthetic data as a revolutionary solution for rare data
Synthetic data generation methods
Part 3: Synthetic Data Generation Approaches
Chapter 6: Leveraging Simulators and Rendering Engines to Generate Synthetic Data
Introduction to simulators and rendering engines
Simulators
Rendering and game engines
History and evolution of simulators and game engines
Generating synthetic data
Identify the task and ground truth to generate
Create the 3D virtual world in the game engine
Setting up the virtual camera
Adding noise and anomalies
Setting up the labeling pipeline
Generating the training data with the ground truth
Challenges and limitations
Realism
Diversity
Complexity
Looking at two case studies
AirSim
CARLA
Chapter 7: Exploring Generative Adversarial Networks
What is a GAN?
Training a GAN
GAN training algorithm
Training loss
Challenges
Utilizing GANs to generate synthetic data
Hands-on GANs in practice
Variations of GANs
Conditional GAN (cGAN)
CycleGAN
Conditional Tabular GAN (CTGAN)
Wasserstein GAN (WGAN) and Wasserstein GAN with Gradient Penalty (WGAN-GP)
f-GAN
DragGAN
Chapter 8: Video Games as a Source of Synthetic Data
The impact of the video game industry
Photorealism and the real-synthetic domain shift
Time, effort, and cost
Generating synthetic data using video games
Utilizing games for general data collection
Utilizing games for social studies
Utilizing simulation games for data generation
Controllability
Game genres and limitations on synthetic data generation
Ethical issues
Intellectual property
Chapter 9: Exploring Diffusion Models for Synthetic Data
An introduction to diffusion models
The training process of DMs
Applications of DMs
Diffusion models - the pros and cons
The pros of using DMs
The cons of using DMs
Hands-on diffusion models in practice
Context
Dataset
ML model
Training
Testing
Diffusion models - ethical issues
Copyright
Bias
Inappropriate content
Responsibility
Privacy
Fraud and identity theft
Part 4: Case Studies and Best Practices
Chapter 10: Case Study 1 - Computer Vision
Transforming industries - the power of computer vision
The four waves of the industrial revolution
Industry 4.0 and computer vision
Synthetic data and computer vision - examples from industry
Neurolabs using synthetic data in retail
Microsoft using synthetic data alone for face analysis
Synthesis AI using synthetic data for virtual try-on
Chapter 11: Case Study 2 - Natural Language Processing
A brief introduction to NLP
Applications of NLP in practice
The need for large-scale training datasets in NLP
Human language complexity
Contextual dependence
Generalization
Hands-on practical example with ChatGPT
Synthetic data as a solution for NLP problems
SYSTRAN Soft's use of synthetic data
Telefónica's use of synthetic data
Clinical text mining utilizing synthetic data
The Alexa virtual assistant model
Chapter 12: Case Study 3 - Predictive Analytics
What is predictive analytics?
Applications of predictive analytics
Predictive analytics issues with real data
Partial and scarce training data
Cost
Case studies of utilizing synthetic data for predictive analytics
Provinzial and synthetic data
Healthcare benefits from synthetic data in predictive analytics
Amazon fraud transaction prediction using synthetic data
Chapter 13: Best Practices for Applying Synthetic Data
Unveiling the challenges of generating and utilizing synthetic data
Domain gap
Data representation
Privacy, security, and validation
Trust and credibility
Domain-specific issues limiting the usability of synthetic data
Healthcare
Finance
Autonomous cars
Best practices for the effective utilization of synthetic data
Part 5: Current Challenges and Future Perspectives
Chapter 14: Synthetic-to-Real Domain Adaptation
The domain gap problem in ML
Sensitivity to sensors' variations
Discrepancy in class and feature distributions
Concept drift
Approaches for synthetic-to-real domain adaptation
Domain randomization
Adversarial domain adaptation
Feature-based domain adaptation
Synthetic-to-real domain adaptation - issues and challenges
Unseen domain
Limited real data
Computational complexity
Synthetic data limitations
Multimodal data complexity
Chapter 15: Diversity Issues in Synthetic Data
The need for diverse data in ML
Transferability
Better problem modeling
Security
Process of debugging
Robustness to anomalies
Creativity
Inclusivity
Generating diverse synthetic datasets
Latent space variations
Ensemble synthetic data generation
Diversity regularization
Incorporating external knowledge
Progressive training
Procedural content generation with game engines
Diversity issues in the synthetic data realm
Balancing diversity and realism
Privacy and confidentiality concerns
Validation and evaluation challenges
Chapter 16: Photorealism in Computer Vision
Synthetic data photorealism for computer vision
Feature extraction
Robustness
Benchmarking performance
Photorealism approaches
Physically Based Rendering (PBR)
Neural style transfer
Photorealism evaluation metrics
Structural Similarity Index Measure (SSIM)
Learned Perceptual Image Patch Similarity (LPIPS)
Expert evaluation
Challenges and limitations of photorealistic synthetic data
Creating hyper-realistic scenes
Resources versus photorealism trade-off
Chapter 17: Conclusion
Real data and its problems
Synthetic data as a solution
Real-world case studies
Future perspectives
Index
Other Books You May Enjoy
Notes:
Description based on publisher supplied metadata and other sources.
ISBN:
9781803232607
1803232609
OCLC:
1406406795
