Synthetic Data for Machine Learning : Revolutionize Your Approach to Machine Learning with This Comprehensive Conceptual Guide.

EBSCOhost Academic eBook Collection (North America) Available online

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:: Book
Author/Creator:: Kerim, Abdulrahman.
Language:: English
Subjects (All):: Machine learning.; Computer vision.
Physical Description:: 1 online resource (209 pages)
Edition:: 1st ed.
Place of Publication:: Birmingham : Packt Publishing, Limited, 2023.
Summary:: Conquer data hurdles, supercharge your ML journey, and become a leader in your field with synthetic data generation techniques, best practices, and case studies.
Key Features: Avoid common data issues by identifying and solving them using synthetic data-based solutions; Master synthetic data generation approaches to prepare for the future of machine learning; Enhance performance, reduce budget, and stand out from competitors using synthetic data; Purchase of the print or Kindle book includes a free PDF eBook.
Book Description: The machine learning (ML) revolution has made our world unimaginable without its products and services. However, training ML models requires vast datasets, which entails a process plagued by high costs, errors, and privacy concerns associated with collecting and annotating real data. Synthetic data emerges as a promising solution to all these challenges. This book is designed to bridge theory and practice of using synthetic data, offering invaluable support for your ML journey. Synthetic Data for Machine Learning empowers you to tackle real data issues, enhance your ML models' performance, and gain a deep understanding of synthetic data generation. You'll explore the strengths and weaknesses of various approaches, gaining practical knowledge with hands-on examples of modern methods, including Generative Adversarial Networks (GANs) and diffusion models. Additionally, you'll uncover the secrets and best practices to harness the full potential of synthetic data. By the end of this book, you'll have mastered synthetic data and positioned yourself as a market leader, ready for more advanced, cost-effective, and higher-quality data sources, setting you ahead of your peers in the next generation of ML.
What you will learn: Understand real data problems, limitations, drawbacks, and pitfalls; Harness the potential of synthetic data for data-hungry ML models; Discover state-of-the-art synthetic data generation approaches and solutions; Uncover synthetic data potential by working on diverse case studies; Understand synthetic data challenges and emerging research topics; Apply synthetic data to your ML projects successfully.
Who this book is for: If you are a machine learning (ML) practitioner or researcher who wants to overcome data problems, this book is for you. Basic knowledge of ML and Python programming is required. The book is one of the pioneer works on the subject, providing leading-edge support for ML engineers, researchers, companies, and decision makers.
Contents:: Cover; Title Page; Copyright and Credits; Dedications; Contributors; Table of Contents
Part 1: Real Data Issues, Limitations, and Challenges
Chapter 1: Machine Learning and the Need for Data; Technical requirements; Artificial intelligence, machine learning, and deep learning; Artificial intelligence (AI); Machine learning (ML); Deep learning (DL); Why are ML and DL so powerful?; Feature engineering; Transfer across tasks; Training ML models; Collecting and annotating data; Designing and training an ML model; Validating and testing an ML model; Iterations in the ML development process; Summary
Chapter 2: Annotating Real Data; Annotating data for ML; Learning from data; Training your ML model; Testing your ML model; Issues with the annotation process; The annotation process is expensive; The annotation process is error-prone; The annotation process is biased; Optical flow and depth estimation; Ground truth generation for computer vision; Optical flow estimation; Depth estimation
Chapter 3: Privacy Issues in Real Data; Why is privacy an issue in ML?; ML task; Dataset size; Regulations; What exactly is the privacy problem in ML?; Copyright and intellectual property infringement; Privacy and reproducibility of experiments; Privacy issues and bias; Privacy-preserving ML; Approaches for privacy-preserving datasets; Approaches for privacy-preserving ML; Real data challenges and issues
Part 2: An Overview of Synthetic Data for Machine Learning
Chapter 4: An Introduction to Synthetic Data; What is synthetic data?; Synthetic and real data; Data-centric and architecture-centric approaches in ML; History of synthetic data; Random number generators; Generative Adversarial Networks (GANs); Synthetic data for privacy issues; Synthetic data in computer vision; Synthetic data and ethical considerations; Synthetic data types; Data augmentation; Geometric transformations; Noise injection; Text replacement, deletion, and injection
Chapter 5: Synthetic Data as a Solution; The main advantages of synthetic data; Unbiased; Diverse; Controllable; Scalable; Automatic data labeling; Annotation quality; Low cost; Solving privacy issues with synthetic data; Using synthetic data to solve time and efficiency issues; Synthetic data as a revolutionary solution for rare data; Synthetic data generation methods
Part 3: Synthetic Data Generation Approaches
Chapter 6: Leveraging Simulators and Rendering Engines to Generate Synthetic Data; Introduction to simulators and rendering engines; Simulators; Rendering and game engines; History and evolution of simulators and game engines; Generating synthetic data; Identify the task and ground truth to generate; Create the 3D virtual world in the game engine; Setting up the virtual camera; Adding noise and anomalies; Setting up the labeling pipeline; Generating the training data with the ground truth; Challenges and limitations; Realism; Diversity; Complexity; Looking at two case studies; AirSim; CARLA
Chapter 7: Exploring Generative Adversarial Networks; What is a GAN?; Training a GAN; GAN training algorithm; Training loss; Challenges; Utilizing GANs to generate synthetic data; Hands-on GANs in practice; Variations of GANs; Conditional GAN (cGAN); CycleGAN; Conditional Tabular GAN (CTGAN); Wasserstein GAN (WGAN) and Wasserstein GAN with Gradient Penalty (WGAN-GP); f-GAN; DragGAN
Chapter 8: Video Games as a Source of Synthetic Data; The impact of the video game industry; Photorealism and the real-synthetic domain shift; Time, effort, and cost; Generating synthetic data using video games; Utilizing games for general data collection; Utilizing games for social studies; Utilizing simulation games for data generation; Controllability; Game genres and limitations on synthetic data generation; Ethical issues; Intellectual property
Chapter 9: Exploring Diffusion Models for Synthetic Data; An introduction to diffusion models; The training process of DMs; Applications of DMs; Diffusion models - the pros and cons; The pros of using DMs; The cons of using DMs; Hands-on diffusion models in practice; Context; Dataset; ML model; Training; Testing; Diffusion models - ethical issues; Copyright; Bias; Inappropriate content; Responsibility; Privacy; Fraud and identity theft
Part 4: Case Studies and Best Practices
Chapter 10: Case Study 1 - Computer Vision; Transforming industries - the power of computer vision; The four waves of the industrial revolution; Industry 4.0 and computer vision; Synthetic data and computer vision - examples from industry; Neurolabs using synthetic data in retail; Microsoft using synthetic data alone for face analysis; Synthesis AI using synthetic data for virtual try-on
Chapter 11: Case Study 2 - Natural Language Processing; A brief introduction to NLP; Applications of NLP in practice; The need for large-scale training datasets in NLP; Human language complexity; Contextual dependence; Generalization; Hands-on practical example with ChatGPT; Synthetic data as a solution for NLP problems; SYSTRAN Soft's use of synthetic data; Telefónica's use of synthetic data; Clinical text mining utilizing synthetic data; The Alexa virtual assistant model
Chapter 12: Case Study 3 - Predictive Analytics; What is predictive analytics?; Applications of predictive analytics; Predictive analytics issues with real data; Partial and scarce training data; Cost; Case studies of utilizing synthetic data for predictive analytics; Provinzial and synthetic data; Healthcare benefits from synthetic data in predictive analytics; Amazon fraud transaction prediction using synthetic data
Chapter 13: Best Practices for Applying Synthetic Data; Unveiling the challenges of generating and utilizing synthetic data; Domain gap; Data representation; Privacy, security, and validation; Trust and credibility; Domain-specific issues limiting the usability of synthetic data; Healthcare; Finance; Autonomous cars; Best practices for the effective utilization of synthetic data
Part 5: Current Challenges and Future Perspectives
Chapter 14: Synthetic-to-Real Domain Adaptation; The domain gap problem in ML; Sensitivity to sensors' variations; Discrepancy in class and feature distributions; Concept drift; Approaches for synthetic-to-real domain adaptation; Domain randomization; Adversarial domain adaptation; Feature-based domain adaptation; Synthetic-to-real domain adaptation - issues and challenges; Unseen domain; Limited real data; Computational complexity; Synthetic data limitations; Multimodal data complexity
Chapter 15: Diversity Issues in Synthetic Data; The need for diverse data in ML; Transferability; Better problem modeling; Security; Process of debugging; Robustness to anomalies; Creativity; Inclusivity; Generating diverse synthetic datasets; Latent space variations; Ensemble synthetic data generation; Diversity regularization; Incorporating external knowledge; Progressive training; Procedural content generation with game engines; Diversity issues in the synthetic data realm; Balancing diversity and realism; Privacy and confidentiality concerns; Validation and evaluation challenges
Chapter 16: Photorealism in Computer Vision; Synthetic data photorealism for computer vision; Feature extraction; Robustness; Benchmarking performance; Photorealism approaches; Physically Based Rendering (PBR); Neural style transfer; Photorealism evaluation metrics; Structural Similarity Index Measure (SSIM); Learned Perceptual Image Patch Similarity (LPIPS); Expert evaluation; Challenges and limitations of photorealistic synthetic data; Creating hyper-realistic scenes; Resources versus photorealism trade-off
Chapter 17: Conclusion; Real data and its problems; Synthetic data as a solution; Real-world case studies; Future perspectives
Index; Other Books You May Enjoy.
Notes:: Description based on publisher supplied metadata and other sources.
ISBN:: 9781803232607; 1803232609
OCLC:: 1406406795

