Data observability for data engineering : proactive strategies for ensuring data accuracy and addressing broken data pipelines / Michele Pinto, Sammy El Khammal.

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
Pinto, Michele, author.
El Khammal, Sammy, author.
Language:
English
Subjects (All):
Data mining.
Database management.
Digital libraries.
Semantic Web.
Physical Description:
1 online resource
Edition:
1st edition.
Place of Publication:
Birmingham : Packt Publishing, 2023.
Summary:
Discover actionable steps to maintain healthy data pipelines to promote data observability within your teams with this essential guide to elevating data engineering practices.

Key Features
Learn how to monitor your data pipelines in a scalable way
Apply real-life use cases and projects to gain hands-on experience in implementing data observability
Instil trust in your pipelines among data producers and consumers alike
Purchase of the print or Kindle book includes a free PDF eBook

Book Description
In the age of information, strategic management of data is critical to organizational success. The constant challenge lies in maintaining data accuracy and preventing data pipelines from breaking. Data Observability for Data Engineering is your definitive guide to implementing data observability successfully in your organization. This book unveils the power of data observability, a fusion of techniques and methods that allow you to monitor and validate the health of your data. You'll see how it builds on data quality monitoring and understand its significance from the data engineering perspective. Once you're familiar with the techniques and elements of data observability, you'll get hands-on with a practical Python project to reinforce what you've learned. Toward the end of the book, you'll apply your expertise to explore diverse use cases and experiment with projects to seamlessly implement data observability in your organization. Equipped with the mastery of data observability intricacies, you'll be able to make your organization future-ready and resilient and never worry about the quality of your data pipelines again.

What you will learn
Implement a data observability approach to enhance the quality of data pipelines
Collect and analyze key metrics through coding examples
Apply monkey patching in a Python module
Manage the costs and risks associated with your data pipeline
Understand the main techniques for collecting observability metrics
Implement monitoring techniques for analytics pipelines in production
Build and maintain a statistics engine continuously

Who this book is for
This book is for data engineers, data architects, data analysts, and data scientists who have encountered issues with broken data pipelines or dashboards. Organizations seeking to adopt data observability practices and managers responsible for data quality and processes will find this book especially useful to increase the confidence of data consumers and raise awareness among producers regarding their data pipelines.
Contents:
Cover
Title Page
Copyright and Credits
Contributors
Table of Contents
Preface
Part 1: Introduction to Data Observability
Chapter 1: Fundamentals of Data Quality Monitoring
Learning about the maturity path of data in companies
Identifying information bias in data
Data producers
Data consumers
The relationship between producers and consumers
Asymmetric information among stakeholders
Exploring the seven dimensions of data quality
Accuracy
Completeness
Consistency
Conformity
Integrity
Timeliness
Uniqueness
Consequences of data quality issues
Turning data quality into SLAs
An agreement as a starting point
The incumbent responsibilities of producers
Considerations for SLOs and SLAs
Indicators of data quality
Data source metadata
Schema
Lineage
Application
Statistics and KPIs
Examples of SLAs, SLOs, and SLIs
Alerting on data quality issues
Using indicators to create rules
The data scorecard
Summary
Chapter 2: Fundamentals of Data Observability
Technical requirements
From data quality monitoring to data observability
Three principles of data observability
Data observability in IT observability
Key components of data observability
The contract between the application owner and the marketing team
Observing a timeliness issue
Observing a completeness issue
Observing a change in data distribution
Data observability in the enterprise ecosystem
Measuring the return on investment - defining the goals
Part 2: Implementing Data Observability
Chapter 3: Data Observability Techniques
Analyzing the data
Monitoring data asynchronously
Monitoring data synchronously
Analyzing the application
The anatomy of an external analyzer
Pros and cons of the application analyzer method
Advantages
Disadvantages
Principles of monkey patching for data observability
Wrapping the function
Consolidating the findings
Pros and cons of the monkey patching method
Advanced techniques for data observability - distributed tracing
Chapter 4: Data Observability Elements
Prerequisites and installation requirements
Kensu - a data observability framework
kensu-py - an overview of the monkey patching technique
Static and dynamic elements
Defining the data observability context
Application or process
Code base
Code version
Project
Environment
User
Timestamp
The application run
Getting the metadata of the data sources
Data source
Mastering lineage
Types of lineage and dependencies
Lineage run
What's in the log?
Computing observability metrics
Data observability for AI models
Model method
Model training
Model metrics
The feedback loop in data observability
Summary
Notes:
Includes index.
OCLC-licensed vendor bibliographic record.
ISBN:
9781804612095
180461209X
OCLC:
1416602460
