Data engineering for beginners / Chisom Nwokwu.

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
Nwokwu, Chisom, author.
Series:
Tech Today Series
Language:
English
Subjects (All):
Big data.
Database management.
Physical Description:
1 online resource (387 pages)
Edition:
1st ed.
Place of Publication:
Hoboken, New Jersey : John Wiley & Sons, Incorporated, 2026.
Summary:
A hands-on technical and industry roadmap for aspiring data engineers. In Data Engineering for Beginners, big data expert Chisom Nwokwu delivers a beginner-friendly handbook for everyone interested in the fundamentals of data engineering.
Contents:
Chapter 1 Understanding Data
A Brief History of Data
Data in 19,000 BCE: The Great Baboon and Abacus
Data in the 1600s: Public Health Statistics
Data in the 1800s: The U.S. Census
Data in the 1900s: The Concept of Storage
Data in the 1990s: Data and the Internet
Types of Data
Structured Data
Unstructured Data
Semi-structured Data
Why Is Data Important?
Healthcare
Supply Chain
Transportation and Logistics
Artificial Intelligence
Data and Information
Summary
Notes
Chapter 2 Introduction to Data Engineering
Data Engineering Explained Using an Oil Refinery Analogy
An Overview of the Data Engineering Life Cycle
Data Storage
Data Ingestion
Data Transformation
Data Serving
Navigating Project Requirements, Engaging Stakeholders, and Delivering Business Value
Requirements Gathering
Understanding Stakeholders
Understanding System Requirements
Delivering Business Value
The Current State of Data Engineering
The Importance of Data Engineering
Chapter 3 Database Fundamentals
Key Concepts of Databases
Rows
Columns
Schema
Keys
Types of Databases
Relational Databases
NoSQL Databases
Choosing Between Relational and NoSQL Databases
Start with Your Data's Structure
Think About the Relationships in Your Data
How Fast Do You Need to Move?
How Do You Need to Query Your Data?
Scaling and Performance
Transaction and Strong Consistency Needs
Chapter 4 SQL Fundamentals
Introduction to SQL
Basic SQL Clauses
Comparison Operators
LIKE Statement
IN Statement
BETWEEN Statement
AND Statement
OR Statement
NOT Statement
IS NULL and IS NOT NULL Statements
Sorting and Limiting
Aggregate Functions
SUM()
AVG()
MAX() and MIN()
GROUP BY
HAVING
Understanding Joins
INNER JOIN
LEFT JOIN
RIGHT JOIN
FULL OUTER JOIN
Subqueries
Common Table Expressions (CTEs)
Set Operations
Window Functions
Lab: Setting Up SQL Server and Running SQL Queries
Best Practices for Writing Efficient SQL Queries
Chapter 5 Database Design
Data Modeling
Why Do We Need to Model Data?
Types of Data Modeling
Normalization
Rules of Normalization
Downsides of Normalization
Denormalization
Data Modeling Best Practices
Define the Grain
Normalize Now, Denormalize Later
Choose the Right Data Types
Proper Naming Conventions
Database Optimization
Indexing
Partitioning
Sharding
Views
Chapter 6 Data Warehouses, Data Lakes, and Data Lakehouses
Data Warehouses
Extract, Transform, and Load (ETL)
Schema Design
Snowflake Schema
Slowly Changing Dimensions
Data Marts
Benefits of a Data Mart
Challenges with Data Marts
Data Lakes
How Do Data Lakes Work?
Challenges of Data Lakes
Data Lakehouse
Features of a Data Lakehouse
Data Lakehouse Architecture
The Key Differences Between a Database, Data Warehouse, Data Lake, and Data Lakehouse
Chapter 7 Data Pipelines
Batch Pipelines
Components of a Batch Pipeline
ETL Pipelines vs. ELT Pipelines
Stream Pipelines
How Would This Work?
Components of a Streaming Data Pipeline
Lambda Architecture
Components of the Lambda Architecture
Advantages of the Lambda Architecture
Challenges and Trade-offs
Data Orchestration
Directed Acyclic Graphs (DAGs)
Scheduling and Automation
Monitoring
Alerts
Lab: Building an ETL Pipeline and Automating with Apache Airflow
Requirements
Set Up Your Development Environment
Extracting Data from CSV
Transforming the Data
Load the New CSV File into a Postgres Database Instance
Schedule ETL Pipeline with Apache Airflow
Chapter 8 Data Quality
Bad Data
Dimensions of Data Quality
Accuracy
Completeness
Consistency
Validity
Uniqueness
Timeliness
Accessibility
Relevance
Data Quality Hierarchy
Data Quality Best Practices
Chapter 9 Data Security
What Is Data Security?
Common Threats to Data Security
Core Principles of Data Security
Confidentiality
Integrity
Availability
Data Encryption
Symmetric Encryption
Asymmetric Encryption
Data Masking
Understanding Network Security
Access Control
Authentication
Authorization
The Principle of Least Privilege
Access Levels
Secrets Management
Data Security and Data Privacy
Chapter 10 Data Governance
How to Think About Data Governance
Data Governance Framework
Policies
Regulatory Compliance Policy
Data Classification Policy
Data Retention and Disposal Policy
Data Sharing Policy
Processes
Metadata Management
Data Lineage
Incident Management
Master Data Management
Roles in the Data Governance Framework
Data Owner
Data Steward
Data Custodian
Chief Data Officer (CDO)
Data Management and Data Governance
Chapter 11 Big Data and Distributed Systems
The Five V's of Big Data
Volume
Velocity
Variety
Veracity
Value
Distributed Systems
Scalability
Fault Tolerance
Reliability
Concurrency
Resource Management
Load Balancing
Latency
Distributed Data Processing
Apache Hadoop
Big Data File Types
Avro
Parquet
Optimized Row Columnar (ORC)
Choosing the File Type
Chapter 12 Data Engineering on the Cloud
Cloud Computing
On-Premises
Cloud
Making the Right Choice
Core Cloud Concepts
Storage
Compute
Networking
Cloud Service Models
Infrastructure as a Service
Platform as a Service
Software as a Service
Choosing Between IaaS, PaaS, and SaaS
A Hybrid Approach
Cloud Management Models
Serverless
Managed
Self-Managed
Putting It All Together
Cost Optimization
Understanding Cloud Pricing Models
Rightsizing Resources
Smart Job Scheduling
Storage Optimization
Shutting Down Idle Resources
Use Serverless Where Possible
Monitoring and Alerting
Chapter 13 Building a Career in Data Engineering
Types of Data Engineering Roles
Types of Data Engineers
Platform Data Engineer
Analytics Data Engineer
AI/ML Data Engineers
Landing Your First Data Engineering Role
A Typical Data Engineering Job Description
How to Build a Winning Résumé
Preparing for a Data Engineering Interview
Thinking Like a Data Engineer
Think in Systems
Learn to Prioritize Data Quality
Design for Failure
Balance Business Context with Technical Choices
Optimize for Clarity, Then Speed
Think Beyond the Tool
Master Automation
Appendix: Sample Interview Questions
SQL
Data Pipelines
Apache Spark
System Design
Data Engineering Glossary
Notes:
Includes index.
Description based on publisher supplied metadata and other sources.
ISBN:
1-394-32542-8
1-394-35257-3
9781394325429
OCLC:
1546814862
