Practical Data Engineering with Apache Projects : Solving Everyday Data Challenges with Spark, Iceberg, Kafka, Flink, and More / by Dunith Danushka.

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
Danushka, Dunith.
Series:
Professional and Applied Computing Series
Language:
English
Subjects (All):
Spark (Electronic resource : Apache Software Foundation).
Data mining.
Physical Description:
1 online resource (158 pages)
Edition:
1st ed. 2025.
Place of Publication:
Berkeley, CA : Apress : Imprint: Apress, 2025.
Summary:
This book is a comprehensive guide designed to equip you with the practical skills and knowledge necessary to tackle real-world data challenges using open-source solutions. Focusing on real-world data engineering projects, it caters specifically to data engineers at the early stages of their careers, providing a strong foundation in essential open-source tools and techniques such as Apache Spark, Flink, Airflow, Kafka, and many more. Each chapter is dedicated to a single project, starting with a clear presentation of the problem it addresses. You will then be guided through a step-by-step process to solve the problem, leveraging widely used open-source data tools. This hands-on approach ensures that you not only understand the theoretical aspects of data engineering but also gain valuable experience in applying these concepts to real-world scenarios. At the end of each chapter, the book delves into common challenges that may arise during the implementation of the solution, offering practical advice on troubleshooting these issues effectively. Additionally, the book highlights best practices that data engineers should follow to ensure the robustness and efficiency of their solutions. A major focus of the book is using open-source projects and tools to solve problems encountered in data engineering. In summary, this book is an indispensable resource for data engineers looking to build a strong foundation in the field. By offering practical, real-world projects and emphasizing problem-solving and best practices, it will prepare you to tackle the complex data challenges encountered throughout your career. Whether you are an aspiring data engineer or looking to enhance your existing skills, this book provides the knowledge and tools you need to succeed in the ever-evolving world of data engineering.
You Will Learn:
The foundational concepts of data engineering and practical experience in solving real-world data engineering problems
How to proficiently use open-source data tools like Apache Kafka, Flink, Spark, Airflow, and Trino
10 hands-on data engineering projects
Troubleshoot common challenges in data engineering projects
Contents:
Part I: Data Lakehouses, Iceberg, Batch ETL, and Orchestration
Chapter 1: Foundational Data Engineering Concepts
Chapter 2: Building a Data Lakehouse with Apache Iceberg
Chapter 3: Batch ETL Pipeline with Apache Spark
Chapter 4: Data Visualization with Apache Superset
Chapter 5: Workflow Orchestration with Apache Airflow
Part II: Streaming Data and Real-time Analytics
Chapter 6: Change Data Capture with Debezium and Kafka
Chapter 7: Low-latency Analytics Dashboard with ClickHouse
Chapter 8: Real-time Fraud Detection with Apache Flink
Part III: Machine Learning and Generative AI
Chapter 9: Building a Product Recommendation Engine with Spark MLlib
Chapter 10: Vector Similarity Search with Postgres and pgvector
Notes:
Includes index.
Description based on publisher supplied metadata and other sources.
ISBN:
979-88-6882-142-4
OCLC:
1565461525
