Fundamentals of Analytics Engineering : An Introduction to Building End-To-end Analytics Solutions / Dumky De Wilde [and six others].
- Format:
- Book
- Author/Creator:
- Wilde, Dumky De, author.
- Language:
- English
- Subjects (All):
- Data mining.
- Systems engineering.
- Physical Description:
- 1 online resource (332 pages)
- Edition:
- First edition.
- Place of Publication:
- Birmingham, England : Packt Publishing, [2024]
- Biography/History:
- Wilde Dumky De: Dumky is an award-winning analytics engineer with close to 10 years of experience in setting up data pipelines, data models, and cloud infrastructure. Dumky has worked with a multitude of clients from government to fintech and retail. His background is in marketing analytics and web tracking implementations, but he has since branched out to include other areas and deliver value from data and analytics across the entire organization.
- Kassapian Fanny: Fanny has a multidisciplinary background across various industries, giving her a unique perspective on analytics workflows, from engineering pipelines to driving value for the business. As a consultant, Fanny helps companies translate opportunities and business needs into technical solutions, implement analytics engineering best practices to streamline their pipelines, and treat data as a product. She is an avid promoter of data democratization through technology and literacy.
- Gligorevic Jovan: Jovan, an Analytics Engineer, specializes in data modeling and building analytical dashboards. Passionate about delivering end-to-end analytics solutions and enabling self-service analytics, he has a background in business and data science. With skills ranging from machine learning to dashboarding, Jovan has democratized data across diverse industries. Proficient in various tools and programming languages, he has extensive experience with the modern data stack. Jovan enjoys providing training in dbt and Power BI, sharing his knowledge generously.
- Perafan Juan Manuel: Juan Manuel Perafan has 8 years of experience in the realm of analytics (5 years as a consultant). Juan was the first analytics engineer hired by Xebia back in 2020, making him one of the earliest adopters of this way of working. Besides helping his clients realize the value of their data, Juan is also very active in the data community. He has spoken at dozens of conferences and meetups around the world (including Coalesce 2023). Additionally, he is the founder of the Analytics Engineering meetup in the Netherlands as well as the Dutch dbt meetup.
- Benninga Lasse: Lasse has been working in the data space since 2018, starting out as a Data Engineer at a large airline, then switching to Cloud Engineering for a consultancy and working for different clients in the retail and healthcare space. Since 2021, he has been an Analytics Engineer at Xebia Data, merging software/platform engineering with a passion for analytics. As a consultant, Lasse has worked with many different clients, ranging from retail and healthcare to ridesharing and trading companies. He has implemented multiple data platforms and worked in all three major clouds, leveraging his knowledge of data and analytics to provide value.
- Lopez Ricardo Angel Granados: Ricardo, an Analytics Engineer with a strong background in data engineering and analysis, is a quick learner and tech enthusiast. With a Master's in IT Management specializing in Data Science, he excels in using various programming languages and tools to deliver valuable insights. Ricardo, experienced in diverse industries like energy, transport, and fintech, is adept at finding alternative solutions for optimal results. As an Analytics Engineer, he focuses on driving value from data through efficient data modeling, using best practices, automating tasks, and improving data quality.
- Pereira Tais Laurindo: Tais is a versatile data professional with experience in a diverse range of organizations, from big corporations to scale-ups. Before her move to Xebia, she had the chance to develop distinct data products, such as dashboards and machine learning implementations. Currently, she is focusing on end-to-end analytics as an Analytics Engineer. With a mixed background in engineering and business, her mission is to contribute to data democratization in organizations, by he. ..
- Summary:
- Gain a holistic understanding of the analytics engineering lifecycle by integrating principles from both data analysis and engineering.
- Key Features: Discover how analytics engineering aligns with your organization's data strategy. Access insights shared by a team of seven industry experts. Tackle common analytics engineering problems faced by modern businesses. Purchase of the print or Kindle book includes a free PDF eBook.
- Book Description: Navigate the world of data analytics with Fundamentals of Analytics Engineering--guiding you from foundational concepts to advanced techniques of data ingestion and warehousing, data lakehouse, and data modeling. Written by a team of 7 industry experts, this book helps you to transform raw data into structured insights. In this book, you'll discover how to clean, filter, aggregate, and reformat data, and seamlessly serve it across diverse platforms. With practical guidance, you'll also learn how to build a simple data platform using Airbyte for ingestion, DuckDB for warehousing, dbt for transformations, and Tableau for visualization. From data quality and observability to fostering collaboration on codebases, you'll discover effective strategies for ensuring data integrity and driving collaborative success. As you advance, you'll become well-versed with the CI/CD principles for automated code building, testing, and deployment--laying the foundation for consistent and reliable pipelines. And with invaluable insights into gathering business requirements, documenting complex business logic, and the importance of data governance, you'll develop a holistic understanding of the analytics lifecycle. By the end of this book, you'll be armed with the essential techniques and best practices for developing scalable analytics solutions from end to end.
- What you will learn: Design and implement data pipelines from ingestion to serving data. Explore best practices for data modeling and schema design. Gain insights into the use of cloud-based analytics platforms and tools for scalable data processing. Understand the principles of data governance and collaborative coding. Comprehend data quality management in analytics engineering. Gain practical skills in using analytics engineering tools to conquer real-world data challenges.
- Who this book is for: This book is for data engineers and data analysts considering pivoting their careers into analytics engineering. Analytics engineers who want to upskill and search for gaps in their knowledge will also find this book helpful, as will other data professionals who want to understand the value of analytics engineering in their organization's journey toward data maturity. To get the most out of this book, you should have a basic understanding of data analysis and engineering concepts such as data cleaning, visualization, ETL and data warehousing.
- Contents:
- Cover
- Title Page
- Copyright and Credits
- Dedications
- Foreword
- Contributors
- Table of Contents
- Preface
- Prologue
- Part 1: Introduction to Analytics Engineering
- Chapter 1: What Is Analytics Engineering?
- Introducing analytics engineering
- Defining analytics engineering
- Why do we need analytics engineering?
- A supermarket analogy
- The shift from ETL to ELT
- The difference between analytics engineers, data analysts, and data engineers
- Summary
- Chapter 2: The Modern Data Stack
- Understanding a Modern Data Stack
- Explaining three key differentiators versus legacy stacks
- Lowering technical barriers with a SQL-first approach
- Improving infrastructure efficiency with cloud-native systems
- Simplifying implementation and maintenance with managed and modular solutions
- Discussing the advantages and disadvantages of the MDS
- Part 2: Building Data Pipelines
- Chapter 3: Data Ingestion
- Digging into the problem of moving data between two systems
- The source of all problems
- Understanding the eight essential steps of a data ingestion pipeline
- Trigger
- Connection
- State management
- Data extraction
- Transformations
- Validation and data quality
- Loading
- Archiving and retention
- Managing the quality and scalability of data ingestion pipelines - the three key topics
- Scalability and resilience
- Monitoring, logging, and alerting
- Governance
- Working with data ingestion - an example pipeline
- Chapter 4: Data Warehousing
- Uncovering the evolution of data warehousing
- The problem with transactional databases
- The history of data warehouses
- Moving to the cloud
- Benefits of cloud versus on-premises data warehouses
- Cloud data warehouse users - no one-size fits all
- Building blocks of a cloud data warehouse
- Compute
- Knowing the market leaders in cloud data warehousing
- Amazon Redshift
- Google BigQuery
- Snowflake
- Databricks
- Use case - choosing the right cloud data warehouse
- Managed versus self-hosted data warehouses
- Chapter 5: Data Modeling
- The importance of data models
- Completeness
- Enforcement of business rules
- Minimizing redundancy
- Data reusability
- Stability and flexibility
- Elegance
- Communication
- Integration
- Potential trade-offs
- The elephant in the room - performance
- Designing your data model
- Data modeling techniques
- Bill Inmon and relational modeling
- Ralph Kimball and dimensional modeling
- Daniel Linstedt and Data Vault
- Comparison of the different data models
- Choosing a data model
- Chapter 6: Transforming Data
- Transforming data - the foundation of analytics work
- A key step in the data value chain
- Challenges in transforming data
- Design choices
- Where to apply transformations
- Specify your data model
- Layering transformations
- Data transformation best practices
- Readability and reusability first, optimization second
- Modularity
- Other best practices
- An example of writing modular code
- Tools that facilitate data transformations
- Types of transformation tools
- Considerations
- Chapter 7: Serving Data
- Exposing data using dashboarding and BI tools
- Dashboards
- Spreadsheets
- Programming environments
- Low-code tools
- Reverse ETL
- Valuable
- Usable
- Sensible
- Serving data - four key topics
- Self-serving analytics and report factories
- Interactive and static reports
- Actionable and vanity metrics
- Reusability and bespoke processes
- Part 3: Hands-On Guide to Building a Data Platform
- Chapter 8: Hands-On Analytics Engineering
- Technical requirements
- Understanding the Stroopwafelshop use case
- Business objectives, metrics, and KPIs
- Looking at the data
- The thing about spreadsheets
- What about BI tools?
- The tooling
- Preparing Google Cloud
- ELT using Airbyte Cloud
- Loading the Stroopwafelshop data using Airbyte Cloud
- Modeling data using dbt Cloud
- The shortcomings of conventional analytics
- The role of dbt in analytics engineering
- Setting up dbt Cloud
- Data marts
- Additional dbt features
- Visualizing data with Tableau
- Why Tableau?
- Selecting the KPIs
- First visualization
- Creating measures
- Creating the store growth dashboard
- What's next?
- Part 4: DataOps
- Chapter 9: Data Quality and Observability
- Understanding the problem of data quality at the source, in transformations, and in data governance
- Data quality issues in source systems
- Data quality issues in data infrastructure and data pipelines
- How data governance impacts data quality
- Finding solutions to data quality issues - observability, data catalogs, and semantic layers
- Using observability to improve your data quality
- The benefits of data catalogs for data quality
- Improving data quality with a semantic layer
- Chapter 10: Writing Code in a Team
- Identifying the responsibilities of team members
- Tracking tasks and issues
- Tools for issue and task tracking
- Clear task definition
- Categorization and tagging
- Managing versions with version control
- Working with Git
- Git branching
- Development workflow for analytics engineers
- Working with coding standards
- PEP8
- ANSI
- Linters
- Pre-commit hooks
- Reviewing code
- Pull requests - The four eyes principle
- Continuous integration/continuous deployment
- Documenting code
- Documenting code in dbt
- Code comments
- READMEs
- Documentation on getting started
- Conceptual documentation
- Working with containers
- Refactoring and technical debt
- Chapter 11: Automating Workflows
- Introducing DataOps
- Orchestrating data pipelines
- Designing an automated workflow - considerations
- dbt Cloud
- Airflow
- Continuous integration
- Continuous
- Handling integration issues
- Automating testing with a CI pipeline
- Continuous deployment
- The CD pipeline
- Slim CI/CD
- Configuring CI/CD in dbt Cloud
- Continuous delivery
- Continuous delivery versus continuous deployment
- Part 5: Data Strategy
- Chapter 12: Driving Business Adoption
- Defining analytics translation
- The analytics value chain
- Scoping analytics use cases
- Identifying stakeholders
- Ideating analytics use cases
- Prioritizing use cases
- Ensuring business adoption
- Working incrementally
- Gathering feedback
- Knowing when to stop developing
- Communicating your results
- Documenting business logic
- Chapter 13: Data Governance
- Understanding data governance
- The objective of data governance
- Applying data governance in analytics engineering
- Defining data ownership
- Data quality and integrity
- Managing data assets
- Training, enablement, and best practices
- Data definitions
- Addressing critical areas for seamless data governance
- Resistance to change and adoption
- Engaging stakeholders and fostering collaboration
- Establishing a data governance roadmap
- Chapter 14: Epilogue
- Reviewing the fundamental insights - what you've learned so far
- Making your career future-proof - how to take it further
- Tip #1 - keep learning and developing your skills
- Tip #2 - network and engage with the community
- Tip #3 - showcase your work and build a portfolio
- Closing remarks
- Index
- Other Books You May Enjoy.
- Notes:
- Description based on publisher supplied metadata and other sources.
- Description based on print version record.
- ISBN:
- 9781837632114
- 1837632111
- OCLC:
- 1428526380