The informed company : how to build modern agile data stacks that drive winning insights / Dave Fowler, Matthew C. David.
- Format:
- Book
- Author/Creator:
- Fowler, Dave (Computer scientist), author.
- David, Matt (Computer scientist), author.
- Language:
- English
- Subjects (All):
- Data structures (Computer science).
- Big data.
- Cloud computing.
- Physical Description:
- 1 online resource (259 pages)
- Place of Publication:
- Hoboken, New Jersey : John Wiley & Sons, Inc., [2022]
- Summary:
- "In their work at Chartio, Fowler and David get to meet many people who work with data every day. One of their favorite questions to ask them is, "Where did you learn everything you know about data?" Surprisingly, most people tell them they're completely self-taught and have "just figured it out". As a follow-up, they ask what sources they've relied on, and the answers are all over the map. Mostly they'll cite Google, StackOverflow, blogs, and sometimes these books: Agile Data Warehouse Design by Lawrence Corr (2011) or The Data Warehouse Toolkit by Ralph Kimball (originally published in 2004 with a 3rd edition update in 2013). These books were very good for their time, and became classics. But in the timeframe of data, they're ancient. Both were written before Redshift and the gains of the cloud C-Store warehouse. Back then, data was at a totally different scale, had very different costs, was used with totally different products, and was handled by people with very different training--primarily just at enterprise companies. It has gotten to the point where pointing people to these books can do more harm than good. Over the years, Fowler and David have had the incredible opportunity to work with many data teams, architectures, tools, and platforms, and they've built up a body of knowledge around what works--and what doesn't--when it comes to data. They've been sharing this knowledge with customers, and waiting for someone to publish a book on these maturing modern data best practices. They got a bit impatient and earlier this year, they gathered their notes and combined knowledge and started writing the definitive new data book themselves"-- Provided by publisher.
- Contents:
- Cover
- Title Page
- Copyright Page
- Contents
- About This Book
- Why Write This Book
- Who This Book Is For
- Who This Book Is Not For
- Who Wrote the Book
- Who Edited the Book
- Influences
- How This Book Was Written
- How to Read This Book
- Foreword
- Introduction
- Merging Business Context with Data Information
- The Four Stages of Agile Data Organization
- Stage 1 Source aka Siloed Data
- Chapter 1 Starting with Source Data
- Common Options for Analyzing Source Data
- Chapter 2 The Need to Replicate Source Data
- Replicate Sources
- Create Read-Only Access
- Chapter 3 Source Data Best Practices
- Keep a Complexity Wiki Page
- Snippet Dictionary
- Use a BI Product
- Double Check Results
- Keep Short Dashboards
- Design Before Building
- Stage 2 Data Lake aka Data Combined
- Chapter 4 Why Build a Data Lake?
- What Is a Data Lake?
- Reasons to Build a Data Lake Summarized
- Chapter 5 Choosing an Engine for the Data Lake
- Modern Columnar Warehouse Engines
- Modern Warehouse Engine Products
- Database Engines
- Recommendation
- Chapter 6 Extract and Load (EL) Data
- ETL versus ELT
- EL/ETL Vendors
- Extract Options
- Load Options
- Multiple Schemas
- Other Extract and Load Routes
- Chapter 7 Data Lake Security
- Access in Central Place
- Permission Tiers
- Chapter 8 Data Lake Maintenance
- Why SQL?
- Data Sources
- Performance
- Upgrade Snippets to Views
- Stage 3 Data Warehouse aka the Single Source of Truth
- Chapter 9 The Power of Layers and Views
- Make Readable Views
- Layer Views on Views
- Start with a Single View
- Chapter 10 Staging Schemas
- Orient to the Schemas
- Pick a Table and Clean It
- Other Staging Modeling Considerations
- Building on Top of Staging Schemas
- Chapter 11 Model Data with dbt
- Version Control
- Modularity and Reusability
- Package Management
- Organizing Files
- Macros
- Incremental Tables
- Testing
- Chapter 12 Deploy Modeling Code
- Branch Using Version Control Software
- Commit Message
- Test Locally
- Code Review
- Schedule Runs
- Chapter 13 Implementing the Data Warehouse
- Manage Dependencies
- Combine Tables Within Schemas
- Combine Tables Across Schemas
- Keep the Grain Consistent
- Create Business Metrics
- Keeping Accurate History
- Chapter 14 Managing Data Access
- How to Secure Sensitive Data in the Data Warehouse
- How to Secure Sensitive Data in a BI Tool
- Chapter 15 Maintaining the Source of Truth
- Track New Metrics
- Deprecate Old Metrics
- Deprecate Old Schemas
- Resolve Conflicting Numbers
- Handling Ongoing Requests and Ongoing Feedback
- Updating Modeling Code
- Manage Access
- Tuning to Optimize
- Code Review All Modeling
- Maintenance Checklist
- Stage 4 Data Marts aka Data Democratized
- Chapter 16 Data Mart Implementation
- Views on the Data Warehouse
- Segment Tables
- Access Update
- Chapter 17 Data Mart Maintenance
- Educate Team
- Identify Issues
- Identify New Needs
- Help Track Success
- Chapter 18 Modern versus Traditional Data Stacks: What's Changed?
- What's Changed?
- Chapter 19 Row- versus Column-Oriented Database
- Row-Oriented Databases
- Column-Oriented Databases
- Summary
- Chapter 20 Style Guide Example
- Simplify
- Clean
- Naming Conventions
- Share It
- Chapter 21 Building an SST Example
- First Attempt-Same Tables with Prefixes
- Second Attempt-Operational Schema (Source Agnostic)
- Third Attempt-Application Separate, Other Sources Smashed
- Less Planning, More Implementing
- Acknowledgments and Contributions
- Thank-yous
- Index
- EULA.
- Notes:
- Description based on print version record.
- Includes index.
- ISBN:
- 9781119748014
- 1119748011
- 9781119748021
- 111974802X
- OCLC:
- 1283860421