1 option
Data Exploration and Preparation with BigQuery : A Practical Guide to Cleaning, Transforming, and Analyzing Data for Business Insights / Mike Kahn.
- Format:
- Book
- Author/Creator:
- Kahn, Mike, author.
- Language:
- English
- Subjects (All):
- Google BigQuery.
- Google Analytics.
- Data mining.
- Physical Description:
- 1 online resource (264 pages)
- Edition:
- First edition.
- Place of Publication:
- Birmingham, England : Packt Publishing Ltd., [2023]
- Summary:
- Leverage BigQuery to understand and prepare your data to ensure that it's accurate, reliable, and ready for analysis and modeling Key Features Use mock datasets to explore data with the BigQuery web UI, bq CLI, and BigQuery API in the Cloud console Master optimization techniques for storage and query performance in BigQuery Engage with case studies on data exploration and preparation for advertising, transportation, and customer support data Purchase of the print or Kindle book includes a free PDF eBook Book Description Data professionals encounter a multitude of challenges such as handling large volumes of data, dealing with data silos, and the lack of appropriate tools. Datasets often arrive in different conditions and formats, demanding considerable time from analysts, engineers, and scientists to process and uncover insights. The complexity of the data life cycle often hinders teams and organizations from extracting the desired value from their data assets. Data Exploration and Preparation with BigQuery offers a holistic solution to these challenges. The book begins with the basics of BigQuery while covering the fundamentals of data exploration and preparation. It then progresses to demonstrate how to use BigQuery for these tasks and explores the array of big data tools at your disposal within the Google Cloud ecosystem. The book doesn't merely offer theoretical insights; it's a hands-on companion that walks you through properly structuring your tables for query efficiency and ensures adherence to data preparation best practices. You'll also learn when to use Dataflow, BigQuery, and Dataprep for ETL and ELT workflows. The book will skillfully guide you through various case studies, demonstrating how BigQuery can be used to solve real-world data problems. By the end of this book, you'll have mastered the use of SQL to explore and prepare datasets in BigQuery, unlocking deeper insights from data. What you will learn Assess the quality of a dataset and learn best practices for data cleansing Prepare data for analysis, visualization, and machine learning Explore approaches to data visualization in BigQuery Apply acquired knowledge to real-life scenarios and design patterns Set up and organize BigQuery resources Use SQL and other tools to navigate datasets Implement best practices to query BigQuery datasets Gain proficiency in using data preparation tools, techniques, and strategies Who this book is for This book is for data analysts seeking to enhance their data exploration and preparation skills using BigQuery. It guides anyone using BigQuery as a data warehouse to extract business insights from large datasets. A basic understanding of SQL, reporting, data modeling, and transformations will assist with understanding the topics covered in this book.
- Contents:
- Cover
- Copyright
- Contributors
- Table of Contents
- Preface
- Part 1: Introduction to BigQuery
- Chapter 1: Introducing BigQuery and Its Components
- Technical requirements
- What is BigQuery?
- Understanding how BigQuery works
- Dremel, the execution engine
- Colossus distributed storage
- The Borg compute platform
- Jupiter network infrastructure
- BigQuery administration and access
- Tools for administration
- Understanding identity and access management
- BigQuery best practices and cost management
- Best practices
- Understanding and controlling costs
- Extending your data
- BigQuery ML
- External datasets
- External connections
- Summary
- References
- Chapter 2: BigQuery Organization and Design
- Understanding BigQuery's resource hierarchy
- Organizations, folders, and projects
- BigQuery-specific resources
- BigQuery storage
- Exploring architecture patterns
- The centralized enterprise data warehouse
- The decentralized data warehouse
- The cross-org data exchange
- Schema design
- Table design
- Part 2: Data Exploration with BigQuery
- Chapter 3: Exploring Data in BigQuery
- What is data exploration?
- Fundamentals
- Data life cycle
- Common challenges and solutions
- Introduction to exploring data in BigQuery
- Exploring data in the BigQuery SQL workspace
- Exploring schema and table structure
- Exploring data using SQL
- Exploring data using the bq command-line interface
- Exploring data with visualization tools
- Enhancing data exploration in BigQuery
- Advanced approaches
- Chapter 4: Loading and Transforming Data
- Exploring data loading techniques
- Batch loading data
- Streaming ingestion of data
- Scheduled loading of data.
- Situations where you do not need to load data
- Data transformation with BigQuery
- Evaluating ETL and ELT approaches for data integration
- Hands-on exercise - data loading and transformation in BigQuery
- Understanding the scenario
- Loading data from a local file
- Preparing and transforming data
- Chapter 5: Querying BigQuery Data
- Understanding query structure
- Action command - the SELECT clause
- Location command - the FROM clause
- Filtering command - the WHERE clause
- Selection handling commands - the GROUP BY, HAVING, ORDER BY, and LIMIT clauses
- Understanding data types
- Using expressions and aggregations
- Expressions
- Aggregations
- Joining tables
- Inner joins
- Outer joins
- Using functions
- Advanced querying techniques
- Subqueries
- Window functions
- Common table expressions
- Array functions
- Saving, sharing, and scheduling queries
- Optimizing queries
- Troubleshooting queries
- Further reading
- Chapter 6: Exploring Data with Notebooks
- Understanding the value of using notebooks
- Jupyter notebooks
- Using Workbench notebook instances in Vertex AI
- Creating a managed notebook
- Executions and schedules
- Hands-on exercise - analyzing Google Trends data with Workbench
- Using Colab notebooks
- Comparing Workbench instances and Colab
- Chapter 7: Further Exploring and Visualizing Data
- Understanding data distributions
- Why is it important to understand data distributions?
- Uncovering relationships in data
- Exploring BigQuery data with Google Sheets
- Connecting to Sheets from BigQuery using Explore with Sheets
- Connecting to BigQuery using Connected Sheets
- Column statistics
- Collaboration with BigQuery data in Sheets.
- Visualizing BigQuery data using Looker Studio
- Creating the right visualizations
- Hands-on exercise - creating visualizations with Looker Studio
- Commonly created charts
- Calculations in visualization tools
- Data quality discovery while creating visualizations
- Filtering data in visualizations
- Integrating other visualization tools with BigQuery
- Part 3: Data Preparation with BigQuery
- Chapter 8: An Overview of Data Preparation Tools
- Getting started with data preparation
- Clearly defining your data preparation goals
- Evaluating your current data quality
- Data cleaning and transformation
- Validating prepared data
- Data preparation approaches
- Data preparation tools
- Visual data preparation tools
- Query and code-based tools
- Automated data preparation
- Chapter 9: Cleansing and Transforming Data
- Using ELT for cleansing and transforming data
- Assessing dataset integrity
- The shape of the dataset
- Skew of the dataset
- Data profiling
- Data validation
- Data visualization
- Using SQL for data cleansing and transformation
- SQL data cleansing strategies and examples
- SQL data transformation strategies and examples
- Writing query results
- Using Cloud Dataprep for visual cleansing and transformation
- Chapter 10: Best Practices for Data Preparation, Optimization, and Cost Control
- Data preparation best practices
- Understanding your data and business requirements
- Denormalizing your data
- Optimizing schema design
- Considering nested and repeated fields
- Using correct data types
- Data cleansing and validation
- Partitioning and clustering
- Optimizing data loading
- Best practices for optimizing storage.
- Long-term and compressed storage
- Cross-cloud data analytics with federated access model and BigQuery Omni
- Backup and recovery
- Best practices for optimizing compute
- Analysis cost options
- Query optimization
- Query optimization cheat sheet
- Monitoring and controlling costs
- Query plan and query performance insights
- Monitoring, estimating, and optimizing costs
- Controlling costs
- Part 4: Hands-On and Conclusion
- Chapter 11: Hands-On Exercise - Analyzing Advertising Data
- Exercise and use case overview
- Loading CSV data files from local upload
- Data preparation
- Standardizing date formats
- Data exploration, analysis, and visualization
- Analyzing ads and sales data
- Return on ad spend
- Visualizations
- Chapter 12: Hands-On Exercise - Analyzing Transportation Data
- Loading data from GCS to BigQuery
- Uploading data files to Google Cloud Storage
- Loading data into BigQuery
- Data exploration and analysis
- Visualizing data with BigQuery geography functions
- Chapter 13: Hands-On Exercise - Analyzing Customer Support Data
- Data loading from CSV upload
- Count of ticket_type across both datasets
- The most common support issues using ticket_subject data
- Average resolution time per ticket_type
- Customer demographics using customer_age and customer_gender
- Analyzing emotions with sentiment analysis
- Creating a connection
- Granting access to the external connection service account
- Creating a model
- Querying the model
- References and further reading.
- Chapter 15: Summary and Future Directions
- Summary of key points
- Chapter 1, Introducing BigQuery and Its Components
- Chapter 2, BigQuery Organization and Design
- Chapter 3, Exploring Data in BigQuery
- Chapter 4, Loading and Transforming Data
- Chapter 5, Querying BigQuery Data
- Chapter 6, Exploring Data with Notebooks
- Chapter 7, Further Exploring and Visualizing Data
- Chapter 8, An Overview of Data Preparation Tools
- Chapter 9, Cleansing and Transforming Data
- Chapter 10, Best Practices for Data Preparation, Optimization, and Cost Control
- Chapter 11, Hands-On Exercise - Analyzing Advertising Data
- Chapter 12, Hands-On Exercise - Analyzing Transportation Data
- Chapter 13, Hands-On Exercise - Analyzing Customer Support Data
- Future directions
- More integration with AI and ML
- Generative AI
- Natural language queries
- DataOps
- Hybrid and multi-cloud data analysis
- Zero-ETL and real-time analytics
- Data governance and privacy
- Federated learning
- Data clean rooms
- Data monetization
- Additional resources
- Final words
- Index
- Other Books You May Enjoy.
- Notes:
- Includes bibliographical references and index.
- Description based on print version record.
- ISBN:
- 9781805123422
- 1805123424
- OCLC:
- 1411277597
The Penn Libraries is committed to describing library materials using current, accurate, and responsible language. If you discover outdated or inaccurate language, please fill out this feedback form to report it and suggest alternative language.