SQL for data scientists : a beginner's guide for building datasets for analysis / Renee M. Teate.

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
Teate, Renee M., author.
Language:
English
Subjects (All):
SQL (Computer program language).
Physical Description:
1 online resource (291 pages)
Place of Publication:
Hoboken, New Jersey : John Wiley & Sons, Inc., [2021]
Summary:
SQL for Data Scientists: A Beginner's Guide for Building Datasets for Analysis is a resource dedicated to the Structured Query Language (SQL) and dataset design skills that data scientists use most. Aspiring data scientists will learn how to construct datasets for exploration, analysis, and machine learning. You can also discover how to approach query design and develop SQL code to extract data insights while avoiding common pitfalls. You may be one of many people who are entering the field of data science from a range of professions and educational backgrounds, such as business analytics, social science, physics, economics, and computer science. Like many of them, you may have conducted analyses using spreadsheets as data sources, but never retrieved and engineered datasets from a relational database using SQL, which is a programming language designed for managing databases and extracting data. This guide for data scientists differs from other instructional guides on the subject. It doesn’t cover SQL broadly. Instead, you’ll learn the subset of SQL skills that data analysts and data scientists use frequently. You’ll also gain practical advice and direction on "how to think about constructing your dataset." In this book, author Renee Teate shares knowledge gained during a 15-year career working with data, in roles ranging from database developer to data analyst to data scientist. She guides you through SQL code and dataset design concepts from an industry practitioner’s perspective, moving your data science career forward.
Contents:
Cover
Title Page
Copyright Page
About the Author
About the Technical Editor
Acknowledgments
Contents at a Glance
Contents
Introduction
Who I Am and Why I'm Writing About This Topic
Who This Book Is For
Why You Should Learn SQL if You Want to Be a Data Scientist
What I Hope You Gain from This Book
Conventions
Reader Support for This Book
Companion Download Files
How to Contact the Publisher
How to Contact the Author
Chapter 1 Data Sources
Data Sources
Tools for Connecting to Data Sources and Editing SQL
Relational Databases
Dimensional Data Warehouses
Asking Questions About the Data Source
Introduction to the Farmer's Market Database
A Note on Machine Learning Dataset Terminology
Exercises
Chapter 2 The SELECT Statement
The SELECT Statement
The Fundamental Syntax Structure of a SELECT Query
Selecting Columns and Limiting the Number of Rows Returned
The ORDER BY Clause: Sorting Results
Introduction to Simple Inline Calculations
More Inline Calculation Examples: Rounding
More Inline Calculation Examples: Concatenating Strings
Evaluating Query Output
SELECT Statement Summary
Exercises Using the Included Database
Chapter 3 The WHERE Clause
The WHERE Clause
Filtering SELECT Statement Results
Filtering on Multiple Conditions
Multi-Column Conditional Filtering
More Ways to Filter
BETWEEN
IN
LIKE
IS NULL
A Warning About Null Comparisons
Filtering Using Subqueries
Chapter 4 CASE Statements
CASE Statement Syntax
Creating Binary Flags Using CASE
Grouping or Binning Continuous Values Using CASE
Categorical Encoding Using CASE
CASE Statement Summary
Chapter 5 SQL JOINs
Database Relationships and SQL JOINs
A Common Pitfall when Filtering Joined Data
JOINs with More than Two Tables
Chapter 6 Aggregating Results for Analysis
GROUP BY Syntax
Displaying Group Summaries
Performing Calculations Inside Aggregate Functions
MIN and MAX
COUNT and COUNT DISTINCT
Average
Filtering with HAVING
CASE Statements Inside Aggregate Functions
Chapter 7 Window Functions and Subqueries
ROW NUMBER
RANK and DENSE RANK
NTILE
Aggregate Window Functions
LAG and LEAD
Chapter 8 Date and Time Functions
Setting datetime Field Values
EXTRACT and DATE_PART
DATE_ADD and DATE_SUB
DATEDIFF
TIMESTAMPDIFF
Date Functions in Aggregate Summaries and Window Functions
Chapter 9 Exploratory Data Analysis with SQL
Demonstrating Exploratory Data Analysis with SQL
Exploring the Products Table
Exploring Possible Column Values
Exploring Changes Over Time
Exploring Multiple Tables Simultaneously
Exploring Inventory vs. Sales
Chapter 10 Building SQL Datasets for Analytical Reporting
Thinking Through Analytical Dataset Requirements
Using Custom Analytical Datasets in SQL: CTEs and Views
Taking SQL Reporting Further
Chapter 11 More Advanced Query Structures
UNIONs
Self-Join to Determine To-Date Maximum
Counting New vs. Returning Customers by Week
Summary
Chapter 12 Creating Machine Learning Datasets Using SQL
Datasets for Time Series Models
Datasets for Binary Classification
Creating the Dataset
Expanding the Feature Set
Feature Engineering
Taking Things to the Next Level
Chapter 13 Analytical Dataset Development Examples
What Factors Correlate with Fresh Produce Sales?
How Do Sales Vary by Customer Zip Code, Market Distance, and Demographic Data?
How Does Product Price Distribution Affect Market Sales?
Chapter 14 Storing and Modifying Data
Storing SQL Datasets as Tables and Views
Adding a Timestamp Column
Inserting Rows and Updating Values in Database Tables
Using SQL Inside Scripts
In Closing
Appendix Answers to Exercises
Chapter 1: Data Sources
Answers
Chapter 2: The SELECT Statement
Chapter 3: The WHERE Clause
Chapter 4: CASE Statements
Chapter 5: SQL JOINs
Chapter 6: Aggregating Results for Analysis
Chapter 7: Window Functions and Subqueries
Chapter 8: Date and Time Functions
Chapter 9: Exploratory Data Analysis with SQL
Chapter 10: Building SQL Datasets for Analytical Reporting
Chapter 11: More Advanced Query Structures
Chapter 12: Creating Machine Learning Datasets Using SQL
Chapter 14: Storing and Modifying Data
Notes:
Description based on print version record.
ISBN:
9781119669395
1119669391
9781119669388
1119669383
9781119669371
1119669375
OCLC:
1269508290
