Business analytics : data science for business problems / Walter R. Paczkowski.
Springer Nature - Springer Mathematics and Statistics eBooks 2021 English International Available online
- Format:
- Author/Creator:
- Paczkowski, Walter R.
- Series:
- Language:
- English
- Subjects (All):
- Physical Description:
- 1 online resource (416 pages)
- Edition:
- 1st ed.
- Place of Publication:
- Cham, Switzerland : Springer International Publishing, [2022]
- Summary:
- This book focuses on three core knowledge requirements for effective and thorough data analysis for solving business problems. These are a foundational understanding of: 1. statistical, econometric, and machine learning techniques; 2. data handling capabilities; 3. at least one programming language.
- Contents:
- Intro
- Preface
- The Book's Focus
- The Target Audience
- The Book's Competitive Comparison
- The Book's Structure
- Acknowledgments
- Contents
- List of Figures
- List of Tables
- Part I Beginning Analytics
- 1 Introduction to Business Data Analytics: Setting the Stage
- 1.1 Types of Business Problems
- 1.2 The Role of Information in Business Decision Making
- 1.3 Uncertainty vs. Risk
- 1.4 The Data-Information Nexus
- 1.4.1 Data and Information Confusion
- 1.4.2 The Data Component
- 1.4.3 The Extractor Component
- 1.4.3.1 Text Data
- 1.4.3.2 Numeric Data
- 1.4.3.3 Data: A Combined View
- 1.4.4 The Information Component
- 1.5 Analytics Requirements
- 1.5.1 Theoretical Framework
- 1.5.2 Data Handling
- 1.5.3 Programming Literacy
- 1.5.4 Component Interconnections
- 2 Data Sources, Organization, and Structures
- 2.1 Data Dimensions: A Taxonomy for Defining Data
- 2.1.1 Taxonomy Component #1: Source
- 2.1.2 Taxonomy Component #2: Domain
- 2.1.3 Taxonomy Component #3: Levels
- 2.1.4 Taxonomy Component #4: Continuity
- 2.1.5 Taxonomy Component #5: Measurement Scale
- 2.2 Data Organization
- 2.2.1 External Database Structures
- 2.2.2 Internal Database Structures
- 2.3 Data Dictionary
- 3 Basic Data Handling
- 3.1 Case Studies
- 3.1.1 Case Study 1: Customer Transactions Data
- 3.1.2 Case Study 2: Measures of Order Fulfillment
- 3.2 Importing Your Data
- 3.2.1 Data Formats
- 3.2.2 Importing a CSV Text File into Pandas
- 3.2.3 Importing Large Files in Chunks
- 3.2.4 Checking Your Imported Data
- 3.2.4.1 Check #1: Display the First Few Records
- 3.2.4.2 Check #2: Check the Shape of the DataFrame
- 3.2.4.3 Check #3: Check Column Names
- 3.2.4.4 Check #4: Check for Missing Values
- 3.2.4.5 Check #5: Check the Data Types
- 3.3 Merging or Joining DataFrames
- 3.4 Reshaping DataFrames
- 3.5 Sorting a DataFrame
- 3.6 Querying a DataFrame
- 3.6.1 Boolean Operators and Indicator Functions
- 3.6.2 Pandas Query Method
- 4 Data Visualization: The Basics
- 4.1 Background for Data Visualization
- 4.2 Gestalt Principles of Visual Design
- 4.3 Issues Complicating Data Visualization
- 4.3.1 Human Visual Limitations
- 4.3.2 Data Visualization Tools
- 4.3.3 Types of Visuals
- 4.3.4 What to Look for in a Graph
- 4.3.4.1 Feature #1: Distributions
- 4.3.4.2 Feature #2: Relationships
- 4.3.4.3 Feature #3: Patterns
- 4.3.4.4 Feature #4: Trends
- 4.3.4.5 Feature #5: Anomalies
- 4.4 Visualizing Spatial Data
- 4.4.1 Data Preparation
- 4.4.2 Visualizing Continuous Spatial Data
- 4.4.3 Visualizing Categorical Spatial Data
- 4.4.4 Visualizing Continuous and Categorical Spatial Data
- 4.5 Visualizing Temporal (Time Series) Data
- 4.5.1 Properties of Temporal (Time Series) Data
- 4.5.2 Visualizing Time Series Data
- 4.5.3 Time Series Complications
- 4.6 Faceted Plots
- 4.7 Appendix
- 4.7.1 Taylor Series Expansion for Growth Rates
- 5 Advanced Data Handling: Preprocessing Methods
- 5.1 Transformations
- 5.1.1 Linear Transformations
- 5.1.2 Nonlinear Transformations
- 5.1.3 A Family of Transformations
- 5.2 Encoding
- 5.2.1 Dummy or One-Hot Encoding
- 5.2.1.1 Pandas Dummy Encoding
- 5.2.1.2 sklearn Dummy Encoding
- 5.2.2 Patsy Encoding
- 5.2.3 Label Encoding
- 5.2.4 Binarizing Data
- 5.3 Dimension Reduction
- 5.4 Handling Missing Data
- 5.5 Appendix
- 5.5.1 Mean and Variance of Standardized Variable
- 5.5.2 Mean and Variance of Adjusted Standardized Variable
- 5.5.3 Unbiased Estimators of μ and σ²
- Part II Intermediate Analytics
- 6 OLS Regression: The Basics
- 6.1 Basic OLS Concept
- 6.1.1 The Disturbance Term and the Residual
- 6.1.2 OLS Estimation
- 6.1.3 The Gauss-Markov Theorem
- 6.2 Analysis of Variance
- 6.3 Case Study
- 6.3.1 Basic OLS Regression
- 6.3.2 The Log-Log Model
- 6.3.3 Model Set-up
- 6.3.4 Estimation Summary
- 6.3.5 ANOVA for Basic Regression
- 6.3.6 Elasticities
- 6.4 Basic Multiple Regression
- 6.4.1 ANOVA for Multiple Regression
- 6.4.2 Alternative Measures of Fit: AIC and BIC
- 6.5 Case Study: Expanded Analysis
- 6.6 Model Portfolio
- 6.7 Predictive Analysis: Introduction
- 6.7.1 Predicting vs. Forecasting
- 6.7.2 Developing a Prediction
- 6.7.3 Simulation Tool for Prediction Application
- 7 Time Series Analysis
- 7.1 Time Series Basics
- 7.1.1 Time Series Definition
- 7.1.2 Time Series Concepts
- 7.2 Importing a Date/Time Variable
- 7.3 The Data Cube and Time Series Data
- 7.4 Handling Dates and Times in Python and Pandas
- 7.4.1 Datetimes vs. Periods
- 7.4.2 Aggregating Datetime Measures
- 7.4.3 Converting Time Periods in Pandas
- 7.4.4 Date-Time Mini-Language
- 7.5 Some Calendrical Calculations
- 7.6 Time Series Generation Process: AR(1) Model
- 7.7 Visualization for AR(1) Detection
- 7.8 Durbin-Watson Test Statistic
- 7.9 Lagged Dependent and Independent Variables
- 7.9.1 Lagged Independent Variable: ARDL(0, 1)
- 7.9.2 Lagged Dependent Variable: ARDL(1, 0)
- 7.9.3 Lagged Dependent and Independent Variables: ARDL(1, 1)
- 7.10 Further Exploration of Time Series Analysis
- 7.10.1 Step 1: Identification of a Model
- 7.10.1.1 AR(p) Model
- 7.10.1.2 MA(q) Model
- 7.10.1.3 ARMA(p, q) Model
- 7.10.1.4 ARIMA(p, d, q) Model
- 7.10.1.5 Digression: Time Series Stationarity-An Overview
- 7.10.2 Step 2: Estimation of the Model
- 7.10.3 Step 3: Validation of the Model
- 7.10.4 Step 4: Forecasting with the Model
- 7.11 Appendix
- 7.11.1 Backshift Operator
- 7.11.2 Useful Algebra Results
- 7.11.3 Mean and Variance of Yt
- 7.11.4 Demeaned Data
- 7.11.5 Time Trend Addition
- 8 Statistical Tables
- 8.1 Data Preprocessing
- 8.2 Categorical Data
- 8.3 Creating a Frequency Table
- 8.4 Hypothesis Testing: A First Step
- 8.5 Cross-tabs and Hypothesis Tests
- 8.5.1 Hypothesis Testing
- 8.5.2 Plotting a Frequency Table
- 8.6 Extending the Cross-tab
- 8.7 Pivot Tables
- 8.8 Appendix
- 8.8.1 Pearson Chi-Square Statistic
- Part III Advanced Analytics
- 9 Advanced Data Handling for Business Data Analytics
- 9.1 Supervised and Unsupervised Learning
- 9.2 Working with the Data Cube
- 9.3 The Data Cube and DataFrame Indexing
- 9.4 Sampling From a DataFrame
- 9.4.1 Simple Random Sampling (SRS)
- 9.4.2 Stratified Random Sampling
- 9.4.3 Cluster Random Sampling
- 9.5 Index Sorting of a DataFrame
- 9.6 Splitting a DataFrame: The Train-Test Splits
- 9.6.1 Model Tuning of Hyperparameters
- 9.6.2 Incorrect Use of Testing Data
- 9.6.3 Creating the Training/Testing Data Sets
- 9.6.3.1 Comment on Strategy
- 9.6.3.2 Handling Cross-Sectional Data
- 9.6.3.3 Handling Time Series Data
- 9.6.3.4 Handling Panel Data
- 9.6.4 Recombining the Data Sets
- 9.7 Appendix
- 9.7.1 Primer on Random Numbers
- 10 Advanced OLS for Business Data Analytics
- 10.1 Link Functions: An Introduction
- 10.2 Data Preprocessing
- 10.2.1 Data Standardization for Regression Analysis
- 10.2.2 One-Hot and Effects (or Sum) Encoding
- 10.3 Case Study Application
- 10.4 Heteroskedasticity Issues and Tests
- 10.4.1 Heteroskedasticity Problem
- 10.4.2 Heteroskedasticity Detection
- 10.4.3 Heteroskedasticity Remedy
- 10.5 Multicollinearity
- 10.5.1 Digression on Multicollinearity
- 10.5.2 Detection with VIF and the Condition Index
- 10.5.3 Principal Component Regression and High-Dimensional Data
- 10.6 Predictions and Scenario Analysis
- 10.6.1 Making Predictions
- 10.6.2 Scenario Analysis
- 10.6.3 Prediction Error Analysis (PEA)
- 10.6.3.1 LOOCV Approach
- 10.6.3.2 k-Fold Approach
- 10.6.3.3 Score Measures
- 10.6.3.4 Variations on Validation Methods
- 10.6.3.5 Complexity of Testing
- 10.6.3.6 Examples of k-Fold Split
- 10.7 Panel Data Models
- 11 Classification with Supervised Learning Methods
- 11.1 Case Study: Background
- 11.2 Logistic Regression
- 11.2.1 A Choice Interpretation
- 11.2.2 Properties of this Problem
- 11.2.3 A Model for the Binary Problem
- 11.2.4 Case Study: Train-Test Data Split
- 11.2.5 Case Study: Logit Model Training
- 11.2.6 Making and Assessing Predictions
- 11.2.7 Classification with a Logit Model
- 11.3 K-Nearest Neighbor (KNN)
- 11.3.1 Case Study: Predicting
- 11.4 Naive Bayes
- 11.4.1 Background: Bayes Theorem
- 11.4.2 A General Statement
- 11.4.3 The Naive Adjective: A Simplifying Assumption
- 11.4.4 Distribution Assumptions
- 11.4.5 Case Study: Naive Bayes Training
- 11.5 Decision Trees for Classification
- 11.5.1 Partitioning by Constants
- 11.5.2 Gini Index and Entropy
- 11.5.3 Case Study: Growing a Tree
- 11.5.4 Case Study: Predicting with a Tree
- 11.5.5 Random Forests
- 11.6 Support Vector Machines
- 11.6.1 Case Study: SVC Application
- 11.6.2 Case Study: Prediction
- 11.7 Classifier Accuracy Comparison
- 12 Grouping with Unsupervised Learning Methods
- 12.1 Training and Testing Data Sets
- 12.2 Hierarchical Clustering
- 12.2.1 Forms of Hierarchical Clustering
- 12.2.2 Agglomerative Algorithm Description
- 12.2.3 Metrics and Linkages
- 12.2.4 Preprocessing Data
- 12.2.5 Case Study Application
- 12.2.6 Examining More than One Solution
- 12.3 K-Means Clustering
- 12.3.1 Algorithm Description
- 12.3.2 Case Study Application
- 12.4 Mixture Model Clustering
- Bibliography
- Index
- Notes:
- Description based on print version record.
- Description based on publisher supplied metadata and other sources.
- Other Format:
- Print version: Paczkowski, Walter R. Business Analytics
- ISBN:
- 9783030870232
- OCLC:
- 1295273876