My Account Log in

1 option

Elements of Data Science, Machine Learning, and Artificial Intelligence Using R / Frank Emmert-Streib, Salissou Moutari, and Matthias Dehmer.

Springer eBooks EBA - Intelligent Technologies and Robotics Collection 2023 Available online

View online
Format:
Book
Author/Creator:
Emmert-Streib, Frank, author.
Moutari, Salissou, author.
Dehmer, Matthias, author.
Series:
Intelligent Technologies and Robotics Series
Language:
English
Subjects (All):
Artificial intelligence.
Machine learning.
R (Computer program language).
Physical Description:
1 online resource (582 pages)
Edition:
First edition.
Place of Publication:
Cham, Switzerland : Springer, [2023]
Summary:
The textbook provides students with tools they need to analyze complex data using methods from data science, machine learning and artificial intelligence.The authors include both the presentation of methods along with applications using the programming language R, which is the gold standard for analyzing data.
Contents:
Intro
Preface
Contents
1 Introduction to Learning from Data
1.1 What Is Data Science?
1.2 Converting Data into Knowledge
1.2.1 Big Aims: Big Questions
1.2.2 Generating Insights by Visualization
1.3 Structure of the Book
1.3.1 Part I
1.3.2 Part II
1.3.3 Part III
1.4 Our Motivation for Writing This Book
1.5 How to Use This Book
1.6 Summary
Part I General Topics
2 General Prediction Models
2.1 Introduction
2.2 Categorization of Methods
2.2.1 Properties of the Data
2.2.2 Properties of the Optimization Algorithm
2.2.3 Properties of the Model
2.2.4 Summary
2.3 Overview of Prediction Models
2.4 Causal Model versus Predictive Model
2.5 Explainable AI
2.6 Fundamental Statistical Characteristics of Prediction Models
2.6.1 Example
2.7 Summary
2.8 Exercises
3 General Error Measures
3.1 Introduction
3.2 Motivation
3.3 Fundamental Error Measures
3.4 Error Measures
3.4.1 True-Positive Rate and True-Negative Rate
3.4.2 Positive Predictive Value and Negative Predictive Value
3.4.3 Accuracy
3.4.4 F-Score
3.4.5 False Discovery Rate and False Omission Rate
3.4.6 False-Negative Rate and False-Positive Rate
3.4.7 Matthews Correlation Coefficient
3.4.8 Cohen's Kappa
3.4.9 Normalized Mutual Information
3.4.10 Area Under the Receiver Operator Characteristic Curve
3.5 Evaluation of Outcome
3.5.1 Evaluation of an Individual Method
3.5.2 Comparing Multiple Binary Decision-Making Methods
3.6 Summary
3.7 Exercises
4 Resampling Methods
4.1 Introduction
4.2 Resampling Methods for Error Estimation
4.2.1 Holdout Set
4.2.2 Leave-One-Out CV
4.2.3 K-Fold Cross-Validation
4.3 Extended Resampling Methods for Error Estimation
4.3.1 Repeated Holdout Set
4.3.2 Repeated K-Fold CV
4.3.3 Stratified K-Fold CV.
4.4 Bootstrap
4.4.1 Resampling With versus Resampling Without Replacement
4.5 Subsampling
4.6 Different Types of Prediction Data Sets
4.7 Sampling from a Distribution
4.8 Standard Error
4.9 Summary
4.10 Exercises
5 Data
5.1 Introduction
5.2 Data Types
5.2.1 Genomic Data
5.2.2 Network Data
5.2.3 Text Data
5.2.4 Time-to-Event Data
5.2.5 Business Data
5.3 Summary
Part II Core Methods
6 Statistical Inference
6.1 Exploratory Data Analysis and Descriptive Statistics
6.1.1 Data Structure
6.1.2 Data Preprocessing
6.1.3 Summary Statistics and Presentation of Information
6.1.4 Measures of Location
6.1.4.1 Sample Mean
6.1.4.2 Trimmed Sample Mean
6.1.4.3 Sample Median
6.1.4.4 Quartile
6.1.4.5 Percentile
6.1.4.6 Mode
6.1.4.7 Proportion
6.1.5 Measures of Scale
6.1.5.1 Sample Variance
6.1.5.2 Range
6.1.5.3 Interquartile Range
6.1.6 Measures of Shape
6.1.6.1 Skewness
6.1.6.2 Kurtosis
6.1.7 Data Transformation
6.1.8 Example: Summary of Data and EDA
6.2 Sample Estimators
6.2.1 Point Estimation
6.2.2 Unbiased Estimators
6.2.3 Biased Estimators
6.2.4 Sufficiency
6.3 Bayesian Inference
6.3.1 Conjugate Priors
6.3.2 Continuous Parameter Estimation
6.3.2.1 Example: Continuous Bayesian Inference Using R
6.3.3 Discrete Parameter Estimation
6.3.4 Bayesian Credible Intervals
6.3.5 Prediction
6.3.6 Model Selection
6.4 Maximum Likelihood Estimation
6.4.1 Asymptotic Confidence Intervals for MLE
6.4.2 Bootstrap Confidence Intervals for MLE
6.4.3 Meaning of Confidence Intervals
6.5 Expectation-Maximization Algorithm
6.5.1 Example: EM Algorithm
6.6 Summary
6.7 Exercises
7 Clustering
7.1 Introduction
7.2 What Is Clustering?
7.3 Comparison of Data Points
7.3.1 Distance Measures.
7.3.2 Similarity Measures
7.4 Basic Principle of Clustering Algorithms
7.5 Non-hierarchical Clustering Methods
7.5.1 K-Means Clustering
7.5.2 K-Medoids Clustering
7.5.3 Partitioning Around Medoids (PAM)
7.6 Hierarchical Clustering
7.6.1 Dendrograms
7.6.2 Two Types of Dissimilarity Measures
7.6.3 Linkage Functions for Agglomerative Clustering
7.6.4 Example
7.7 Defining Feature Vectors for General Objects
7.8 Cluster Validation
7.8.1 External Criteria
7.8.2 Assessing the Numerical Values of Indices
7.8.3 Internal Criteria
7.9 Summary
7.10 Exercises
8 Dimension Reduction
8.1 Introduction
8.2 Feature Extraction
8.2.1 An Overview of PCA
8.2.2 Geometrical Interpretation of PCA
8.2.3 PCA Procedure
8.2.4 Underlying Mathematical Problems in PCA
8.2.5 PCA Using Singular Value Decomposition
8.2.6 Assessing PCA Results
8.2.7 Illustration of PCA Using R
8.2.8 Kernel PCA
8.2.9 Discussion
8.2.10 Non-negative Matrix Factorization
8.2.10.1 NNMF Using the Frobenius Norm as Objective Function
8.2.10.2 NNMF Using the Generalized Kullback-Leibler Divergence as Objective Function
8.2.10.3 Example of NNMF Using R
8.3 Feature Selection
8.3.1 Filter Methods Using Mutual Information
8.4 Summary
8.5 Exercises
9 Classification
9.1 Introduction
9.2 What Is Classification?
9.3 Common Aspects of Classification Methods
9.3.1 Basic Idea of a Classifier
9.3.2 Training and Test Data
9.3.3 Error Measures
9.3.3.1 Error Measures for Multi-class Classification
9.4 Naive Bayes Classifier
9.4.1 Educational Example
9.4.2 Example
9.5 Linear Discriminant Analysis
9.5.1 Extensions
9.6 Logistic Regression
9.7 k-Nearest Neighbor Classifier
9.8 Support Vector Machine
9.8.1 Linearly Separable Data
9.8.2 Nonlinearly Separable Data.
9.8.3 Nonlinear Support Vector Machines
9.8.4 Examples
9.9 Decision Tree
9.9.1 What Is a Decision Tree?
9.9.1.1 Three Principal Steps to Get a Decision Tree
9.9.2 Step 1: Growing a Decision Tree
9.9.3 Step 2: Assessing the Size of a Decision Tree
9.9.3.1 Intuitive Approach
9.9.3.2 Formal Approach
9.9.4 Step 3: Pruning a Decision Tree
9.9.4.1 Alternative Way to Construct Optimal Decision Trees: Stopping Rules
9.9.5 Predictions
9.10 Summary
9.11 Exercises
10 Hypothesis Testing
10.1 Introduction
10.2 What Is Hypothesis Testing?
10.3 Key Components of Hypothesis Testing
10.3.1 Step 1: Select Test Statistic
10.3.2 Step 2: Null Hypothesis H0 and AlternativeHypothesis H1
10.3.3 Step 3: Sampling Distribution
10.3.3.1 Examples
10.3.4 Step 4: Significance Level α
10.3.5 Step 5: Evaluate the Test Statistic from Data
10.3.6 Step 6: Determine the p-Value
10.3.7 Step 7: Make a Decision about the Null Hypothesis
10.4 Type 2 Error and Power
10.4.1 Connections between Power and Errors
10.5 Confidence Intervals
10.5.1 Confidence Intervals for a Population Mean with Known Variance
10.5.2 Confidence Intervals for a Population Mean with Unknown Variance
10.5.3 Bootstrap Confidence Intervals
10.6 Important Hypothesis Tests
10.6.1 Student's t-Test
10.6.1.1 One-Sample t-Test
10.6.1.2 Two-Sample t-Test
10.6.1.3 Extensions
10.6.2 Correlation Tests
10.6.3 Hypergeometric Test
10.6.3.1 Null Hypothesis and Sampling Distribution
10.6.3.2 Examples
10.6.4 Finding the Correct Hypothesis Test
10.7 Permutation Tests
10.8 Understanding versus Applying Hypothesis Tests
10.9 Historical Notes and Misinterpretations
10.10 Summary
10.11 Exercises
11 Linear Regression Models
11.1 Introduction
11.1.1 What Is Linear Regression?.
11.1.2 Motivating Example
11.2 Simple Linear Regression
11.2.1 Ordinary Least Squares Estimation of Coefficients
11.2.2 Variability of the Coefficients
11.2.3 Testing the Necessity of Coefficients
11.2.4 Assessing the Quality of a Fit
11.3 Preprocessing
11.4 Multiple Linear Regression
11.4.1 Testing the Necessity of Coefficients
11.4.2 Assessing the Quality of a Fit
11.5 Diagnosing Linear Models
11.5.1 Error Assumptions
11.5.2 Linearity Assumption of the Model
11.5.3 Leverage Points
11.5.4 Outliers
11.5.5 Collinearity
11.5.6 Discussion
11.6 Advanced Topics
11.6.1 Interactions
11.6.2 Nonlinearities
11.6.3 Categorical Predictors
11.6.4 Generalized Linear Models
11.6.4.1 How to Determine Which Family to Use When Fitting a GLM
11.6.4.2 Advantages of GLMs over Traditional OLS Regression
11.6.4.3 Example: Poisson Regression
11.6.4.4 Example: Logistic Regression
11.7 Summary
11.8 Exercises
12 Model Selection
12.1 Introduction
12.2 Difference Between Model Selection and Model Assessment
12.3 General Approach to Model Selection
12.4 Model Selection for Multiple Linear Regression Models
12.4.1 R2 and Adjusted R2
12.4.2 Mallow's Cp Statistic
12.4.3 Akaike's Information Criterion (AIC) and Schwarz's BIC
12.4.4 Best Subset Selection
12.4.5 Stepwise Selection
12.4.5.1 Forward Stepwise Selection
12.4.5.2 Backward Stepwise Selection
12.5 Model Selection for Generalized Linear Models
12.5.1 Negative Binomial Regression Model
12.5.2 Zero-Inflated Poisson Model
12.5.3 Quasi-Poisson Model
12.5.4 Comparison of GLMs
12.6 Model Selection for Bayesian Models
12.7 Nonparametric Model Selection for General Models with Resampling
12.8 Summary
12.9 Exercises
Part III Advanced Topics
13 Regularization
13.1 Introduction.
13.2 Preliminaries.
Notes:
Includes bibliographical references and index.
Description based on print version record.
Description based on publisher supplied metadata and other sources.
ISBN:
3-031-13339-0
OCLC:
1401961443

The Penn Libraries is committed to describing library materials using current, accurate, and responsible language. If you discover outdated or inaccurate language, please fill out this feedback form to report it and suggest alternative language.

Find

Home Release notes

My Account

Shelf Request an item Bookmarks Fines and fees Settings

Guides

Using the Find catalog Using Articles+ Using your account