
Data analysis with R : a comprehensive guide to manipulating, analyzing, and visualizing data in R / Anthony Fischetti.

Ebook Central College Complete Available online

Format:
Book
Author/Creator:
Fischetti, Anthony, author.
Language:
English
Subjects (All):
Information visualization.
Database design.
Physical Description:
1 online resource (570 pages)
Edition:
Second edition.
Place of Publication:
Birmingham ; Mumbai : Packt Publishing, 2018.
Summary:
R has spread deep into the private sector and can be found in the production pipelines of some of the most advanced and successful enterprises. Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples.
Contents:
Cover
Title Page
Copyright and Credits
Packt Upsell
Contributors
Table of Contents
Preface
Chapter 1: RefresheR
Navigating the basics
Arithmetic and assignment
Logicals and characters
Flow of control
Getting help in R
Vectors
Subsetting
Vectorized functions
Advanced subsetting
Recycling
Functions
Matrices
Loading data into R
Working with packages
Exercises
Summary
Chapter 2: The Shape of Data
Univariate data
Frequency distributions
Central tendency
Spread
Populations, samples, and estimation
Probability distributions
Visualization methods
Chapter 3: Describing Relationships
Multivariate data
Relationships between a categorical and continuous variable
Relationships between two categorical variables
The relationship between two continuous variables
Covariance
Correlation coefficients
Comparing multiple correlations
Categorical and continuous variables
Two categorical variables
Two continuous variables
More than two continuous variables
Chapter 4: Probability
Basic probability
A tale of two interpretations
Sampling from distributions
Parameters
The binomial distribution
The normal distribution
The three-sigma rule and using z-tables
Chapter 5: Using Data To Reason About The World
Estimating means
The sampling distribution
Interval estimation
How did we get 1.96?
Smaller samples
Chapter 6: Testing Hypotheses
The null hypothesis significance testing framework
One and two-tailed tests
Errors in NHST
A warning about significance
A warning about p-values
Testing the mean of one sample
Assumptions of the one sample t-test
Testing two means
Assumptions of the independent samples t-test
Testing more than two means
Assumptions of ANOVA
Testing independence of proportions
What if my assumptions are unfounded?
Chapter 7: Bayesian Methods
The big idea behind Bayesian analysis
Choosing a prior
Who cares about coin flips
Enter MCMC - stage left
Using JAGS and runjags
Fitting distributions the Bayesian way
The Bayesian independent samples t-test
Chapter 8: The Bootstrap
What's... uhhh... the deal with the bootstrap?
Performing the bootstrap in R (more elegantly)
Confidence intervals
A one-sample test of means
Bootstrapping statistics other than the mean
Busting bootstrap myths
What have we left out?
Chapter 9: Predicting Continuous Variables
Linear models
Simple linear regression
Simple linear regression with a binary predictor
A word of warning
Multiple regression
Regression with a non-binary predictor
Kitchen sink regression
The bias-variance trade-off
Cross-validation
Striking a balance
Linear regression diagnostics
Second Anscombe relationship
Third Anscombe relationship
Fourth Anscombe relationship
Advanced topics
Chapter 10: Predicting Categorical Variables
k-Nearest neighbors
Using k-NN in R
Confusion matrices
Limitations of k-NN
Logistic regression
Generalized Linear Model (GLM)
Using logistic regression in R
Decision trees
Random forests
Choosing a classifier
The vertical decision boundary
The diagonal decision boundary
The crescent decision boundary
The circular decision boundary
Chapter 11: Predicting Changes with Time
What is a time series?
What is forecasting?
Uncertainty
Difficulties in forecasting
Creating and plotting time series
Components of time series
Time series decomposition
White noise
Autocorrelation
Smoothing
Simple exponential smoothing for forecasting
Accuracy assessment
Double exponential smoothing
Triple exponential smoothing
ETS and the state space model
Interventions for improvement
What we didn't cover
Citations for the climate change data
Chapter 12: Sources of Data
Relational databases
Why didn't we just do that in SQL?
Using JSON
XML
Other data formats
Online repositories
Chapter 13: Dealing with Missing Data
Analysis with missing data
Visualizing missing data
Types of missing data
So which one is it?
Unsophisticated methods for dealing with missing data
Complete case analysis
Pairwise deletion
Mean substitution
Hot deck imputation
Regression imputation
Stochastic regression imputation
Multiple imputation
So how does mice come up with the imputed values?
Methods of imputation
Multiple imputation in practice
Chapter 14: Dealing with Messy Data
Checking unsanitized data
Checking for out-of-bounds data
Checking the data type of a column
Checking for unexpected categories
Checking for outliers, entry errors, or unlikely data points
Chaining assertions
Regular expressions
What are regular expressions?
Getting started
Regex for data normalization
More normalization
Other tools for messy data
OpenRefine
Fuzzy matching
Chapter 15: Dealing with Large Data
Wait to optimize
Using a bigger and faster machine
Be smart about your code
Allocation of memory
Vectorization
Using optimized packages
Using another R implementation
Using parallelization
Getting started with parallel R
An example of (some) substance
Using Rcpp
Being smarter about your code
Chapter 16: Working with Popular R Packages
The data.table package
The i in DT[i, j, by]
What in the world are by reference semantics?
The j in DT[i, j, by]
Using both i and j
Using the by argument for grouping
Joining data tables
Reshaping, melting, and pivoting data
Using dplyr and tidyr to manipulate data
Functional programming as a main tidyverse principle
Loading data for use in dplyr
Manipulating rows
Selecting and renaming columns
Computing on columns
Grouping in dplyr
Joining data
Reshaping data with tidyr
Chapter 17: Reproducibility and Best Practices
R scripting
RStudio
Running R scripts
An example script
Scripting and reproducibility
R projects
Version control
Package version management
Communicating results
Other Books You May Enjoy
Index
Notes:
Description based on print version record.
ISBN:
9781788397339
1788397339
OCLC:
1030818399

