Data analysis with R : a comprehensive guide to manipulating, analyzing, and visualizing data in R / Anthony Fischetti.
- Format:
- Book
- Author/Creator:
- Fischetti, Anthony, author.
- Language:
- English
- Subjects (All):
- Information visualization.
- Database design.
- Physical Description:
- 1 online resource (570 pages)
- Edition:
- Second edition.
- Place of Publication:
- Birmingham ; Mumbai : Packt Publishing, 2018.
- Summary:
- R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples.
- Contents:
- Cover
- Title Page
- Copyright and Credits
- Packt Upsell
- Contributors
- Table of Contents
- Preface
- Chapter 1: RefresheR
- Navigating the basics
- Arithmetic and assignment
- Logicals and characters
- Flow of control
- Getting help in R
- Vectors
- Subsetting
- Vectorized functions
- Advanced subsetting
- Recycling
- Functions
- Matrices
- Loading data into R
- Working with packages
- Exercises
- Summary
- Chapter 2: The Shape of Data
- Univariate data
- Frequency distributions
- Central tendency
- Spread
- Populations, samples, and estimation
- Probability distributions
- Visualization methods
- Chapter 3: Describing Relationships
- Multivariate data
- Relationships between a categorical and continuous variable
- Relationships between two categorical variables
- The relationship between two continuous variables
- Covariance
- Correlation coefficients
- Comparing multiple correlations
- Categorical and continuous variables
- Two categorical variables
- Two continuous variables
- More than two continuous variables
- Chapter 4: Probability
- Basic probability
- A tale of two interpretations
- Sampling from distributions
- Parameters
- The binomial distribution
- The normal distribution
- The three-sigma rule and using z-tables
- Chapter 5: Using Data To Reason About The World
- Estimating means
- The sampling distribution
- Interval estimation
- How did we get 1.96?
- Smaller samples
- Chapter 6: Testing Hypotheses
- The null hypothesis significance testing framework
- One and two-tailed tests
- Errors in NHST
- A warning about significance
- A warning about p-values
- Testing the mean of one sample
- Assumptions of the one sample t-test
- Testing two means
- Assumptions of the independent samples t-test
- Testing more than two means
- Assumptions of ANOVA
- Testing independence of proportions
- What if my assumptions are unfounded?
- Chapter 7: Bayesian Methods
- The big idea behind Bayesian analysis
- Choosing a prior
- Who cares about coin flips
- Enter MCMC - stage left
- Using JAGS and runjags
- Fitting distributions the Bayesian way
- The Bayesian independent samples t-test
- Chapter 8: The Bootstrap
- What's... uhhh... the deal with the bootstrap?
- Performing the bootstrap in R (more elegantly)
- Confidence intervals
- A one-sample test of means
- Bootstrapping statistics other than the mean
- Busting bootstrap myths
- What have we left out?
- Chapter 9: Predicting Continuous Variables
- Linear models
- Simple linear regression
- Simple linear regression with a binary predictor
- A word of warning
- Multiple regression
- Regression with a non-binary predictor
- Kitchen sink regression
- The bias-variance trade-off
- Cross-validation
- Striking a balance
- Linear regression diagnostics
- Second Anscombe relationship
- Third Anscombe relationship
- Fourth Anscombe relationship
- Advanced topics
- Chapter 10: Predicting Categorical Variables
- k-Nearest neighbors
- Using k-NN in R
- Confusion matrices
- Limitations of k-NN
- Logistic regression
- Generalized Linear Model (GLM)
- Using logistic regression in R
- Decision trees
- Random forests
- Choosing a classifier
- The vertical decision boundary
- The diagonal decision boundary
- The crescent decision boundary
- The circular decision boundary
- Chapter 11: Predicting Changes with Time
- What is a time series?
- What is forecasting?
- Uncertainty
- Difficulties in forecasting
- Creating and plotting time series
- Components of time series
- Time series decomposition
- White noise
- Autocorrelation
- Smoothing
- Simple exponential smoothing for forecasting
- Accuracy assessment
- Double exponential smoothing
- Triple exponential smoothing
- ETS and the state space model
- Interventions for improvement
- What we didn't cover
- Citations for the climate change data
- Chapter 12: Sources of Data
- Relational databases
- Why didn't we just do that in SQL?
- Using JSON
- XML
- Other data formats
- Online repositories
- Chapter 13: Dealing with Missing Data
- Analysis with missing data
- Visualizing missing data
- Types of missing data
- So which one is it?
- Unsophisticated methods for dealing with missing data
- Complete case analysis
- Pairwise deletion
- Mean substitution
- Hot deck imputation
- Regression imputation
- Stochastic regression imputation
- Multiple imputation
- So how does mice come up with the imputed values?
- Methods of imputation
- Multiple imputation in practice
- Chapter 14: Dealing with Messy Data
- Checking unsanitized data
- Checking for out-of-bounds data
- Checking the data type of a column
- Checking for unexpected categories
- Checking for outliers, entry errors, or unlikely data points
- Chaining assertions
- Regular expressions
- What are regular expressions?
- Getting started
- Regex for data normalization
- More normalization
- Other tools for messy data
- OpenRefine
- Fuzzy matching
- Chapter 15: Dealing with Large Data
- Wait to optimize
- Using a bigger and faster machine
- Be smart about your code
- Allocation of memory
- Vectorization
- Using optimized packages
- Using another R implementation
- Using parallelization
- Getting started with parallel R
- An example of (some) substance
- Using Rcpp
- Being smarter about your code
- Chapter 16: Working with Popular R Packages
- The data.table package
- The i in DT[i, j, by]
- What in the world are by reference semantics?
- The j in DT[i, j, by]
- Using both i and j
- Using the by argument for grouping
- Joining data tables
- Reshaping, melting, and pivoting data
- Using dplyr and tidyr to manipulate data
- Functional programming as a main tidyverse principle
- Loading data for use in dplyr
- Manipulating rows
- Selecting and renaming columns
- Computing on columns
- Grouping in dplyr
- Joining data
- Reshaping data with tidyr
- Chapter 17: Reproducibility and Best Practices
- R scripting
- RStudio
- Running R scripts
- An example script
- Scripting and reproducibility
- R projects
- Version control
- Package version management
- Communicating results
- Other Books You May Enjoy
- Index.
- Notes:
- Description based on print version record.
- ISBN:
- 9781788397339
- 1788397339
- OCLC:
- 1030818399