3 options

R data mining : implement data mining techniques through practical use cases and real-world datasets / Andrea Cirillo.

EBSCOhost Academic eBook Collection (North America) Available online

Ebook Central College Complete Available online

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
Cirillo, Andrea, author.
Language:
English
Subjects (All):
R (Computer program language).
Data mining.
Physical Description:
1 online resource (1 volume) : illustrations
Edition:
1st edition
Place of Publication:
Birmingham, England ; Mumbai, [India] : Packt Publishing, 2017.
System Details:
text file
Summary:
Mine valuable insights from your data using popular tools and techniques in R.
About This Book
Understand the basics of data mining and why R is a perfect tool for it.
Manipulate your data using popular R packages such as ggplot2 and dplyr to gather valuable business insights from it.
Apply effective data mining models to perform regression and classification tasks.
Who This Book Is For
If you are a budding data scientist, or a data analyst with a basic knowledge of R, and want to get into the intricacies of data mining in a practical manner, this is the book for you. No previous experience of data mining is required.
What You Will Learn
Master relevant packages such as dplyr and ggplot2 for data mining.
Learn how to effectively organize a data mining project through the CRISP-DM methodology.
Implement data cleaning and validation tasks to get your data ready for data mining activities.
Perform exploratory data analysis both numerically and graphically.
Develop simple and multiple regression models along with logistic regression.
Apply basic ensemble learning techniques to combine results from different data mining models.
Perform text mining analysis on unstructured PDF files and textual data.
Produce reports to effectively communicate the objectives, methods, and insights of your analyses.
In Detail
R is widely used to apply data mining techniques across many different industries, including finance, medicine, and scientific research. This book will empower you to produce and present impressive analyses from data by selecting and implementing the appropriate data mining techniques in R. You will gain these skills while immersed in a one-of-a-kind data mining crime case, in which you are asked to help resolve a real fraud case affecting a commercial company by means of both basic and advanced data mining techniques.
As you move along the plot of the story, you will learn and practice on real data the various R packages commonly employed for this kind of task. You will also get the chance to apply some of the most popular and effective data mining models and algorithms, from basic multiple linear regression to the more advanced support vector machines. Unlike other data mining learning resources, this book also explains the theory behind these models, their relevant assumptions, and when they can be applied to the data you are facing. By the end of the book you w...
Contents:
Cover
Copyright
Credits
About the Author
About the Reviewers
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: Why to Choose R for Your Data Mining and Where to Start
What is R?
A bit of history
R's points of strength
Open source inside
Plugin ready
Data visualization friendly
Installing R and writing R code
Downloading R
R installation for Windows and macOS
R installation for Linux OS
Main components of a base R installation
Possible alternatives to write and run R code
RStudio (all OSs)
The Jupyter Notebook (all OSs)
Visual Studio (Windows users only)
R foundational notions
A preliminary R session
Executing R interactively through the R console
Creating an R script
Executing an R script
Vectors
Lists
Creating lists
Subsetting lists
Data frames
Functions
R's weaknesses and how to overcome them
Learning R effectively and minimizing the effort
The tidyverse
Leveraging the R community to learn R
Where to find the R community
Engaging with the community to learn R
Handling large datasets with R
Further references
Summary
Chapter 2: A First Primer on Data Mining - Analysing Your Bank Account Data
Acquiring and preparing your banking data
Data model
Summarizing your data with pivot-like tables
A gentle introduction to the pipe operator
An even more gentle introduction to the dplyr package
Installing the necessary packages and loading your data into R
Installing and loading the necessary packages
Importing your data into R
Defining the monthly and daily sum of expenses
Visualizing your data with ggplot2
Basic data visualization principles
Less but better
Not every chart is good for your message
Scatter plot
Line chart
Bar plot
Other advanced charts
Colors have to be chosen carefully
A bit of theory - chromatic circle, hue, and luminosity
Visualizing your data with ggplot
One more gentle introduction - the grammar of graphics
A layered grammar of graphics - ggplot2
Visualizing your banking movements with ggplot2
Visualizing the number of movements per day of the week
Chapter 3: The Data Mining Process - CRISP-DM Methodology
The Crisp-DM methodology data mining cycle
Business understanding
Data understanding
Data collection
How to perform data collection with R
Data import from TXT and CSV files
Data import from different types of format already structured as tables
Data import from unstructured sources
Data description
How to perform data description with R
Data exploration
What to use in R to perform this task
The summary() function
Box plot
Histograms
Data preparation
Modelling
Defining a data modeling strategy
How similar problems were solved in the past
Emerging techniques
Classification of modeling problems
How to perform data modeling with R
Evaluation
Clustering evaluation
Classification evaluation
Regression evaluation
How to judge the adequacy of a model's performance
Deployment
Deployment plan development
Maintenance plan development
Chapter 4: Keeping the House Clean - The Data Mining Architecture
A general overview
Data sources
Types of data sources
Unstructured data sources
Structured data sources
Key issues of data sources
Databases and data warehouses
The third wheel - the data mart
One-level database
Two-level database
Three-level database
Technologies
SQL
MongoDB
Hadoop
The data mining engine
The interpreter
The interface between the engine and the data warehouse
The data mining algorithms
User interface
Clarity
Clarity and mystery
Clarity and simplicity
Efficiency
Consistency
Syntax highlight
Auto-completion
How to build a data mining architecture in R
The data warehouse
The interface between the engine and the data warehouse
The user interface
Chapter 5: How to Address a Data Mining Problem - Data Cleaning and Validation
On a quiet day
Data cleaning
Tidy data
Analysing the structure of our data
The str function
The describe function
head, tail, and View functions
Evaluating your data tidiness
Every row is a record
Every column shows an attribute
Every table represents an observational unit
Tidying our data
The tidyr package
Long versus wide data
The spread function
The gather function
The separate function
Applying tidyr to our dataset
Validating our data
Fitness for use
Conformance to standards
Data quality controls
Consistency checks
Data type checks
Logical checks
Domain checks
Uniqueness checks
Performing data validation on our data
Data type checks with str()
The final touch - data merging
left_join function
moving beyond left_join
Chapter 6: Looking into Your Data Eyes - Exploratory Data Analysis
Introducing summary EDA
Describing the population distribution
Quartiles and Median
Mean
The mean and phenomenon going on within sub populations
The mean being biased by outlier values
Computing the mean of our population
Variance
Standard deviation
Skewness
Measuring the relationship between variables
Correlation
The Pearson correlation coefficient
Distance correlation
Weaknesses of summary EDA - the Anscombe quartet
Graphical EDA
Visualizing a variable distribution
Histogram
Reporting date histogram
Geographical area histogram
Cash flow histogram
Boxplot
Checking for outliers
Visualizing relationships between variables
Scatterplots
Adding title, subtitle, and caption to the plot
Setting axis and legend
Adding explicative text to the plot
Final touches on colors
Chapter 7: Our First Guess - a Linear Regression
Defining a data modelling strategy
Data modelling notions
Supervised learning
Unsupervised learning
The modeling strategy
Applying linear regression to our data
The intuition behind linear regression
The math behind the linear regression
Ordinary least squares technique
Model requirements - what to look for before applying the model
Residuals' uncorrelation
Residuals' homoscedasticity
How to apply linear regression in R
Fitting the linear regression model
Validating model assumption
Visualizing fitted values
Preparing the data for visualization
Developing the data visualization
Chapter 8: A Gentle Introduction to Model Performance Evaluation
Defining model performance
Fitting versus interpretability
Making predictions with models
Measuring performance in regression models
Mean squared error
R-squared
R-squared meaning and interpretation
R-squared computation in R
Adjusted R-squared
R-squared misconceptions
The R-squared doesn't measure the goodness of fit
A low R-squared doesn't mean your model is not statistically significant
Measuring the performance in classification problems
The confusion matrix
Confusion matrix in R
Accuracy
How to compute accuracy in R
Sensitivity
How to compute sensitivity in R
Specificity
How to compute specificity in R
How to choose the right performance statistics
A final general warning - training versus test datasets
Chapter 9: Don't Give up - Power up Your Regression Including Multiple Variables
Moving from simple to multiple linear regression
Notation
Assumptions
Variables' collinearity
Tolerance
Variance inflation factors
Addressing collinearity
Dimensionality reduction
Stepwise regression
Backward stepwise regression
From the full model to the n-1 model
Forward stepwise regression
Double direction stepwise regression
Principal component regression
Fitting a multiple linear model with R
Model fitting
Variable assumptions validation
Residual assumptions validation
Linear model cheat sheet
Chapter 10: A Different Outlook to Problems with Classification Models
What is classification and why do we need it?
Linear regression limitations for categorical variables
Common classification algorithms and models
Logistic regression
The intuition behind logistic regression
The logistic function estimates a response variable enclosed within an upper and lower bound
The logistic function estimates the probability of an observation pertaining to one of the two available categories
The math behind logistic regression
Maximum likelihood estimator
Model assumptions
Absence of multicollinearity between variables
Linear relationship between explanatory variables and log odds
Large enough sample size
How to apply logistic regression in R
Fitting the model
Reading the glm() estimation output
The level of statistical significance of the association between the explanatory variable and the response variable.
Notes:
Includes bibliographical references at the end of each chapter and index.
Description based on online resource; title from PDF title page (EBC, viewed December 29, 2017).
ISBN:
9781787129238
1787129233
OCLC:
1018480584
