R data analysis cookbook : a journey from data computation to data-driven insights / Kuntal Ganguly.

EBSCOhost Academic eBook Collection (North America) Available online

Ebook Central College Complete Available online

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
Ganguly, Kuntal, author.
Language:
English
Subjects (All):
R (Computer program language).
Physical Description:
1 online resource (1 volume) : illustrations
Edition:
Second edition.
Place of Publication:
Birmingham, England ; Mumbai, [India] : Packt, 2017.
System Details:
text file
Biography/History:
Ganguly Kuntal: Kuntal Ganguly is a big data analytics engineer focused on building large-scale, data-driven systems using big data frameworks and machine learning. He has around 7 years of experience building big data and machine learning applications. Kuntal helps cloud customers build real-time analytics systems using managed cloud services and open source Hadoop ecosystem technologies such as Spark, Kafka, Storm, and Solr, along with machine learning and deep learning frameworks. Kuntal enjoys hands-on software development and has single-handedly conceived, architected, developed, and deployed several large-scale distributed applications. He is a machine learning and deep learning practitioner and is very passionate about building intelligent applications.
Viswanathan Viswa: Viswa Viswanathan is an associate professor of Computing and Decision Sciences at the Stillman School of Business at Seton Hall University. After completing his PhD in Artificial Intelligence, Viswa spent a decade in academia and then switched to a leadership position in the software industry for another decade, during which he worked for Infosys, Igate, and Starbase. He embraced academia once again in 2001. Viswa has taught extensively in fields including operations research, computer science, software engineering, management information systems, and enterprise systems. In addition to university teaching, Viswa has conducted training programs for industry professionals and has written several peer-reviewed research publications in journals such as Operations Research, IEEE Software, Computers and Industrial Engineering, and International Journal of Artificial Intelligence in Education. He has authored a book titled Data Analytics with R: A hands-on approach. Viswa thoroughly enjoys hands-on software development and has single-handedly conceived, architected, developed, and deployed several web-based applications.
Apart from his deep interest in technical fields such as data analytics, artificial intelligence, computer science, and software engineering, Viswa harbors a deep interest in education with special emphasis on the roots of learning and methods to foster deeper learning. He has done research in this area and hopes to pursue the subject further. Viswa would like to express deep gratitude to professors Amitava Bagchi and Anup Sen, who were inspirational forces during his early research career. He is also grateful to several extremely intelligent colleagues, notable among them being Rajesh Venkatesh, Dan Richner, and Sriram Bala, who significantly shaped his thinking. His aunt, Analdavalli; his sister, Sankari; and his wife, Shanthi, taught him much about hard work, and even the little he has absorbed has helped him immensely. His sons, Nitin and Siddarth, have helped with numerous insightful comments on various topics.
Summary:
Over 80 recipes to help you breeze through your data analysis projects using R
About This Book
Analyse your data using popular R packages like ggplot2 with ready-to-use and customizable recipes
Find meaningful insights from your data and generate dynamic reports
A practical guide to help you put your data analysis skills in R to practical use
Who This Book Is For
This book is for data scientists, analysts, and even enthusiasts who want to learn and implement the various data analysis techniques using R in a practical way. Those looking for quick, handy solutions to common tasks and challenges in data analysis will find this book very useful. Basic knowledge of statistics and R programming is assumed.
What You Will Learn
Acquire, format, and visualize your data using R
Use R to perform exploratory data analysis
Get introduced to machine learning algorithms such as classification and regression
Get started with social network analysis
Generate dynamic reports with Shiny
Get started with geospatial analysis
Handle large data with R using Spark and MongoDB
Build recommendation systems: collaborative filtering, content-based, and hybrid
Learn from real-world dataset examples: fraud detection and image recognition
In Detail
Data analytics with R has emerged as a very important focus for organizations of all kinds. R enables even those with only an intuitive grasp of the underlying concepts, without a deep mathematical background, to unleash powerful and detailed examinations of their data. This book will show you how to put your data analysis skills in R to practical use, with recipes catering to basic as well as advanced data analysis tasks. From acquiring your data and preparing it for analysis to the more complex data analysis techniques, the book will show you how to implement each technique in the best possible manner. You will also visualize your data using popular R packages like ggplot2 and gain hidden insights from it.
Starting with basic data analysis concepts, from handling your data to creating basic plots, you will master more advanced techniques such as performing cluster analysis and generating effective analysis reports and visualizations. Throughout the book, you will get to know the common problems and obstacles you might encounter while implementing each of the data analysis techniques in R, with ways to overcome them in the easiest possible way. By the end of this ...
Contents:
Cover
Copyright
Credits
About the Author
About the Reviewers
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: Acquire and Prepare the Ingredients - Your Data
Introduction
Working with data
Reading data from CSV files
Getting ready
How to do it...
How it works...
There's more...
Handling different column delimiters
Handling column headers/variable names
Handling missing values
Reading strings as characters and not as factors
Reading data directly from a website
Reading XML data
Extracting HTML table data from a web page
Extracting a single HTML table from a web page
Reading JSON data
Reading data from fixed-width formatted files
Files with headers
Excluding columns from data
Reading data from R files and R libraries
Saving all objects in a session
Saving objects selectively in a session
Attaching/detaching R data files to an environment
Listing all datasets in loaded packages
Removing cases with missing values
Eliminating cases with NA for selected variables
Finding cases that have no missing values
Converting specific values to NA
Excluding NA values from computations
Replacing missing values with the mean
Imputing random values sampled from non-missing values
Removing duplicate cases
There's more.
Identifying duplicates without deleting them
Rescaling a variable to specified min-max range
Rescaling many variables at once
See also
Normalizing or standardizing data in a data frame
Standardizing several variables simultaneously
Binning numerical data
Creating a specified number of intervals automatically
Creating dummies for categorical variables
Choosing which variables to create dummies for
Handling missing data
Understanding missing data pattern
Correcting data
Combining multiple columns to single columns
Splitting single column to multiple columns
Imputing data
Detecting outliers
Treating the outliers with mean/median imputation
Handling extreme values with capping
Transforming and binning values
Outlier detection with LOF
Chapter 2: What's in There - Exploratory Data Analysis
Creating standard data summaries
Using the str() function for an overview of a data frame
Computing the summary and the str() function for a single variable
Finding other measures
Extracting a subset of a dataset
Excluding columns
Selecting based on multiple values
Selecting using logical vector
Splitting a dataset
Creating random data partitions
Case 1 - Numerical target variable and two partitions
Case 2 - Numerical target variable and three partitions
Case 3 - Categorical target variable and two partitions
Case 4 - Categorical target variable and three partitions
Using a convenience function for partitioning
Sampling from a set of values
Generating standard plots, such as histograms, boxplots, and scatterplots
Creating histograms
Creating boxplots
Creating scatterplots
Creating scatterplot matrices
Histograms
Boxplots
Overlay a density plot on a histogram
Overlay a regression line on a scatterplot
Color specific points on a scatterplot
Generating multiple plots on a grid
Graphics parameters
Creating plots with the lattice package
Adding flair to your graphs
Creating charts that facilitate comparisons
Using base plotting system
Creating beanplots with the beanplot package
Creating charts that help to visualize possible causality
Chapter 3: Where Does It Belong? Classification
Generating error/classification confusion matrices
Visualizing the error/classification confusion matrix
Comparing the model's performance for different classes
Principal Component Analysis
Generating receiver operating characteristic charts
Using arbitrary class labels
Building, plotting, and evaluating with classification trees
Computing raw probabilities
Creating the ROC chart
Using random forest models for classification
Generating the ROC chart
Specifying cutoffs for classification
Classifying using the support vector machine approach
Controlling the scaling of variables
Determining the type of SVM model
Assigning weights to the classes
Choosing the cost of SVM
Tuning the SVM
Classifying using the Naive Bayes approach
Classifying using the KNN approach
Automating the process of running KNN for many k values
Selecting appropriate values of k using caret
Using KNN to compute raw probabilities instead of classifications
Using neural networks for classification
Exercising greater control over nnet
Generating raw probabilities and plotting the ROC curve
Classifying using linear discriminant function analysis
Using the formula interface for lda
Classifying using logistic regression
Text classification for sentiment analysis
Chapter 4: Give Me a Number - Regression
Computing the root-mean-square error
Using a convenience function to compute the RMS error
Building KNN models for regression
Running KNN with cross-validation in place of a validation partition
Using a convenience function to run KNN
Using a convenience function to run KNN for multiple k values
Performing linear regression
Forcing lm to use a specific factor level as the reference
Using other options in the formula expression for linear models
Performing variable selection in linear regression
Building regression trees
Generating regression trees for data with categorical predictors
Generating regression trees using the ensemble method - Bagging and Boosting
Building random forest models for regression
Controlling forest generation
Using neural networks for regression
Performing k-fold cross-validation
Performing leave-one-out cross-validation to limit overfitting
How to do it.
How it works.
Notes:
Previous edition published: 2015.
Includes bibliographical references at the end of each chapter and index.
Description based on online resource; title from PDF title page (ebrary, viewed October 18, 2017).
OCLC:
1006894355
