R data analysis cookbook : a journey from data computation to data-driven insights / Kuntal Ganguly.
- Format:
- Book
- Author/Creator:
- Ganguly, Kuntal, author.
- Language:
- English
- Subjects (All):
- R (Computer program language).
- Physical Description:
- 1 online resource (1 volume) : illustrations
- Edition:
- Second edition.
- Place of Publication:
- Birmingham, England ; Mumbai, [India] : Packt, 2017.
- System Details:
- text file
- Biography/History:
- Ganguly Kuntal: Kuntal Ganguly is a big data analytics engineer focused on building large-scale, data-driven systems using big data frameworks and machine learning. He has around 7 years of experience building big data and machine learning applications. Kuntal provides solutions to cloud customers in building real-time analytics systems using managed cloud services and open source Hadoop ecosystem technologies such as Spark, Kafka, Storm, Solr, and so on, along with machine learning and deep learning frameworks. Kuntal enjoys hands-on software development and has single-handedly conceived, architected, developed, and deployed several large-scale distributed applications. He is a machine learning and deep learning practitioner and is very passionate about building intelligent applications.
Viswanathan Shanthi: no biography provided.
Viswanathan Viswa: Viswa Viswanathan is an associate professor of Computing and Decision Sciences at the Stillman School of Business in Seton Hall University. After completing his PhD in Artificial Intelligence, Viswa spent a decade in academia and then switched to a leadership position in the software industry for another decade, during which he worked for Infosys, Igate, and Starbase. He embraced academia once again in 2001. Viswa has taught extensively in fields including operations research, computer science, software engineering, management information systems, and enterprise systems. In addition to university teaching, Viswa has conducted training programs for industry professionals and has written several peer-reviewed research publications in journals such as Operations Research, IEEE Software, Computers and Industrial Engineering, and International Journal of Artificial Intelligence in Education. He has authored a book titled Data Analytics with R: A hands-on approach. Viswa thoroughly enjoys hands-on software development and has single-handedly conceived, architected, developed, and deployed several web-based applications.
Apart from his deep interest in technical fields such as data analytics, artificial intelligence, computer science, and software engineering, Viswa harbors a deep interest in education with special emphasis on the roots of learning and methods to foster deeper learning. He has done research in this area and hopes to pursue the subject further. Viswa would like to express deep gratitude to professors Amitava Bagchi and Anup Sen, who were inspirational forces during his early research career. He is also grateful to several extremely intelligent colleagues, notable among them being Rajesh Venkatesh, Dan Richner, and Sriram Bala, who significantly shaped his thinking. His aunt, Analdavalli; his sister, Sankari; and his wife, Shanthi, taught him much about hard work, and even the little he has absorbed has helped him immensely. His sons, Nitin and Siddarth, have helped with numerous insightful comments on various topics.
- Summary:
- Over 80 recipes to help you breeze through your data analysis projects using R.
About This Book: Analyse your data using popular R packages like ggplot2 with ready-to-use and customizable recipes. Find meaningful insights from your data and generate dynamic reports. A practical guide to help you put your data analysis skills in R to practical use.
Who This Book Is For: This book is for data scientists, analysts, and even enthusiasts who want to learn and implement the various data analysis techniques using R in a practical way. Those looking for quick, handy solutions to common tasks and challenges in data analysis will find this book to be very useful. Basic knowledge of statistics and R programming is assumed.
What You Will Learn: Acquire, format, and visualize your data using R; use R to perform exploratory data analysis; get an introduction to machine learning algorithms such as classification and regression; get started with social network analysis; generate dynamic reports with Shiny; get started with geospatial analysis; handle large data with R using Spark and MongoDB; build recommendation systems (collaborative filtering, content-based, and hybrid); learn from real-world dataset examples (fraud detection and image recognition).
In Detail: Data analytics with R has emerged as a very important focus for organizations of all kinds. R enables even those with only an intuitive grasp of the underlying concepts, without a deep mathematical background, to unleash powerful and detailed examinations of their data. This book will show you how you can put your data analysis skills in R to practical use, with recipes catering to basic as well as advanced data analysis tasks. Right from acquiring your data and preparing it for analysis to the more complex data analysis techniques, the book will show you how you can implement each technique in the best possible manner. You will also visualize your data using popular R packages like ggplot2 and gain hidden insights from it.
Starting with implementing basic data analysis concepts, from handling your data to creating basic plots, you will master more advanced data analysis techniques such as performing cluster analysis and generating effective analysis reports and visualizations. Throughout the book, you will get to know the common problems and obstacles you might encounter while implementing each of the data analysis techniques in R, with ways to overcome them in the easiest possible way. By the end of this ...
- Contents:
- Cover
- Copyright
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Customer Feedback
- Table of Contents
- Preface
- Chapter 1: Acquire and Prepare the Ingredients - Your Data
- Introduction
- Working with data
- Reading data from CSV files
- Getting ready
- How to do it...
- How it works...
- There's more...
- Handling different column delimiters
- Handling column headers/variable names
- Handling missing values
- Reading strings as characters and not as factors
- Reading data directly from a website
- Reading XML data
- Extracting HTML table data from a web page
- Extracting a single HTML table from a web page
- Reading JSON data
- Reading data from fixed-width formatted files
- Files with headers
- Excluding columns from data
- Reading data from R files and R libraries
- Saving all objects in a session
- Saving objects selectively in a session
- Attaching/detaching R data files to an environment
- Listing all datasets in loaded packages
- Removing cases with missing values
- Eliminating cases with NA for selected variables
- Finding cases that have no missing values
- Converting specific values to NA
- Excluding NA values from computations
- Replacing missing values with the mean
- Imputing random values sampled from non-missing values
- Removing duplicate cases
- There's more...
- Identifying duplicates without deleting them
- Rescaling a variable to specified min-max range
- Rescaling many variables at once
- See also
- Normalizing or standardizing data in a data frame
- Standardizing several variables simultaneously
- Binning numerical data
- Creating a specified number of intervals automatically
- Creating dummies for categorical variables
- Choosing which variables to create dummies for
- Handling missing data
- Understanding missing data pattern
- Correcting data
- Combining multiple columns to single columns
- Splitting single column to multiple columns
- Imputing data
- Detecting outliers
- Treating the outliers with mean/median imputation
- Handling extreme values with capping
- Transforming and binning values
- Outlier detection with LOF
- Chapter 2: What's in There - Exploratory Data Analysis
- Creating standard data summaries
- Using the str() function for an overview of a data frame
- Computing the summary and the str() function for a single variable
- Finding other measures
- Extracting a subset of a dataset
- Excluding columns
- Selecting based on multiple values
- Selecting using logical vector
- Splitting a dataset
- Creating random data partitions
- Case 1 - Numerical target variable and two partitions
- Case 2 - Numerical target variable and three partitions
- Case 3 - Categorical target variable and two partitions
- Case 4 - Categorical target variable and three partitions
- Using a convenience function for partitioning
- Sampling from a set of values
- Generating standard plots, such as histograms, boxplots, and scatterplots
- Creating histograms
- Creating boxplots
- Creating scatterplots
- Creating scatterplot matrices
- Histograms
- Boxplots
- Overlay a density plot on a histogram
- Overlay a regression line on a scatterplot
- Color specific points on a scatterplot
- Generating multiple plots on a grid
- Graphics parameters
- Creating plots with the lattice package
- Adding flair to your graphs
- Creating charts that facilitate comparisons
- Using base plotting system
- Creating beanplots with the beanplot package
- Creating charts that help to visualize possible causality
- Chapter 3: Where Does It Belong? Classification
- Generating error/classification confusion matrices
- Visualizing the error/classification confusion matrix
- Comparing the model's performance for different classes
- Principal Component Analysis
- Generating receiver operating characteristic charts
- Using arbitrary class labels
- Building, plotting, and evaluating with classification trees
- Computing raw probabilities
- Creating the ROC chart
- Using random forest models for classification
- Generating the ROC chart
- Specifying cutoffs for classification
- Classifying using the support vector machine approach
- Controlling the scaling of variables
- Determining the type of SVM model
- Assigning weights to the classes
- Choosing the cost of SVM
- Tuning the SVM
- Classifying using the Naive Bayes approach
- Classifying using the KNN approach
- Automating the process of running KNN for many k values
- Selecting appropriate values of k using caret
- Using KNN to compute raw probabilities instead of classifications
- Using neural networks for classification
- Exercising greater control over nnet
- Generating raw probabilities and plotting the ROC curve
- Classifying using linear discriminant function analysis
- Using the formula interface for lda
- Classifying using logistic regression
- Text classification for sentiment analysis
- Chapter 4: Give Me a Number - Regression
- Computing the root-mean-square error
- Using a convenience function to compute the RMS error
- Building KNN models for regression
- Running KNN with cross-validation in place of a validation partition
- Using a convenience function to run KNN
- Using a convenience function to run KNN for multiple k values
- Performing linear regression
- Forcing lm to use a specific factor level as the reference
- Using other options in the formula expression for linear models
- Performing variable selection in linear regression
- Building regression trees
- Generating regression trees for data with categorical predictors
- Generating regression trees using the ensemble method - Bagging and Boosting
- Building random forest models for regression
- Controlling forest generation
- Using neural networks for regression
- Performing k-fold cross-validation
- Performing leave-one-out cross-validation to limit overfitting
- How to do it...
- How it works...
- Notes:
- Previous edition published: 2015.
- Includes bibliographical references at the end of each chapter and an index.
- Description based on online resource; title from PDF title page (ebrary, viewed October 18, 2017).
- OCLC:
- 1006894355