Statistics for data science : leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks / James D. Miller.

EBSCOhost Academic eBook Collection (North America) Available online

Ebook Central College Complete Available online

O'Reilly Online Learning: Academic/Public Library Edition Available online
Format:
Book
Author/Creator:
Miller, James D. (Software consultant), author.
Language:
English
Subjects (All):
Statistics.
Big data.
Physical Description:
1 online resource (1 volume) : illustrations
Edition:
1st edition
Place of Publication:
Birmingham, UK : Packt Publishing, 2017.
System Details:
text file
Summary:
Get your statistics basics right before diving into the world of data science.
About This Book
No need to take a degree in statistics; read this book and get a strong statistics base for data science and real-world programs
Implement statistics in data science tasks such as data cleaning, mining, and analysis
Learn all about probability, statistics, numerical computations, and more with the help of R programs
Who This Book Is For
This book is intended for developers who want to enter the field of data science and are looking for concise information on statistics, with the help of insightful programs and simple explanations. Some basic hands-on experience with R will be useful.
What You Will Learn
Analyze the transition from a data developer to a data scientist mindset
Get acquainted with the R programs and the logic used for statistical computations
Understand mathematical concepts such as variance, standard deviation, probability, matrix calculations, and more
Learn to implement statistics in data science tasks such as data cleaning, mining, and analysis
Learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks
Get comfortable with performing various statistical computations for data science programmatically
In Detail
Data science is an ever-evolving field, which is growing in popularity at an exponential rate. Data science includes techniques and theories extracted from the fields of statistics, computer science, and, most importantly, machine learning, databases, data visualization, and so on. This book takes you through an entire journey of statistics, from knowing very little to becoming comfortable in using various statistical methods for data science tasks. It starts off with simple statistics and then moves on to statistical methods that are used in data science algorithms.
The R programs for statistical computation are clearly explained along with logic. You will come across various mathematical concepts, such as variance, standard deviation, probability, matrix calculations, and more. You will learn only what is required to implement statistics in data science tasks such as data cleaning, mining, and analysis. You will learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks. By the end of the book, you will be comfortab...
Contents:
Cover
Copyright
Credits
About the Author
About the Reviewer
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: Transitioning from Data Developer to Data Scientist
Data developer thinking
Objectives of a data developer
Querying or mining
Data quality or data cleansing
Data modeling
Issue or insights
Thought process
Developer versus scientist
New data, new source
Quality questions
Querying and mining
Performance
Financial reporting
Visualizing
Tools of the trade
Advantages of thinking like a data scientist
Developing a better approach to understanding data
Using statistical thinking during program or database designing
Adding to your personal toolbox
Increased marketability
Perpetual learning
Seeing the future
Transitioning to a data scientist
Let's move ahead
Summary
Chapter 2: Declaring the Objectives
Key objectives of data science
Collecting data
Processing data
Exploring and visualizing data
Analyzing the data and/or applying machine learning to the data
Deciding (or planning) based upon acquired insight
Thinking like a data scientist
Bringing statistics into data science
Common terminology
Statistical population
Probability
False positives
Statistical inference
Regression
Fitting
Categorical data
Classification
Clustering
Statistical comparison
Coding
Distributions
Data mining
Decision trees
Machine learning
Munging and wrangling
Visualization
D3
Regularization
Assessment
Cross-validation
Neural networks
Boosting
Lift
Mode
Outlier
Predictive modeling
Big Data
Confidence interval
Writing
Chapter 3: A Developer's Approach to Data Cleaning
Understanding basic data cleaning
Common data issues
Contextual data issues
Cleaning techniques
R and common data issues
Outliers
Step 1 - Profiling the data
Step 2 - Addressing the outliers
Domain expertise
Validity checking
Enhancing data
Harmonization
Standardization
Transformations
Deductive correction
Deterministic imputation
Chapter 4: Data Mining and the Database Developer
Common techniques
Cluster analysis
Correlation analysis
Discriminant analysis
Factor analysis
Regression analysis
Logistic analysis
Purpose
Mining versus querying
Choosing R for data mining
Visualizations
Current smokers
Missing values
A cluster analysis
Dimensional reduction
Calculating statistical significance
Frequent patterning
Frequent item-setting
Sequence mining
Chapter 5: Statistical Analysis for the Database Developer
Data analysis
Looking closer
Statistical analysis
Summarization
Comparing groups
Samples
Group comparison conclusions
Summarization modeling
Establishing the nature of data
Successful statistical analysis
R and statistical analysis
Chapter 6: Database Progression to Database Regression
Introducing statistical regression
Techniques and approaches for regression
Choosing your technique
Does it fit?
Identifying opportunities for statistical regression
Summarizing data
Exploring relationships
Testing significance of differences
Project profitability
R and statistical regression
A working example
Establishing the data profile
The graphical analysis
Predicting with our linear model
Step 1: Chunking the data
Step 2: Creating the model on the training data
Step 3: Predicting the projected profit on test data
Step 4: Reviewing the model
Step 4: Accuracy and error
Chapter 7: Regularization for Database Improvement
Statistical regularization
Various statistical regularization methods
Ridge
Lasso
Least angles
Opportunities for regularization
Collinearity
Sparse solutions
High-dimensional data
Using data to understand statistical regularization
Improving data or a data model
Simplification
Relevance
Speed
Transformation
Variation of coefficients
Causal inference
Back to regularization
Reliability
Using R for statistical regularization
Parameter Setup
Chapter 8: Database Development and Assessment
Assessment and statistical assessment
Objectives
Baselines
Planning for assessment
Evaluation
Development versus assessment
Planning
Data assessment and data quality assurance
Categorizing quality
Preparing data
R and statistical assessment
Questions to ask
Learning curves
Example of a learning curve
Chapter 9: Databases and Neural Networks
Ask any data scientist
Defining neural network
Nodes
Layers
Training
Solution
Understanding the concepts
Neural network models and database models
No single or main node
Not serial
No memory address to store results
R-based neural networks
References
Data prep and preprocessing
Data splitting
Model parameters
R packages for ANN development
ANN
ANN2
NNET
Black boxes
A use case
Popular use cases
Character recognition
Image compression
Stock market prediction
Fraud detection
Neuroscience
Chapter 10: Boosting your Database
Definition and purpose
Bias
Categorizing bias
Causes of bias
Bias data collection
Bias sample selection
Variance
ANOVA
Noise
Noisy data
Weak and strong learners
Weak to strong
Model bias
Training and prediction time
Complexity
Which way?
Back to boosting
How it started
AdaBoost
What you can learn from boosting (to help) your database
Using R to illustrate boosting methods
Prepping the data
Ready for boosting
Example results
Chapter 11: Database Classification using Support Vector Machines
Database classification
Data classification in statistics
Guidelines for classifying data
Common guidelines
Definitions
Definition and purpose of an SVM
The trick
Feature space and cheap computations
Drawing the line
More than classification
Downside
Reference resources
Predicting credit scores
Using R and an SVM to classify data in a database
Moving on
Chapter 12: Database Structures and Machine Learning
Data structures and data models
Data structures
Data models
What's the difference?
Relationships
Overview of machine learning concepts
Key elements of machine learning
Representation
Optimization
Types of machine learning
Supervised learning
Unsupervised learning
Semi-supervised learning
Reinforcement learning
Most popular
Applications of machine learning
Machine learning in practice
Understanding
Preparation
Learning
Interpretation
Deployment
Iteration
Using R to apply machine learning techniques to a database
Understanding the data
Preparing
Data developer
Understanding the challenge
Cross-tabbing and plotting
Index.
Notes:
Description based on online resource; title from title page (viewed January 2, 2018).
OCLC:
1017754186