Statistics for data science : leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks / James D. Miller.

EBSCOhost Academic eBook Collection (North America) Available online

Ebook Central College Complete Available online

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:: Book
Author/Creator:: Miller, James D. (Software consultant), author.
Language:: English
Subjects (All):: Statistics.; Big data.
Physical Description:: 1 online resource (1 volume) : illustrations
Edition:: 1st edition
Place of Publication:: Birmingham, UK : Packt Publishing, 2017.
System Details:: text file
Summary:: Get your statistics basics right before diving into the world of data science.
About This Book: No need to take a degree in statistics; read this book and get a strong statistics base for data science and real-world programs. Implement statistics in data science tasks such as data cleaning, mining, and analysis. Learn all about probability, statistics, numerical computations, and more with the help of R programs.
Who This Book Is For: This book is intended for developers who want to enter the field of data science and are looking for concise information on statistics, with the help of insightful programs and simple explanations. Some basic hands-on experience with R will be useful.
What You Will Learn: Analyze the transition from a data developer to a data scientist mindset. Get acquainted with the R programs and the logic used for statistical computations. Understand mathematical concepts such as variance, standard deviation, probability, matrix calculations, and more. Learn to implement statistics in data science tasks such as data cleaning, mining, and analysis. Learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks. Get comfortable with performing various statistical computations for data science programmatically.
In Detail: Data science is an ever-evolving field, which is growing in popularity at an exponential rate. Data science includes techniques and theories extracted from the fields of statistics, computer science, and, most importantly, machine learning, databases, data visualization, and so on. This book takes you through an entire journey of statistics, from knowing very little to becoming comfortable in using various statistical methods for data science tasks. It starts off with simple statistics and then moves on to statistical methods that are used in data science algorithms. The R programs for statistical computation are clearly explained along with logic. You will come across various mathematical concepts, such as variance, standard deviation, probability, matrix calculations, and more. You will learn only what is required to implement statistics in data science tasks such as data cleaning, mining, and analysis. You will learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks. By the end of the book, you will be comfortab...
Contents:: Cover; Copyright; Credits; About the Author; About the Reviewer; www.PacktPub.com; Customer Feedback; Table of Contents; Preface; Chapter 1: Transitioning from Data Developer to Data Scientist; Data developer thinking; Objectives of a data developer; Querying or mining; Data quality or data cleansing; Data modeling; Issue or insights; Thought process; Developer versus scientist; New data, new source; Quality questions; Querying and mining; Performance; Financial reporting; Visualizing; Tools of the trade; Advantages of thinking like a data scientist; Developing a better approach to understanding data; Using statistical thinking during program or database designing; Adding to your personal toolbox; Increased marketability; Perpetual learning; Seeing the future; Transitioning to a data scientist; Let's move ahead; Summary; Chapter 2: Declaring the Objectives; Key objectives of data science; Collecting data; Processing data; Exploring and visualizing data; Analyzing the data and/or applying machine learning to the data; Deciding (or planning) based upon acquired insight; Thinking like a data scientist; Bringing statistics into data science; Common terminology; Statistical population; Probability; False positives; Statistical inference; Regression; Fitting; Categorical data; Classification; Clustering; Statistical comparison; Coding; Distributions; Data mining; Decision trees; Machine learning; Munging and wrangling; Visualization; D3; Regularization; Assessment; Cross-validation; Neural networks; Boosting; Lift; Mode; Outlier; Predictive modeling; Big Data; Confidence interval; Writing; Chapter 3: A Developer's Approach to Data Cleaning; Understanding basic data cleaning.; Common data issues; Contextual data issues; Cleaning techniques; R and common data issues; Outliers; Step 1 - Profiling the data; Step 2 - Addressing the outliers; Domain expertise; Validity checking; Enhancing data; Harmonization; Standardization; Transformations; Deductive correction; 
Deterministic imputation; Chapter 4: Data Mining and the Database Developer; Common techniques; Cluster analysis; Correlation analysis; Discriminant analysis; Factor analysis; Regression analysis; Logistic analysis; Purpose; Mining versus querying; Choosing R for data mining; Visualizations; Current smokers; Missing values; A cluster analysis; Dimensional reduction; Calculating statistical significance; Frequent patterning; Frequent item-setting; Sequence mining; Chapter 5: Statistical Analysis for the Database Developer; Data analysis; Looking closer; Statistical analysis; Summarization; Comparing groups; Samples; Group comparison conclusions; Summarization modeling; Establishing the nature of data; Successful statistical analysis; R and statistical analysis; Chapter 6: Database Progression to Database Regression; Introducing statistical regression; Techniques and approaches for regression; Choosing your technique; Does it fit?; Identifying opportunities for statistical regression; Summarizing data; Exploring relationships; Testing significance of differences; Project profitability; R and statistical regression; A working example; Establishing the data profile; The graphical analysis; Predicting with our linear model; Step 1: Chunking the data; Step 2: Creating the model on the training data; Step 3: Predicting the projected profit on test data; Step 4: Reviewing the model.; Step 4: Accuracy and error; Chapter 7: Regularization for Database Improvement; Statistical regularization; Various statistical regularization methods; Ridge; Lasso; Least angles; Opportunities for regularization; Collinearity; Sparse solutions; High-dimensional data; Using data to understand statistical regularization; Improving data or a data model; Simplification; Relevance; Speed; Transformation; Variation of coefficients; Casual inference; Back to regularization; Reliability; Using R for statistical regularization; Parameter Setup; Chapter 8: Database Development and Assessment; 
Assessment and statistical assessment; Objectives; Baselines; Planning for assessment; Evaluation; Development versus assessment; Planning; Data assessment and data quality assurance; Categorizing quality; Preparing data; R and statistical assessment; Questions to ask; Learning curves; Example of a learning curve; Chapter 9: Databases and Neural Networks; Ask any data scientist; Defining neural network; Nodes; Layers; Training; Solution; Understanding the concepts; Neural network models and database models; No single or main node; Not serial; No memory address to store results; R-based neural networks; References; Data prep and preprocessing; Data splitting; Model parameters; R packages for ANN development; ANN; ANN2; NNET; Black boxes; A use case; Popular use cases; Character recognition; Image compression; Stock market prediction; Fraud detection; Neuroscience; Chapter 10: Boosting your Database; Definition and purpose; Bias; Categorizing bias; Causes of bias; Bias data collection; Bias sample selection.; Variance; ANOVA; Noise; Noisy data; Weak and strong learners; Weak to strong; Model bias; Training and prediction time; Complexity; Which way?; Back to boosting; How it started; AdaBoost; What you can learn from boosting (to help) your database; Using R to illustrate boosting methods; Prepping the data; Ready for boosting; Example results; Chapter 11: Database Classification using Support Vector Machines; Database classification; Data classification in statistics; Guidelines for classifying data; Common guidelines; Definitions; Definition and purpose of an SVM; The trick; Feature space and cheap computations; Drawing the line; More than classification; Downside; Reference resources; Predicting credit scores; Using R and an SVM to classify data in a database; Moving on; Chapter 12: Database Structures and Machine Learning; Data structures and data models; Data structures; Data models; What's the difference?; Relationships; Overview of machine learning concepts; 
Key elements of machine learning; Representation; Optimization; Types of machine learning; Supervised learning; Unsupervised learning; Semi-supervised learning; Reinforcement learning; Most popular; Applications of machine learning; Machine learning in practice; Understanding; Preparation; Learning; Interpretation; Deployment; Iteration; Using R to apply machine learning techniques to a database; Understanding the data; Preparing; Data developer; Understanding the challenge; Cross-tabbing and plotting; Index.
Notes:: Description based on online resource; title from title page (viewed January 2, 2018).
OCLC:: 1017754186

