Hands-on data science and Python machine learning : perform data mining and machine learning efficiently using Python and Spark / Frank Kane.

Ebook Central College Complete Available online

Format:: Book
Author/Creator:: Kane, Frank, author.
Language:: English
Subjects (All):: Python (Computer program language).; Machine learning.; Data mining.
Physical Description:: 1 online resource (415 pages) : illustrations
Edition:: 1st ed.
Place of Publication:: Birmingham, England ; Mumbai, [India] : Packt, 2017.
Biography/History:: Kane, Frank: Frank Kane spent nine years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers. He holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own successful company, Sundog Software, which focuses on virtual reality environment technology and teaches others about big data analysis.
Summary:: This book covers the fundamentals of machine learning with Python in a concise and dynamic manner. It covers data mining and large-scale machine learning using Apache Spark.

Key Features: Take your first steps in the world of data science by understanding the tools and techniques of data analysis; Train efficient Machine Learning models in Python using the supervised and unsupervised learning methods; Learn how to use Apache Spark for processing Big Data efficiently.

Book Description: Join Frank Kane, who worked on Amazon and IMDb’s machine learning algorithms, as he guides you on your first steps into the world of data science. Hands-On Data Science and Python Machine Learning gives you the tools that you need to understand and explore the core topics in the field, and the confidence and practice to build and analyze your own machine learning models. With the help of interesting and easy-to-follow practical examples, Frank Kane explains potentially complex topics such as Bayesian methods and K-means clustering in a way that anybody can understand them. Based on Frank’s successful data science course, Hands-On Data Science and Python Machine Learning empowers you to conduct data analysis and perform efficient machine learning using Python. Let Frank help you unearth the value in your data using the various data mining and data analysis techniques available in Python, and to develop efficient predictive models to predict future results. You will also learn how to perform large-scale machine learning on Big Data using Apache Spark. The book covers preparing your data for analysis, training machine learning models, and visualizing the final data analysis.

What you will learn: Learn how to clean your data and ready it for analysis; Implement the popular clustering and regression methods in Python; Train efficient machine learning models using decision trees and random forests; Visualize the results of your analysis using Python’s Matplotlib library; Use Apache Spark’s MLlib package to perform machine learning on large datasets.

Who this book is for: If you are a budding data scientist or a data analyst who wants to analyze and gain actionable insights from data using Python, this book is for you. Programmers with some experience in Python who want to enter the lucrative world of Data Science will also find this book to be very useful, but you don't need to be an expert Python coder or mathematician to get the most from this book.
Contents:: Intro; Copyright; Credits; About the Author; www.PacktPub.com; Customer Feedback; Table of Contents; Preface;
Chapter 1: Getting Started; Installing Enthought Canopy; Giving the installation a test run; If you occasionally get problems opening your IPNYB files; Using and understanding IPython (Jupyter) Notebooks; Python basics - Part 1; Understanding Python code; Importing modules; Data structures; Experimenting with lists; Pre colon; Post colon; Negative syntax; Adding list to list; The append function; Complex data structures; Dereferencing a single element; The sort function; Reverse sort; Tuples; Dereferencing an element; List of tuples; Dictionaries; Iterating through entries; Python basics - Part 2; Functions in Python; Lambda functions - functional programming; Understanding boolean expressions; The if statement; The if-else loop; Looping; The while loop; Exploring activity; Running Python scripts; More options than just the IPython/Jupyter Notebook; Running Python scripts in command prompt; Using the Canopy IDE; Summary;
Chapter 2: Statistics and Probability Refresher, and Python Practice; Types of data; Numerical data; Discrete data; Continuous data; Categorical data; Ordinal data; Mean, median, and mode; Mean; Median; The factor of outliers; Mode; Using mean, median, and mode in Python; Calculating mean using the NumPy package; Visualizing data using matplotlib; Calculating median using the NumPy package; Analyzing the effect of outliers; Calculating mode using the SciPy package; Some exercises; Standard deviation and variance; Variance; Measuring variance; Standard deviation; Identifying outliers with standard deviation; Population variance versus sample variance; The Mathematical explanation; Analyzing standard deviation and variance on a histogram; Using Python to compute standard deviation and variance; Try it yourself; Probability density function and probability mass function; The probability density function and probability mass functions; Probability density functions; Probability mass functions; Types of data distributions; Uniform distribution; Normal or Gaussian distribution; The exponential probability distribution or Power law; Binomial probability mass function; Poisson probability mass function; Percentiles and moments; Percentiles; Quartiles; Computing percentiles in Python; Moments; Computing moments in Python;
Chapter 3: Matplotlib and Advanced Probability Concepts; A crash course in Matplotlib; Generating multiple plots on one graph; Saving graphs as images; Adjusting the axes; Adding a grid; Changing line types and colors; Labeling axes and adding a legend; A fun example; Generating pie charts; Generating bar charts; Generating scatter plots; Generating histograms; Generating box-and-whisker plots; Covariance and correlation; Defining the concepts; Measuring covariance; Correlation; Computing covariance and correlation in Python; Computing correlation - The hard way; Computing correlation - The NumPy way; Correlation activity; Conditional probability; Conditional probability exercises in Python; Conditional probability assignment; My assignment solution; Bayes' theorem;
Chapter 4: Predictive Models; Linear regression; The ordinary least squares technique; The gradient descent technique; The co-efficient of determination or r-squared; Computing r-squared; Interpreting r-squared; Computing linear regression and r-squared using Python; Activity for linear regression; Polynomial regression; Implementing polynomial regression using NumPy; Computing the r-squared error; Activity for polynomial regression; Multivariate regression and predicting car prices; Multivariate regression using Python; Activity for multivariate regression; Multi-level models;
Chapter 5: Machine Learning with Python; Machine learning and train/test; Unsupervised learning; Supervised learning; Evaluating supervised learning; K-fold cross validation; Using train/test to prevent overfitting of a polynomial regression; Activity; Bayesian methods - Concepts; Implementing a spam classifier with Naïve Bayes; K-Means clustering; Limitations to k-means clustering; Clustering people based on income and age; Measuring entropy; Decision trees - Concepts; Decision tree example; Walking through a decision tree; Random forests technique; Decision trees - Predicting hiring decisions using Python; Ensemble learning - Using a random forest; Ensemble learning; Support vector machine overview; Using SVM to cluster people by using scikit-learn;
Chapter 6: Recommender Systems; What are recommender systems?; User-based collaborative filtering; Limitations of user-based collaborative filtering; Item-based collaborative filtering; Understanding item-based collaborative filtering; How item-based collaborative filtering works?; Collaborative filtering using Python; Finding movie similarities; Understanding the code; The corrwith function; Improving the results of movie similarities; Making movie recommendations to people; Understanding movie recommendations with an example; Using the groupby command to combine rows; Removing entries with the drop command; Improving the recommendation results; Summary;
Chapter 7: More Data Mining and Machine Learning Techniques; K-nearest neighbors - concepts; Using KNN to predict a rating for a movie; Dimensionality reduction and principal component analysis; Dimensionality reduction; Principal component analysis; A PCA example with the Iris dataset; Data warehousing overview; ETL versus ELT; Reinforcement learning; Q-learning; The exploration problem; The simple approach; The better way; Fancy words; Markov decision process; Dynamic programming;
Chapter 8: Dealing with Real-World Data; Bias/variance trade-off; K-fold cross-validation to avoid overfitting; Example of k-fold cross-validation using scikit-learn; Data cleaning and normalisation; Cleaning web log data; Applying a regular expression on the web log; Modification one - filtering the request field; Modification two - filtering post requests; Modification three - checking the user agents; Filtering the activity of spiders/robots; Modification four - applying website-specific filters; Activity for web log data; Normalizing numerical data; Detecting outliers; Dealing with outliers; Activity for outliers;
Chapter 9: Apache Spark - Machine Learning on Big Data; Installing Spark; Installing Spark on Windows; Installing Spark on other operating systems; Installing the Java Development Kit; Spark introduction; It's scalable; It's fast; It's young; It's not difficult; Components of Spark; Python versus Scala for Spark; Spark and Resilient Distributed Datasets (RDD); The SparkContext object; Creating RDDs; Creating an RDD using a Python list; Loading an RDD from a text file; More ways to create RDDs; RDD operations; Transformations; Using map(); Actions; Introducing MLlib; Some MLlib Capabilities; Special MLlib data types; The vector data type; LabeledPoint data type; Rating data type; Decision Trees in Spark with MLlib; Exploring decision trees code; Creating the SparkContext; Importing and cleaning our data; Creating a test candidate and building our decision tree; Running the script; K-Means Clustering in Spark; Within set sum of squared errors (WSSSE); Running the code; TF-IDF; TF-IDF in practice; Using TF-IDF; Searching wikipedia with Spark MLlib; Import statements; Creating the initial RDD; Creating and transforming a HashingTF object; Computing the TF-IDF score; Using the Wikipedia search engine algorithm; Running the algorithm; Using the Spark 2.0 DataFrame API for MLlib; How Spark 2.0 MLlib works; Implementing linear regression;
Chapter 10: Testing and Experimental Design; A/B testing concepts; A/B tests; Measuring conversion for A/B testing; How to attribute conversions; Variance is your enemy; T-test and p-value; The t-statistic or t-test; The p-value; Measuring t-statistics and p-values using Python; Running A/B test on some experimental data; When there's no real difference between the two groups; Does the sample size make a difference?; Sample size increased to six-digits; Sample size increased seven-digits; A/A testing; Determining how long to run an experiment for; A/B test gotchas; Novelty effects; Seasonal effects; Selection bias; Auditing selection bias issues; Data pollution; Attribution errors;
Index.
Notes:: Includes index.; Description based on online resource; title from PDF title page (ebrary, viewed August 28, 2017).
OCLC:: 999636604
