Data science solutions with Python : fast and scalable models using Keras, PySpark MLlib, H2O, XGBoost, and scikit-learn / Tshepo Chris Nokeri.

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
Nokeri, Tshepo Chris, author.
Language:
English
Subjects (All):
Machine learning.
Python (Computer program language).
Physical Description:
1 online resource (128 pages)
Place of Publication:
[Place of publication not identified] : Apress, [2022]
Summary:
Apply supervised and unsupervised learning to solve practical, real-world big data problems. This book teaches you how to engineer features, optimize hyperparameters, train and test models, develop pipelines, and automate the machine learning (ML) process. The book covers PySpark, an in-memory, distributed cluster computing framework; the machine learning frameworks scikit-learn, PySpark MLlib, H2O, and XGBoost; and the deep learning (DL) framework Keras. The book starts by presenting supervised and unsupervised ML and DL models, and then examines big data frameworks along with ML and DL frameworks. Author Tshepo Chris Nokeri considers a parametric model known as the Generalized Linear Model, along with two survival regression models: the Cox Proportional Hazards model and the Accelerated Failure Time (AFT) model. Also presented are a binary classification model (logistic regression) and an ensemble model (Gradient Boosted Trees). The book introduces DL and an artificial neural network known as the Multilayer Perceptron (MLP) classifier. Cluster analysis using the K-Means model is covered, and dimension reduction techniques such as Principal Component Analysis and Linear Discriminant Analysis are explored. Finally, automated machine learning is unpacked. This book is for intermediate-level data scientists and machine learning engineers who want to learn how to apply key big data frameworks and ML and DL frameworks. You will need prior knowledge of the basics of statistics, Python programming, probability theory, and predictive analytics.
What You Will Learn:
Understand widespread supervised and unsupervised learning, including key dimension reduction techniques
Know the big data analytics layers, such as data visualization, advanced statistics, predictive analytics, machine learning, and deep learning
Integrate big data frameworks with a hybrid of machine learning frameworks and deep learning frameworks
Design, build, test, and validate skilled machine learning models and deep learning models
Optimize model performance using data transformation, regularization, outlier remedying, hyperparameter optimization, and data split ratio alteration.
Contents:
Intro
Table of Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: Exploring Machine Learning
Exploring Supervised Methods
Exploring Nonlinear Models
Exploring Ensemble Methods
Exploring Unsupervised Methods
Exploring Cluster Methods
Exploring Dimension Reduction
Exploring Deep Learning
Conclusion
Chapter 2: Big Data, Machine Learning, and Deep Learning Frameworks
Big Data
Big Data Features
Impact of Big Data on Business and People
Better Customer Relationships
Refined Product Development
Improved Decision-Making
Big Data Warehousing
Big Data ETL
Big Data Frameworks
Apache Spark
Resilient Distributed Datasets
Spark Configuration
Spark Frameworks
SparkSQL
Spark Streaming
Spark MLlib
GraphX
ML Frameworks
Scikit-Learn
H2O
XGBoost
DL Frameworks
Keras
Chapter 3: Linear Modeling with Scikit-Learn, PySpark, and H2O
Exploring the Ordinary Least-Squares Method
Scikit-Learn in Action
PySpark in Action
H2O in Action
Chapter 4: Survival Analysis with PySpark and Lifelines
Exploring Survival Analysis
Exploring Cox Proportional Hazards Method
Lifelines in Action
Exploring the Accelerated Failure Time Method
Chapter 5: Nonlinear Modeling with Scikit-Learn, PySpark, and H2O
Exploring the Logistic Regression Method
Chapter 6: Tree Modeling and Gradient Boosting with Scikit-Learn, XGBoost, PySpark, and H2O
Decision Trees
Preprocessing Features
Gradient Boosting
XGBoost in Action
Chapter 7: Neural Networks with Scikit-Learn, Keras, and H2O
Exploring Deep Learning
Multilayer Perceptron Neural Network
Keras in Action
Deep Belief Networks
Chapter 8: Cluster Analysis with Scikit-Learn, PySpark, and H2O
Exploring the K-Means Method
Chapter 9: Principal Component Analysis with Scikit-Learn, PySpark, and H2O
Exploring the Principal Component Method
Chapter 10: Automating the Machine Learning Process with H2O
Exploring Automated Machine Learning
H2O AutoML in Action
Index.
Notes:
Description based on print version record.
Includes index.
ISBN:
9781523150984
152315098X
9781484277621
1484277627
OCLC:
1283854399