
Data science solutions with Python : fast and scalable models using Keras, Pyspark Mllib, H2O, XGBoost, and scikit-Learn / Tshepo Chris Nokeri.

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format: Book
Author/Creator: Nokeri, Tshepo Chris, author.
Language: English
Subjects (All): Machine learning; Python (Computer program language).
Physical Description: 1 online resource (128 pages)
Place of Publication: [Place of publication not identified] : Apress, [2022]
Summary: Apply supervised and unsupervised learning to solve practical, real-world big data problems. This book teaches you how to engineer features, optimize hyperparameters, train and test models, develop pipelines, and automate the machine learning (ML) process. The book covers PySpark, an in-memory, distributed cluster computing framework; the machine learning frameworks scikit-learn, PySpark MLlib, H2O, and XGBoost; and Keras, a deep learning (DL) framework. The book starts by presenting supervised and unsupervised ML and DL models, and then examines big data frameworks along with ML and DL frameworks. Author Tshepo Chris Nokeri considers a parametric model known as the Generalized Linear Model and a survival regression model known as the Cox Proportional Hazards model, along with Accelerated Failure Time (AFT). Also presented are a binary classification model (logistic regression) and an ensemble model (Gradient Boosted Trees). The book introduces DL and an artificial neural network known as the Multilayer Perceptron (MLP) classifier. Cluster analysis using the K-Means model is covered, and dimension reduction techniques such as Principal Component Analysis and Linear Discriminant Analysis are explored. Automated machine learning is also unpacked. This book is for intermediate-level data scientists and machine learning engineers who want to learn how to apply key big data frameworks and ML and DL frameworks. You will need prior knowledge of the basics of statistics, Python programming, probability theory, and predictive analytics.
What You Will Learn: Understand widespread supervised and unsupervised learning, including key dimension reduction techniques; Know the big data analytics layers such as data visualization, advanced statistics, predictive analytics, machine learning, and deep learning; Integrate big data frameworks with a hybrid of machine learning frameworks and deep learning frameworks; Design, build, test, and validate skilled machine learning models and deep learning models; Optimize model performance using data transformation, regularization, outlier remedying, hyperparameter optimization, and data split ratio alteration.
Contents: Intro; Table of Contents; About the Author; About the Technical Reviewer; Acknowledgments; Introduction; Chapter 1: Exploring Machine Learning; Exploring Supervised Methods; Exploring Nonlinear Models; Exploring Ensemble Methods; Exploring Unsupervised Methods; Exploring Cluster Methods; Exploring Dimension Reduction; Exploring Deep Learning; Conclusion; Chapter 2: Big Data, Machine Learning, and Deep Learning Frameworks; Big Data; Big Data Features; Impact of Big Data on Business and People; Better Customer Relationships; Refined Product Development; Improved Decision-Making; Big Data Warehousing; Big Data ETL; Big Data Frameworks; Apache Spark; Resilient Distributed Data Sets; Spark Configuration; Spark Frameworks; SparkSQL; Spark Streaming; Spark MLlib; GraphX; ML Frameworks; Scikit-Learn; H2O; XGBoost; DL Frameworks; Keras; Chapter 3: Linear Modeling with Scikit-Learn, PySpark, and H2O; Exploring the Ordinary Least-Squares Method; Scikit-Learn in Action; PySpark in Action; H2O in Action; Chapter 4: Survival Analysis with PySpark and Lifelines; Exploring Survival Analysis; Exploring Cox Proportional Hazards Method; Lifeline in Action; Exploring the Accelerated Failure Time Method; Chapter 5: Nonlinear Modeling With Scikit-Learn, PySpark, and H2O; Exploring the Logistic Regression Method; Chapter 6: Tree Modeling and Gradient Boosting with Scikit-Learn, XGBoost, PySpark, and H2O; Decision Trees; Preprocessing Features; Gradient Boosting; XGBoost in Action; Chapter 7: Neural Networks with Scikit-Learn, Keras, and H2O; Exploring Deep Learning; Multilayer Perceptron Neural Network; Keras in Action; Deep Belief Networks; Chapter 8: Cluster Analysis with Scikit-Learn, PySpark, and H2O; Exploring the K-Means Method; Chapter 9: Principal Component Analysis with Scikit-Learn, PySpark, and H2O; Exploring the Principal Component Method; Chapter 10: Automating the Machine Learning Process with H2O; Exploring Automated Machine Learning; H2O AutoML in Action; Index.
Notes: Description based on print version record. Includes index.
ISBN: 9781523150984; 152315098X; 9781484277621; 1484277627
OCLC: 1283854399

