Databricks ML in Action : Learn How Databricks Supports the Entire ML Lifecycle End to End from Data Ingestion to the Model Deployment / Stephanie Rivera [and three others].

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
Rivera, Stephanie, author.
Language:
English
Subjects (All):
Machine learning.
Physical Description:
1 online resource (267 pages)
Edition:
First edition.
Place of Publication:
Birmingham : Packt Publishing Ltd., [2024]
Biography/History:
Rivera, Stephanie: Stephanie Rivera has worked in big data and machine learning for 12 years. As a Sr. Solutions Architect for Databricks, she collaborates with teams and companies as they design their Lakehouse. Previously, Stephanie was the VP, Data Intelligence for a global company that took in 20+ terabytes of data daily, where she led the data science, data engineering, and business intelligence teams.
Prokaieva, Anastasia: Anastasia Prokaieva began her career 9 years ago as a research scientist at CEA (France), focusing on large data analysis and satellite data assimilation and processing terabytes of data. She has worked in big data analysis and machine learning ever since. In 2021, she joined Databricks and became the regional AI subject matter expert. On a daily basis, Anastasia advises Databricks users on best practices for implementing AI projects end-to-end. She also delivers training and workshops to democratize AI. Anastasia holds two MSc degrees in theoretical physics and energy science.
Baker, Amanda: Mandy Baker began her career in data 8 years ago. She loves leveraging her skills as a data scientist to orchestrate transformative journeys for companies across diverse industries as a Solutions Architect for Databricks. Her experiences have brought her from large corporations to small startups and everything in between. Mandy is a graduate of Carnegie Mellon University and the University of Washington.
Horn, Hayley: Hayley Horn started her data career 15 years ago as a data quality consultant on enterprise data integration projects. As a data scientist, she specialized in customer insights and strategy, and presented at Data Science and AI conferences in the US and Europe. She is currently a Sr. Solutions Architect for Databricks, with expertise in data science and technology modernization. A graduate of the MS Data Science program at Southern Methodist University in Dallas, Texas, USA, she is now a capstone advisor to students in their final semesters of the program.
Summary:
Get to grips with autogenerating code, deploying ML algorithms, and leveraging various ML lifecycle features on the Databricks Platform, guided by best practices and reusable code for you to try, alter, and build on.
Key Features:
Build machine learning solutions faster than peers only using documentation
Enhance or refine your expertise with tribal knowledge and concise explanations
Follow along with code projects provided in GitHub to accelerate your projects
Purchase of the print or Kindle book includes a free PDF eBook
Book Description:
Discover what makes the Databricks Data Intelligence Platform the go-to choice for top-tier machine learning solutions. Databricks ML in Action presents cloud-agnostic, end-to-end examples with hands-on illustrations of executing data science, machine learning, and generative AI projects on the Databricks Platform. You'll develop expertise in Databricks' managed MLflow, Vector Search, AutoML, Unity Catalog, and Model Serving as you learn to apply them practically in everyday workflows. This Databricks book not only offers detailed code explanations but also facilitates seamless code importation for practical use. You'll discover how to leverage the open-source Databricks platform to enhance learning, boost skills, and elevate productivity with supplemental resources. By the end of this book, you'll have mastered the use of Databricks for data science, machine learning, and generative AI, enabling you to deliver outstanding data products.
What you will learn:
Set up a workspace for a data team planning to perform data science
Monitor data quality and detect drift
Use autogenerated code for ML modeling and data exploration
Operationalize ML with feature engineering client, AutoML, VectorSearch, Delta Live Tables, AutoLoader, and Workflows
Integrate open-source and third-party applications, such as OpenAI's ChatGPT, into your AI projects
Communicate insights through Databricks SQL dashboards and Delta Sharing
Explore data and models through the Databricks marketplace
Who this book is for:
This book is for machine learning engineers, data scientists, and technical managers seeking hands-on expertise in implementing and leveraging the Databricks Data Intelligence Platform and its Lakehouse architecture to create data products.
Contents:
Cover
Title Page
Copyright and Credits
Dedication
Contributors
Table of Contents
Part 1: Overview of the Databricks Unified Lakehouse Platform
Chapter 1: Getting Started with This Book and Lakehouse Concepts
The components of the Data Intelligence Platform
The advantages of the Databricks Platform
Open source features
Databricks AutoML
Reusability and reproducibility
Open file formats give you flexibility
Applying our learning
Technical requirements
Getting to know your data
Project - streaming transactions
Project - Favorita sales forecasting
Project - multilabel image classification
Project - a retrieval augmented generation chatbot
Summary
Questions
Answers
Further reading
Chapter 2: Designing Databricks: Day One
Planning your platform
Defining a workspace
Selecting the metastore
Defining where the data lives, and cloud object storage
Discussing source control
Discussing data preparation
Planning to create features
Modeling in Databricks
Monitoring data and models
Setting up your workspace
Kaggle setup
Starting the projects
Project: Favorita store sales - time series forecasting
Project: Streaming Transactions
Project: Retrieval-Augmented Generation Chatbot
Project: Multilabel Image Classification
Chapter 3: Building Out Our Bronze Layer
Revisiting the Medallion architecture pattern
Transforming data to Delta with Auto Loader
Schema evolution
DLT, starting with Bronze
DLT benefits and features
Bronze data with DLT
Maintaining and optimizing Delta tables
VACUUM
Liquid clustering
OPTIMIZE
Predictive optimization
Technical requirements
Project - streaming transactions
Project - Favorita store sales - time series forecasting
Part 2: Heavily Use Case-Focused
Chapter 4: Getting to Know Your Data
Improving data integrity with DLT
Monitoring data quality with Databricks Lakehouse Monitoring
Mechanics of Lakehouse Monitoring
Visualization and alerting
Creating a monitor
Exploring data with Databricks Assistant
Generating data profiles with AutoML
Using embeddings to understand unstructured data
Enhancing data retrieval with Databricks Vector Search
Flexibility in embedding model support
Setting up a vector search
Project - Favorita Store Sales - time-series forecasting
Project - RAG chatbot
Chapter 5: Feature Engineering on Databricks
Databricks Feature Engineering in Unity Catalog
Feature engineering on a stream
Employing point-in-time lookups for time series feature tables
Computing on-demand features
Publishing features to the Databricks Online Store
Project - Streaming Transactions
Project - Favorita Store Sales - time series forecasting
Chapter 6: Searching for a Signal
Baselining with AutoML
Tracking experiments with MLflow
Classifying beyond the basic
Integrating innovation
Parkinson's FOG
Forecasting Favorita sales
Further reading
Chapter 7: Productionizing ML on Databricks
Deploying the MLOps inner loop
Registering a model
Collaborative development
Deploying the MLOps outer loop
Workflows
DABs
REST API
Deploying your model
Model Inference
Model serving
Project - Favorita Sales forecasting
Project - retrieval augmented generation chatbot
Chapter 8: Monitoring, Evaluating, and More
Monitoring your models
Building gold layer visualizations
Leveraging Lakeview dashboards
Visualizing big data with Databricks SQL dashboards
Python UDFs
Connecting your applications
Incorporating LLMs for analysts with SQL AI Functions
Project: Favorita store sales
Project - streaming transactions
Project: retrieval-augmented generation chatbot
Index
Other Books You May Enjoy.
Notes:
Description based on publisher supplied metadata and other sources.
Description based on print version record.
ISBN:
9781800564008
1800564007
OCLC:
1436070224

