Databricks ML in Action : Learn How Databricks Supports the Entire ML Lifecycle End to End from Data Ingestion to the Model Deployment / Stephanie Rivera [and three others].
- Format:
- Book
- Author/Creator:
- Rivera, Stephanie, author.
- Language:
- English
- Subjects (All):
- Machine learning.
- Physical Description:
- 1 online resource (267 pages)
- Edition:
- First edition.
- Place of Publication:
- Birmingham : Packt Publishing Ltd., [2024]
- Biography/History:
- Rivera, Stephanie: Stephanie Rivera has worked in big data and machine learning for 12 years. She collaborates with teams and companies as they design their Lakehouse as a Sr. Solutions Architect for Databricks. Previously Stephanie was the VP, Data Intelligence for a global company, taking in 20+ terabytes of data daily. She led the data science, data engineering, and business intelligence teams.
- Prokaieva, Anastasia: Anastasia Prokaieva began her career 9 years ago as a research scientist at CEA (France), focusing on large data analysis and satellite data assimilation, treating terabytes of data. She has been working within the big data analysis and machine learning domain since then. In 2021, she joined Databricks and became the regional AI subject matter expert. On a daily basis, Anastasia consults Databricks users on best practices for implementing AI projects end-to-end. She also delivers training and workshops to democratize AI. Anastasia holds two MSc degrees in theoretical physics and energy science.
- Baker, Amanda: Mandy Baker began her career in data 8 years ago. She loves leveraging her skills as a data scientist to orchestrate transformative journeys for companies across diverse industries as a Solutions Architect for Databricks. Her experiences have brought her from large corporations to small startups and everything in between. Mandy is a graduate of Carnegie Mellon University and the University of Washington.
- Horn, Hayley: Hayley Horn started her data career 15 years ago as a data quality consultant on enterprise data integration projects. As a data scientist, she specialized in customer insights and strategy, and presented at Data Science and AI conferences in the US and Europe. She is currently a Sr. Solutions Architect for Databricks, with expertise in data science and technology modernization. A graduate of the MS Data Science program at Southern Methodist University in Dallas, Texas, USA, she is now a capstone advisor to students in their final semesters of the program.
- Summary:
- Get to grips with autogenerating code, deploying ML algorithms, and leveraging various ML lifecycle features on the Databricks Platform, guided by best practices and reusable code for you to try, alter, and build on.
- Key Features: Build machine learning solutions faster than peers only using documentation. Enhance or refine your expertise with tribal knowledge and concise explanations. Follow along with code projects provided in GitHub to accelerate your projects. Purchase of the print or Kindle book includes a free PDF eBook.
- Book Description: Discover what makes the Databricks Data Intelligence Platform the go-to choice for top-tier machine learning solutions. Databricks ML in Action presents cloud-agnostic, end-to-end examples with hands-on illustrations of executing data science, machine learning, and generative AI projects on the Databricks Platform. You'll develop expertise in Databricks' managed MLflow, Vector Search, AutoML, Unity Catalog, and Model Serving as you learn to apply them practically in everyday workflows. This Databricks book not only offers detailed code explanations but also facilitates seamless code importation for practical use. You'll discover how to leverage the open-source Databricks platform to enhance learning, boost skills, and elevate productivity with supplemental resources. By the end of this book, you'll have mastered the use of Databricks for data science, machine learning, and generative AI, enabling you to deliver outstanding data products.
- What you will learn: Set up a workspace for a data team planning to perform data science. Monitor data quality and detect drift. Use autogenerated code for ML modeling and data exploration. Operationalize ML with feature engineering client, AutoML, VectorSearch, Delta Live Tables, AutoLoader, and Workflows. Integrate open-source and third-party applications, such as OpenAI's ChatGPT, into your AI projects. Communicate insights through Databricks SQL dashboards and Delta Sharing. Explore data and models through the Databricks marketplace.
- Who this book is for: This book is for machine learning engineers, data scientists, and technical managers seeking hands-on expertise in implementing and leveraging the Databricks Data Intelligence Platform and its Lakehouse architecture to create data products.
- Contents:
- Cover
- Title Page
- Copyright and Credits
- Dedication
- Contributors
- Table of Contents
- Part 1: Overview of the Databricks Unified Lakehouse Platform
- Chapter 1: Getting Started with This Book and Lakehouse Concepts
- The components of the Data Intelligence Platform
- The advantages of the Databricks Platform
- Open source features
- Databricks AutoML
- Reusability and reproducibility
- Open file formats give you flexibility
- Applying our learning
- Technical requirements
- Getting to know your data
- Project - streaming transactions
- Project - Favorita sales forecasting
- Project - multilabel image classification
- Project - a retrieval augmented generation chatbot
- Summary
- Questions
- Answers
- Further reading
- Chapter 2: Designing Databricks: Day One
- Planning your platform
- Defining a workspace
- Selecting the metastore
- Defining where the data lives, and cloud object storage
- Discussing source control
- Discussing data preparation
- Planning to create features
- Modeling in Databricks
- Monitoring data and models
- Setting up your workspace
- Kaggle setup
- Starting the projects
- Project: Favorita store sales - time series forecasting
- Project: Streaming Transactions
- Project: Retrieval-Augmented Generation Chatbot
- Project: Multilabel Image Classification
- Chapter 3: Building Out Our Bronze Layer
- Revisiting the Medallion architecture pattern
- Transforming data to Delta with Auto Loader
- Schema evolution
- DLT, starting with Bronze
- DLT benefits and features
- Bronze data with DLT
- Maintaining and optimizing Delta tables
- VACUUM
- Liquid clustering
- OPTIMIZE
- Predictive optimization
- Technical requirements
- Project - streaming transactions
- Project - Favorita store sales - time series forecasting
- Part 2: Heavily Use Case-Focused
- Chapter 4: Getting to Know Your Data
- Improving data integrity with DLT
- Monitoring data quality with Databricks Lakehouse Monitoring
- Mechanics of Lakehouse Monitoring
- Visualization and alerting
- Creating a monitor
- Exploring data with Databricks Assistant
- Generating data profiles with AutoML
- Using embeddings to understand unstructured data
- Enhancing data retrieval with Databricks Vector Search
- Flexibility in embedding model support
- Setting up a vector search
- Project - Favorita Store Sales - time-series forecasting
- Project - RAG chatbot
- Chapter 5: Feature Engineering on Databricks
- Databricks Feature Engineering in Unity Catalog
- Feature engineering on a stream
- Employing point-in-time lookups for time series feature tables
- Computing on-demand features
- Publishing features to the Databricks Online Store
- Project - Streaming Transactions
- Project - Favorita Store Sales - time series forecasting
- Chapter 6: Searching for a Signal
- Baselining with AutoML
- Tracking experiments with MLflow
- Classifying beyond the basic
- Integrating innovation
- Parkinson's FOG
- Forecasting Favorita sales
- Further reading
- Chapter 7: Productionizing ML on Databricks
- Deploying the MLOps inner loop
- Registering a model
- Collaborative development
- Deploying the MLOps outer loop
- Workflows
- DABs
- REST API
- Deploying your model
- Model Inference
- Model serving
- Project - Favorita Sales forecasting
- Project - retrieval augmented generation chatbot
- Chapter 8: Monitoring, Evaluating, and More
- Monitoring your models
- Building gold layer visualizations
- Leveraging Lakeview dashboards
- Visualizing big data with Databricks SQL dashboards
- Python UDFs
- Connecting your applications
- Incorporating LLMs for analysts with SQL AI Functions
- Project: Favorita store sales
- Project - streaming transactions
- Project: retrieval-augmented generation chatbot
- Index
- Other Books You May Enjoy
- Notes:
- Description based on publisher supplied metadata and other sources.
- Description based on print version record.
- ISBN:
- 9781800564008
- 1800564007
- OCLC:
- 1436070224