Beginning Azure synapse analytics : transition from data warehouse to data lakehouse / Bhadresh Shiyal.

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
Shiyal, Bhadresh, author.
Language:
English
Subjects (All):
Data warehousing--Management.
Data warehousing.
Microsoft Azure (Computing platform).
Physical Description:
1 online resource (263 pages)
Place of Publication:
[Place of publication not identified] : Apress, [2021]
Summary:
Get started with Azure Synapse Analytics, Microsoft's modern data analytics platform. This book covers core components such as Synapse SQL, Synapse Spark, Synapse Pipelines, and more, along with their architecture and implementation. The book begins with an introduction to core data and analytics concepts, followed by an overview of the traditional/legacy data warehouse, the modern data warehouse, and the data lakehouse. You will go through the introduction and background of Azure Synapse Analytics along with its main features and key service capabilities. Core architecture is discussed, along with Synapse SQL. You will learn its main features, how to create a dedicated Synapse SQL pool, and how to analyze your big data using a serverless Synapse SQL pool. You will also learn Synapse Spark and Synapse Pipelines, with examples, and then Synapse Workspace and Synapse Studio, followed by Synapse Link and its features. You will go through use cases in Azure Synapse and understand the reference architecture for Synapse Analytics. After reading this book, you will be able to work with Azure Synapse Analytics and understand its architecture, main components, features, and capabilities.
What You Will Learn
* Understand core data and analytics concepts and data lakehouse concepts
* Be familiar with overall Azure Synapse architecture and its main components
* Be familiar with Synapse SQL and Synapse Spark architecture components
* Work with integrated Apache Spark (aka Synapse Spark) and Synapse SQL engines
* Understand Synapse Workspace, Synapse Studio, and Synapse Pipeline
* Study reference architecture and use cases
Who This Book Is For
Azure data analysts, data engineers, data scientists, and solutions architects
Contents:
Intro
Table of Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: Core Data and Analytics Concepts
Core Data Concepts
What Is Data?
Structured Data
Semi-structured Data
Unstructured Data
Data Processing Methods
Batch Data Processing
Streaming or Real-Time Data Processing
Relational Data and Its Characteristics
Non-Relational Data and Its Characteristics
Core Data Analytics Concepts
What Is Data Analytics?
Data Ingestion
Data Exploration
Data Processing
ETL
ELT
ELT / ETL Tools
Data Visualization
Data Analytics Categories
Descriptive Analytics
Diagnostic Analytics
Predictive Analytics
Prescriptive Analytics
Cognitive Analytics
Summary
Chapter 2: Modern Data Warehouses and Data Lakehouses
What Is a Data Warehouse?
Core Data Warehouse Concepts
Data Model
Model Types
Schema Types
Metadata
Why Do We Need a Data Warehouse?
Efficient Decision-Making
Separation of Concerns
Single Version of the Truth
Data Restructuring
Self-Service BI
Historical Data
Security
Data Quality
Data Mining
More Revenues
What Is a Modern Data Warehouse?
Difference Between Traditional & Modern Data Warehouses
Cloud vs. On-Premises
Separation of Compute and Storage Resources
Cost
Scalability
ETL vs. ELT
Disaster Recovery
Overall Architecture
Data Lakehouse
What Is a Data Lake?
What Is Delta Lake?
What Is Apache Spark?
What Is a Data Lakehouse?
Characteristics of a Data Lakehouse
Various Data Types
AI
Decoupled Compute and Storage Resources
Open Source Storage Format
Data Analytics and BI Tools
ACID Properties
Differences Between a Data Warehouse and a Data Lakehouse
Architecture
Access to Raw Data
Open Source vs. Proprietary
Workloads
Query Engines
Real-Time Data
Examples of Data Lakehouses
Azure Synapse Analytics
Databricks
Benefits of Data Lakehouse
Support for All Types of Data
Time to Market
More Cost Effective
Reduction in ETL/ELT Jobs
Usage of Open Source Tools and Technologies
Efficient and Easy Data Governance
Drawbacks of Data Lakehouse
Monolithic Architecture
Technical Infancy
Migration Cost
Lack of Many Products/Options
Scarcity of Skilled Technical Resources
Chapter 3: Introduction to Azure Synapse Analytics
What Is Azure Synapse Analytics?
Azure Synapse Analytics vs. Azure SQL Data Warehouse
Why Should You Learn Azure Synapse Analytics?
Main Features of Azure Synapse Analytics
Unified Data Analytics Experience
Powerful Data Insights
Unlimited Scale
Security, Privacy, and Compliance
HTAP
Key Service Capabilities of Azure Synapse Analytics
Data Lake Exploration
Multiple Language Support
Deeply Integrated Apache Spark
Serverless Synapse SQL Pool
Hybrid Data Integration
Power BI Integration
AI Integration
Enterprise Data Warehousing
Seamless Streaming Analytics
Workload Management
Advanced Security
Chapter 4: Architecture and Its Main Components
High-Level Architecture
Main Components of Architecture
Synapse SQL
Compute Layer
Dedicated Synapse SQL Pool
Storage Layer
Synapse Spark or Apache Spark
Synapse Pipelines
Synapse Studio
Synapse Link
Chapter 5: Synapse SQL
Synapse SQL Architecture Components
Massively Parallel Processing Engine
Distributed Query Processing Engine
Control Node
Compute Nodes
Data Movement Service
Distribution
Hash Distribution
Round-Robin Distribution
Replication-based Distribution
Azure Storage
Dedicated or Provisioned Synapse SQL Pool
Serverless or On-Demand Synapse SQL Pool
Synapse SQL Feature Comparison
Database Object Types
Query Language
Tools
Storage Options
Data Formats
Resource Consumption Model for Synapse SQL
Synapse SQL Best Practices
Best Practices for Serverless Synapse SQL Pool
Best Practices for Dedicated Synapse SQL Pool
How-To's
Create a Dedicated Synapse SQL Pool
Create a Serverless or On-Demand Synapse SQL Pool
Load Data Using COPY Statement in Dedicated Synapse SQL Pool
Ingest Data into Azure Data Lake Storage Gen2
Chapter 6: Synapse Spark
What Is Synapse Spark in Azure Synapse Analytics?
Synapse Spark Features & Capabilities
Speed
Faster Start Time
Ease of Creation
Ease of Use
Automatic Scalability
Integration with IDEs
Pre-loaded Libraries
REST APIs
Delta Lake and Its Importance in Synapse Spark
Synapse Spark Job Optimization
Data Format
Memory Management
Data Serialization
Data Caching
Data Abstraction
Join and Shuffle Optimization
Bucketing
Hyperspace Indexing
Synapse Spark Machine Learning
Data Preparation and Exploration
Build Machine Learning Models
Train Machine Learning Models
Model Deployment and Scoring
How to Create a Synapse Spark Pool
How to Create and Submit Apache Spark Job Definition in Synapse Studio Using Python
How to Monitor Synapse Spark Pools Using Synapse Studio
Chapter 7: Synapse Pipelines
Overview of Azure Data Factory
Overview of Synapse Pipelines
Activities
Pipelines
Linked Services
Dataset
Integration Runtimes (IR)
Azure Integration Runtime (Azure IR)
Self-Hosted Integration Runtimes (SHIR)
Azure SSIS Integration Runtimes (Azure SSIS IR)
Control Flow
Parameters
Data Flow
Data Movement Activities
Category: Azure
Category: Database
Category: NoSQL
Category: File
Category: Generic
Category: Services and Applications
Data Transformation Activities
Control Flow Activities
Copy Pipeline Example
Transformation Pipeline Example
Pipeline Triggers
Chapter 8: Synapse Workspace and Studio
What Is a Synapse Analytics Workspace?
Synapse Analytics Workspace Components and Features
Azure Data Lake Storage Gen2 Account and File System
Shared Metadata Management
Code Artifacts
What Is Synapse Studio?
Main Features of Synapse Studio
Home Hub
Data Hub
Develop Hub
Integrate Hub
Monitor Hub
Integration
Manage Hub
Analytics Pools
External Connections
Synapse Studio Capabilities
Data Preparation
Data Management
Data Warehousing
Machine Learning
Power BI in Synapse Studio
How to Create or Provision a New Azure Synapse Analytics Workspace Using Azure Portal
How to Launch Azure Synapse Studio
How to Link Power BI with Azure Synapse Studio
Chapter 9: Synapse Link
OLTP vs. OLAP
What Is HTAP?
Benefits of HTAP
No-ETL Analytics
Instant Insights
Reduced Data Duplication
Simplified Technical Architecture
What Is Azure Synapse Link?
Azure Cosmos DB
Azure Cosmos DB Analytical Store
Columnar Storage
Decoupling of Operational Store
Automatic Data Synchronization
SQL API and MongoDB API
Analytical TTL
Automatic Schema Updates
Cost-Effective Archiving
Scalability
When to Use Azure Synapse Link for Cosmos DB
Azure Synapse Link Limitations
Azure Synapse Link Use Cases
Industrial IOT
Predictive Maintenance Pipeline
Operational Reporting
Real-Time Applications
Real-Time Personalization for E-Commerce Users
How to Enable Azure Synapse Link for Azure Cosmos DB
How to Create an Azure Cosmos DB Container with Analytical Store Using Azure Portal
How to Connect to Azure Synapse Link for Azure Cosmos DB Using Azure Portal
Chapter 10: Azure Synapse Analytics Use Cases and Reference Architecture
Where Should You Use Azure Synapse Analytics?
Large Volume of Data
Disparate Sources of Data
Data Transformation
Batch or Streaming Data
Where Should You Not Use Azure Synapse Analytics?
Use Cases for Azure Synapse Analytics
Financial Services
Manufacturing
Retail
Healthcare
Reference Architectures for Azure Synapse Analytics
Modern Data Warehouse Architecture
Real-Time Analytics on Big Data Architecture
Index
Notes:
Description based on print version record.
ISBN:
9781484270615
1484270614
OCLC:
1257400854
