Apache Hive essentials : essential techniques to help you process, and get unique insights from, big data / Dayong Du.

EBSCOhost Academic eBook Collection (North America) Available online

Ebook Central Academic Complete Available online

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
Du, Dayong, author.
Language:
English
Subjects (All):
Apache Hadoop.
Databases--Design--Data processing.
Databases.
Physical Description:
1 online resource (203 pages)
Edition:
Second edition.
Place of Publication:
Birmingham, UK : Packt Publishing Ltd, [2018]
System Details:
text file
Biography/History:
Du Dayong: Dayong Du has dedicated his entire career of more than 10 years to enterprise data and analytics, especially enterprise use cases built on open source big data technologies such as Hadoop, Hive, HBase, and Spark. Dayong is a big data practitioner as well as an author and coach. He has published the first and second editions of Apache Hive Essentials and has coached many people interested in learning and using big data technology. In addition, he is a seasoned blogger, contributor, and advisor for big data start-ups, and a co-founder of the Toronto big data professional association.
Summary:
This book takes you on a fantastic journey to discover the attributes of big data using Apache Hive.
About This Book:
Grasp the skills needed to write efficient Hive queries to analyze big data.
Discover how Hive can coexist and work with other tools within the Hadoop ecosystem.
Use practical, example-oriented scenarios covering all the newly released features of Apache Hive 2.3.3.
Who This Book Is For:
If you are a data analyst, developer, or simply someone who wants to quickly get started with Hive to explore and analyze big data in Hadoop, this is the book for you. Since Hive uses an SQL-like language, some previous experience with SQL will be useful to get the most out of this book.
What You Will Learn:
Create and set up the Hive environment.
Discover how to use Hive's data definition language to describe data.
Discover interesting data by joining and filtering datasets in Hive.
Transform data by using Hive sorting, ordering, and functions.
Aggregate and sample data in different ways.
Boost Hive query performance and enhance data security in Hive.
Customize Hive to your needs by using user-defined functions, and integrate it with other tools.
In Detail:
In this book, we prepare you for your journey into big data by first introducing you to background on the big data domain, along with the process of setting up and getting familiar with your Hive working environment. Next, the book guides you through discovering and transforming the value of big data with the help of examples. It also hones your skills in using the Hive language efficiently. Toward the end, the book focuses on advanced topics, such as performance, security, and extensions in Hive, which will guide you on exciting adventures on this worthwhile big data journey. By the end of the book, you will be familiar with Hive and able to work efficiently to find solutions to big data problems.
Style and approach:
This book takes a practical approach that will familiarize you with Apache Hive and show you how to use it to efficiently find solutions to your big data problems. It covers crucial topics such as performance and data security to help you make the most of the Hive working environment.
Downloading the example code for this book:
You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-ma...
Contents:
Cover
Title Page
Copyright and Credits
Dedication
Packt Upsell
Contributors
Table of Contents
Preface
Chapter 1: Overview of Big Data and Hive
A short history
Introducing big data
The relational and NoSQL databases versus Hadoop
Batch, real-time, and stream processing
Overview of the Hadoop ecosystem
Hive overview
Summary
Chapter 2: Setting Up the Hive Environment
Installing Hive from Apache
Installing Hive from vendors
Using Hive in the cloud
Using the Hive command
Using the Hive IDE
Chapter 3: Data Definition and Description
Understanding data types
Data type conversions
Data Definition Language
Database
Tables
Table creation
Table description
Table cleaning
Table alteration
Partitions
Buckets
Views
Chapter 4: Data Correlation and Scope
Project data with SELECT
Filtering data with conditions
Linking data with JOIN
INNER JOIN
OUTER JOIN
Special joins
Combining data with UNION
Chapter 5: Data Manipulation
Data exchanging with LOAD
Data exchange with INSERT
Data exchange with [EX|IM]PORT
Data sorting
Functions
Function tips for collections
Function tips for date and string
Virtual column functions
Transactions and locks
Transactions
UPDATE statement
DELETE statement
MERGE statement
Locks
Chapter 6: Data Aggregation and Sampling
Basic aggregation
Enhanced aggregation
Grouping sets
Rollup and Cube
Aggregation condition
Window functions
Window aggregate functions
Window sort functions
Window analytics functions
Window expression
Sampling
Random sampling
Bucket table sampling
Block sampling
Chapter 7: Performance Considerations
Performance utilities
EXPLAIN statement
ANALYZE statement
Logs
Design optimization
Partition table design
Bucket table design
Index design
Use skewed/temporary tables
Data optimization
File format
Compression
Storage optimization
Job optimization
Local mode
JVM reuse
Parallel execution
Join optimization
Common join
Map join
Bucket map join
Sort merge bucket (SMB) join
Sort merge bucket map (SMBM) join
Skew join
Job engine
Optimizer
Vectorization optimization
Cost-based optimization
Chapter 8: Extensibility Considerations
User-defined functions
UDF code template
UDAF code template
UDTF code template
Development and deployment
HPL/SQL
Streaming
SerDe
Chapter 9: Security Considerations
Authentication
Metastore authentication
Hiveserver2 authentication
Authorization
Legacy mode
Storage-based mode
SQL standard-based mode
Mask and encryption
The data-hashing function
The data-masking function
The data-encryption function
Other methods
Chapter 10: Working with Other Tools
The JDBC/ODBC connector
NoSQL
The Hue/Ambari Hive view
HCatalog
Oozie
Spark
Hivemall
Other Books You May Enjoy
Index.
Notes:
Previous edition published: 2015.
Description based on print version record.
ISBN:
9781789136517
1789136512
OCLC:
1044944891

The Penn Libraries is committed to describing library materials using current, accurate, and responsible language. If you discover outdated or inaccurate language, please fill out this feedback form to report it and suggest alternative language.