Mastering Hadoop : go beyond the basics and master the next generation of Hadoop data processing platforms / Sandeep Karanth.

EBSCOhost Academic eBook Collection (North America) Available online

Ebook Central College Complete Available online

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
Karanth, Sandeep, author.
Series:
Community experience distilled.
Community Experience Distilled
Language:
English
Subjects (All):
Apache Hadoop.
Application software--Development.
Application software.
Physical Description:
1 online resource (374 p.)
Edition:
1st edition
Other Title:
Go beyond the basics and master the next generation of Hadoop data processing platforms
Place of Publication:
Birmingham, England : Packt Publishing, 2014.
Language Note:
English
System Details:
text file
Biography/History:
Karanth Sandeep: Sandeep Karanth is a technical architect who specializes in building and operationalizing software systems. He has more than 14 years of experience in the software industry, working on a gamut of products ranging from enterprise data applications to newer-generation mobile applications. He has worked primarily at Microsoft Corporation in Redmond and Microsoft Research in India, and is currently a cofounder at Scibler, architecting data intelligence products.
Summary:
Do you want to broaden your Hadoop skill set and take your knowledge to the next level? Do you wish to enhance your knowledge of Hadoop to solve challenging data processing problems? Are your Hadoop jobs, Pig scripts, or Hive queries not working as fast as you intend? Are you looking to understand the benefits of upgrading Hadoop? If the answer is yes to any of these, this book is for you. It assumes novice-level familiarity with Hadoop.
Contents:
Cover; Copyright; Credits; About the Author; Acknowledgments; About the Reviewers; www.PacktPub.com; Untitled; Untitled; Table of Contents; Preface; Chapter 1: Hadoop 2.X; The inception of Hadoop; The evolution of Hadoop; Hadoop's genealogy; Hadoop-0.20-append; Hadoop-0.20-security; Hadoop's timeline; Hadoop 2.X; Yet Another Resource Negotiator (YARN); Architecture overview; Storage layer enhancements; High availability; HDFS Federation; HDFS snapshots; Other enhancements; Support enhancements; Hadoop distributions; Which Hadoop distribution?; Performance; Scalability; Reliability;
Manageability; Available distributions; Cloudera Distribution of Hadoop (CDH); Hortonworks Data Platform (HDP); MapR; Pivotal HD; Summary; Chapter 2: Advanced MapReduce; MapReduce input; The InputFormat class; The InputSplit class; The RecordReader class; Hadoop's "small files" problem; Filtering inputs; The Map task; The dfs.blocksize attribute; Sort and spill of intermediate outputs; Node-local Reducers or Combiners; Fetching intermediate outputs - Map-side; The Reduce task; Fetching intermediate outputs - Reduce-side; Merge and spill of intermediate outputs; MapReduce output;
Speculative execution of tasks; MapReduce job counters; Handling data joins; Reduce-side joins; Map-side joins; Summary; Chapter 3: Advanced Pig; Pig versus SQL; Different modes of execution; Complex data types in Pig; Compiling Pig scripts; The logical plan; The physical plan; The MapReduce plan; Development and debugging aids; The DESCRIBE command; The EXPLAIN command; The ILLUSTRATE command; The advanced Pig operators; The advanced FOREACH operator; The FLATTEN operator; The nested FOREACH operator; The COGROUP operator; The UNION operator; The CROSS operator; Specialized joins in Pig;
The Replicated join; Skewed joins; The Merge join; User-defined functions; The evaluation functions; The aggregate functions; The filter functions; The load functions; The store functions; Pig performance optimizations; The optimization rules; Measurement of Pig script performance; Combiners in Pig; Memory for the Bag data type; Number of reducers in Pig; The multiquery mode in Pig; Best practices; The explicit usage of types; Early and frequent projection; Early and frequent filtering; The usage of the LIMIT operator; The usage of the DISTINCT operator; The reduction of operations;
The usage of Algebraic UDFs; The usage of Accumulator UDFs; Eliminating nulls in the data; The usage of specialized joins; Compressing intermediate results; Combining smaller files; Summary; Chapter 4: Advanced Hive; The Hive architecture; The Hive metastore; The Hive compiler; The Hive execution engine; The supporting components of Hive; Data types; File formats; Compressed files; ORC files; The Parquet files; The data model; Dynamic partitions; Semantics for dynamic partitioning; Indexes on Hive tables; Hive query optimizers; Advanced DML; The GROUP BY operation;
ORDER BY versus SORT BY clauses
Notes:
Includes index.
Description based on online resource; title from PDF title page (ebrary, viewed January 14, 2015).
ISBN:
9781783983650
1783983655
OCLC:
900898176