Hadoop : the definitive guide / by Tom White.

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
White, Tom (Tom E.)
Language:
English
Subjects (All):
Apache Hadoop.
Computer software.
Physical Description:
1 online resource (526 p.)
Edition:
First edition.
Place of Publication:
Sebastopol, California : O'Reilly Media, Inc., 2009.
Language Note:
English
System Details:
text file
Summary:
Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:
Contents:
Table of Contents; Foreword; Preface; Administrative Notes; What's in This Book?; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments; Chapter 1. Meet Hadoop; Data!; Data Storage and Analysis; Comparison with Other Systems; RDBMS; Grid Computing; Volunteer Computing; A Brief History of Hadoop; The Apache Hadoop Project; Chapter 2. MapReduce; A Weather Dataset; Data Format; Analyzing the Data with Unix Tools; Analyzing the Data with Hadoop; Map and Reduce; Java MapReduce; A test run; The new Java MapReduce API; Scaling Out; Data Flow;
Combiner Functions; Specifying a combiner function; Running a Distributed MapReduce Job; Hadoop Streaming; Ruby; Python; Hadoop Pipes; Compiling and Running; Chapter 3. The Hadoop Distributed Filesystem; The Design of HDFS; HDFS Concepts; Blocks; Namenodes and Datanodes; The Command-Line Interface; Basic Filesystem Operations; Hadoop Filesystems; Interfaces; Thrift; C; FUSE; WebDAV; Other HDFS Interfaces; The Java Interface; Reading Data from a Hadoop URL; Reading Data Using the FileSystem API; FSDataInputStream; Writing Data; FSDataOutputStream; Directories; Querying the Filesystem;
File metadata: FileStatus; Listing files; File patterns; PathFilter; Deleting Data; Data Flow; Anatomy of a File Read; Anatomy of a File Write; Coherency Model; Consequences for application design; Parallel Copying with distcp; Keeping an HDFS Cluster Balanced; Hadoop Archives; Using Hadoop Archives; Limitations; Chapter 4. Hadoop I/O; Data Integrity; Data Integrity in HDFS; LocalFileSystem; ChecksumFileSystem; Compression; Codecs; Compressing and decompressing streams with CompressionCodec; Inferring CompressionCodecs using CompressionCodecFactory; Native libraries;
Compression and Input Splits; Using Compression in MapReduce; Compressing map output; Serialization; The Writable Interface; WritableComparable and comparators; Writable Classes; Writable wrappers for Java primitives; Text; BytesWritable; NullWritable; ObjectWritable and GenericWritable; Writable collections; Implementing a Custom Writable; Implementing a RawComparator for speed; Custom comparators; Serialization Frameworks; Serialization IDL; File-Based Data Structures; SequenceFile; Writing a SequenceFile; Reading a SequenceFile; Displaying a SequenceFile with the command-line interface;
Sorting and merging SequenceFiles; The SequenceFile Format; MapFile; Writing a MapFile; Reading a MapFile; Converting a SequenceFile to a MapFile; Chapter 5. Developing a MapReduce Application; The Configuration API; Combining Resources; Variable Expansion; Configuring the Development Environment; Managing Configuration; GenericOptionsParser, Tool, and ToolRunner; Writing a Unit Test; Mapper; Reducer; Running Locally on Test Data; Running a Job in a Local Job Runner; Fixing the mapper; Testing the Driver; Running on a Cluster; Packaging; Launching a Job; The MapReduce Web UI;
The jobtracker page
Notes:
Description based upon print version of record.
Description based on online resource; title from PDF title page (ebrary, viewed October 1, 2013).
ISBN:
9781306817462
1306817463
9780596551360
0596551363
9780596551179
0596551177
OCLC:
317877866
