Hadoop 2.x administration cookbook : administer and maintain large Apache Hadoop clusters / Gurmukh Singh.
- Format:
- Book
- Author/Creator:
- Singh, Gurmukh, author.
- Language:
- English
- Subjects (All):
- Apache Hadoop.
- Electronic data processing--Distributed processing.
- Electronic data processing.
- Big data.
- Physical Description:
- 1 online resource (329 pages)
- Edition:
- 1st edition
- Place of Publication:
- Birmingham, England ; Mumbai, [India] : Packt Publishing, 2017.
- System Details:
- text file
- Biography/History:
- Gurmukh Singh is a seasoned technology professional with 14+ years of industry experience in infrastructure design, distributed systems, performance optimization, and networks. He has worked in the big data domain for the last 5 years and provides consultancy and training on various technologies. He has worked with companies such as HP, JP Morgan, and Yahoo. He is the author of Monitoring Hadoop, published by Packt Publishing.
- Summary:
- Over 100 practical recipes to help you become an expert Hadoop administrator. About This Book: Become an expert Hadoop administrator and perform tasks to optimize your Hadoop cluster. Import and export data into Hive and use Oozie to manage workflows. Practical recipes will help you plan and secure your Hadoop cluster and make it highly available. Who This Book Is For: If you are a system administrator with a basic understanding of Hadoop and you want to get into Hadoop administration, this book is for you. It is also ideal if you are a Hadoop administrator who wants a quick reference guide to Hadoop administration tasks and solutions to commonly occurring problems. What You Will Learn: Set up the Hadoop architecture to run a Hadoop cluster smoothly; maintain a Hadoop cluster on HDFS, YARN, and MapReduce; understand high availability with ZooKeeper and Journal Node; configure Flume for data ingestion and Oozie to run various workflows; tune the Hadoop cluster for optimal performance; schedule jobs on a Hadoop cluster using the Fair and Capacity Schedulers; secure your cluster and troubleshoot it for common pain points. In Detail: Hadoop enables the distributed storage and processing of large datasets across clusters of computers. Learning how to administer Hadoop is crucial to exploiting its unique features. With this book, you will be able to overcome common problems encountered in Hadoop administration. The book begins by laying the foundation, showing you the steps needed to set up a Hadoop cluster and its various nodes. You will get a better understanding of how to maintain a Hadoop cluster, especially at the HDFS layer and with YARN and MapReduce. Further on, you will explore the durability and high availability of a Hadoop cluster. You will get a better understanding of the schedulers in Hadoop and how to configure and use them for your tasks.
You will also get hands-on experience with the backup and recovery options and the performance tuning aspects of Hadoop. Finally, you will get a better understanding of troubleshooting, diagnostics, and best practices in Hadoop administration. By the end of this book, you will have a proper understanding of working with Hadoop clusters and will be able to secure and encrypt them and configure auditing. Style and approach: This book contains short recipes that will help you run a Hadoop cluster efficiently. The recipes are solutions to real-life problems that administrators...
- Contents:
- Cover
- Copyright
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Customer Feedback
- Table of Contents
- Preface
- Chapter 1: Hadoop Architecture and Deployment
- Introduction
- Building and compiling Hadoop
- Installation methods
- Setting up host resolution
- Installing a single-node cluster - HDFS components
- Installing a single-node cluster - YARN components
- Installing a multi-node cluster
- Configuring the Hadoop Gateway node
- Decommissioning nodes
- Adding nodes to the cluster
- Chapter 2: Maintaining Hadoop Cluster HDFS
- Configuring HDFS block size
- Setting up Namenode metadata location
- Loading data in HDFS
- Configuring HDFS replication
- HDFS balancer
- Quota configuration
- HDFS health and FSCK
- Configuring rack awareness
- Recycle or trash bin configuration
- Distcp usage
- Control block report storm
- Configuring Datanode heartbeat
- Chapter 3: Maintaining Hadoop Cluster - YARN and MapReduce
- Running a simple MapReduce program
- Hadoop streaming
- Configuring YARN history server
- Job history web interface and metrics
- Configuring ResourceManager components
- YARN containers and resource allocations
- ResourceManager Web UI and JMX metrics
- Preserving ResourceManager states
- Chapter 4: High Availability
- Namenode HA using shared storage
- ZooKeeper configuration
- Namenode HA using Journal node
- Resourcemanager HA using ZooKeeper
- Rolling upgrade with HA
- Configure shared cache manager
- Configure HDFS cache
- HDFS snapshots
- Configuring storage based policies
- Configuring HA for Edge nodes
- Chapter 5: Schedulers
- Configuring users and groups
- Fair Scheduler configuration
- Fair Scheduler pools
- Configuring job queues
- Job queue ACLs
- Configuring Capacity Scheduler
- Queuing mappings in Capacity Scheduler
- YARN and Mapred commands
- YARN label-based scheduling
- YARN SLS
- Chapter 6: Backup and Recovery
- Initiating Namenode saveNamespace
- Using HDFS Image Viewer
- Fetching parameters which are in-effect
- Configuring HDFS and YARN logs
- Backing up and recovering Namenode
- Configuring Secondary Namenode
- Promoting Secondary Namenode to Primary
- Namenode recovery
- Namenode roll edits - online mode
- Namenode roll edits - offline mode
- Datanode recovery - disk full
- Configuring NFS gateway to serve HDFS
- Recovering deleted files
- Chapter 7: Data Ingestion and Workflow
- Hive server modes and setup
- Using MySQL for Hive metastore
- Operating Hive with ZooKeeper
- Loading data into Hive
- Partitioning and Bucketing in Hive
- Hive metastore database
- Designing Hive with credential store
- Configuring Flume
- Configure Oozie and workflows
- Chapter 8: Performance Tuning
- Tuning the operating system
- Tuning the disk
- Tuning the network
- Tuning HDFS
- Tuning Namenode
- Tuning Datanode
- Configuring YARN for performance
- Configuring MapReduce for performance
- Hive performance tuning
- Benchmarking Hadoop cluster
- Chapter 9: HBase Administration
- Setting up single node HBase cluster
- Setting up multi-node HBase cluster
- Inserting data into HBase
- Integration with Hive
- HBase administration commands
- HBase backup and restore
- Tuning HBase
- HBase upgrade
- Migrating data from MySQL to HBase using Sqoop
- Chapter 10: Cluster Planning
- Disk space calculations
- Nodes needed in the cluster
- Memory requirements
- Sizing the cluster as per SLA
- Network design
- Estimating the cost of the Hadoop cluster
- Hardware and software options
- Chapter 11: Troubleshooting, Diagnostics, and Best Practices
- Namenode troubleshooting
- Datanode troubleshooting
- Resourcemanager troubleshooting
- Diagnose communication issues
- Parse logs for errors
- Hive troubleshooting
- HBase troubleshooting
- Hadoop best practices
- Chapter 12: Security
- Encrypting disk using LUKS
- Configuring Hadoop users
- HDFS encryption at Rest
- Configuring SSL in Hadoop
- In-transit encryption
- Enabling service level authorization
- Securing ZooKeeper
- Configuring auditing
- Configuring Kerberos server
- Configuring and enabling Kerberos for Hadoop
- Index.
- Notes:
- Includes index.
- Description based on online resource; title from PDF title page (ebrary, viewed June 23, 2017).
- ISBN:
- 9781787126879
- 1787126870
- OCLC:
- 990194771