Elasticsearch for Hadoop : integrate Elasticsearch into Hadoop to effectively visualize and analyze your data / Vishal Shukla.

EBSCOhost Academic eBook Collection (North America) Available online

Format:
Book
Author/Creator:
Shukla, Vishal, author.
Series:
Community experience distilled.
Language:
English
Subjects (All):
Apache Hadoop.
Application software--Development.
Application software.
Open source software.
Client/server computing.
Physical Description:
1 online resource (222 p.)
Place of Publication:
Birmingham : Packt Publishing, 2015.
Language Note:
English
Summary:
Integrate Elasticsearch into Hadoop to effectively visualize and analyze your data

About This Book
* Build production-ready analytics applications by integrating the Hadoop ecosystem with Elasticsearch
* Learn complex Elasticsearch queries and develop real-time monitoring Kibana dashboards to visualize your data
* Use Elasticsearch and Kibana to search data in Hadoop easily with this comprehensive, step-by-step guide

Who This Book Is For
This book is targeted at Java developers with basic knowledge of Hadoop. No prior Elasticsearch experience is expected.

What You Will Learn
* Set up the Elasticsearch-Hadoop environment
* Import HDFS data into Elasticsearch with MapReduce jobs
* Perform full-text search and aggregations efficiently using Elasticsearch
* Visualize data and create interactive dashboards using Kibana
* Check and detect anomalies in streaming data using Storm and Elasticsearch
* Inject and classify real-time streaming data into Elasticsearch
* Get production-ready for Elasticsearch-Hadoop based projects
* Integrate with Hadoop ecosystem tools such as Pig, Storm, Hive, and Spark

In Detail
The Hadoop ecosystem is a de facto standard for processing terabytes and petabytes of data. Lucene-enabled Elasticsearch is becoming an industry standard for its full-text search and aggregation capabilities. Elasticsearch-Hadoop bridges the worlds of Elasticsearch and the Hadoop ecosystem to get the best of both. Combined with Kibana, this stack makes it a cakewalk to get surprising insights out of your massive amounts of Hadoop ecosystem data in a flash.

In this book, you'll learn to use Elasticsearch, Kibana, and Elasticsearch-Hadoop effectively to analyze and understand your HDFS and streaming data.

You begin with an in-depth understanding of the Hadoop, Elasticsearch, Marvel, and Kibana setup. Right after this, you will learn to import Hadoop data into Elasticsearch by writing a MapReduce job in a real-world example. This is followed by a comprehensive look at Elasticsearch essentials, such as full-text search analysis, queries, filters, and aggregations, after which you gain an understanding of creating various visualizations and interactive dashboards using Kibana. Classifying your real-world streaming data and identifying trends in it using Storm and Elasticsearch are some of the other topics covered. You will also gain insight into the key concepts of Elasticsearch and Elasticsearch-Hadoop in distributed mode, and into advanced configurations along with some common configuration presets you may need for your production deployments. You will get a "Go production checklist" and a high-level view of cluster administration for post-production. Towards the end, you will learn to integrate Elasticsearch with other Hadoop ecosystem tools, such as Pig, Hive, and Spark.

Style and approach
A concise yet comprehensive approach has been adopted with real-time examples to help you grasp the concepts easily.
Contents:
Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface;
Chapter 1: Setting Up Environment; Setting up Hadoop for Elasticsearch; Setting up Java; Setting up a dedicated user; Installing SSH and setting up the certificate; Downloading Hadoop; Setting up environment variables; Configuring Hadoop; Configuring core-site.xml; Configuring hdfs-site.xml; Configuring yarn-site.xml; Configuring mapred-site.xml; The format distributed filesystem; Starting Hadoop daemons; Setting up Elasticsearch; Downloading Elasticsearch; Configuring Elasticsearch; Installing Elasticsearch's Head plugin; Installing the Marvel plugin; Running and testing; Running the WordCount example; Getting the examples and building the job JAR file; Importing the test file to HDFS; Running our first job; Exploring data in Head and Marvel; Viewing data in Head; Using the Marvel dashboard; Exploring the data in Sense; Summary;
Chapter 2: Getting Started with ES-Hadoop; Understanding the WordCount program; Understanding Mapper; Understanding the reducer; Understanding the driver; Using the old API - org.apache.hadoop.mapred; Going real - network monitoring data; Getting and understanding the data; Knowing the problems; Solution approaches; Approach 1 - Preaggregate the results; Approach 2 - Aggregate the results at query-time; Writing the NetworkLogsMapper job; Writing the mapper class; Writing Driver; Building the job; Getting the data into HDFS; Running the job; Viewing the Top N results; Getting data from Elasticsearch to HDFS; Understanding the Twitter dataset; Trying it yourself; Creating the MapReduce job to import data from Elasticsearch to HDFS; Writing the Tweets2Hdfs mapper; Running the example; Testing the job execution output; Summary;
Chapter 3: Understanding Elasticsearch; Knowing Search and Elasticsearch; The paradigm mismatch; Index; Type; Document; Field; Talking to Elasticsearch; CRUD with Elasticsearch; Creating the document request; Mappings; Data types; Create mapping API; Index templates; Controlling the indexing process; What is an inverted index?; The input data analysis; Removing stop words; Case insensitive; Stemming; Synonyms; Analyzers; Elastic searching; Writing search queries; The URI search; Matching all queries; The term query; The boolean query; The match query; The range query; The wildcard query; Filters; Aggregations; Executing the aggregation queries; The terms aggregation; Histograms; The range aggregation; The geo distance; Sub-aggregations; Try it yourself; Summary;
Chapter 4: Visualizing Big Data Using Kibana; Setting up and getting started; Setting up Kibana; Setting up datasets; Try it out; Getting started with Kibana; Discovering data; Visualizing the data; The pie chart; The stacked bar chart; The date histogram with the stacked bar chart; The area chart; The split pie chart; The sun burst chart; The geographical chart; Trying it out; Creating dynamic dashboards; Summary;
Chapter 5: Real-Time Analytics
Notes:
Includes index.
Description based on online resource; title from PDF title page (ebrary, viewed January 4, 2016).
ISBN:
9781785282249
1785282247
