Learning Hunk : visualize and analyze your Hadoop data using Hunk / Dmitry Anoshin, Sergey Sheypak.
- Format:
- Book
- Author/Creator:
- Anoshin, Dmitry, author.
- Sheypak, Sergey, author.
- Series:
- Community experience distilled.
- Language:
- English
- Subjects (All):
- Apache Hadoop.
- Big data.
- Non-relational databases.
- Physical Description:
- 1 online resource (156 p.)
- Place of Publication:
- Birmingham : Packt Publishing, 2015.
- Biography/History:
- Anoshin Dmitry: Dmitry Anoshin is a data-centric technologist and a recognized expert in building and implementing big data and analytics solutions. He has a successful track record of implementing business and digital intelligence projects in numerous industries, including retail, finance, marketing, and e-commerce. Dmitry possesses in-depth knowledge of digital/business intelligence, ETL, data warehousing, and big data technologies. He has extensive experience in the data integration process and is proficient in various data warehousing methodologies. Dmitry has consistently exceeded project expectations in his work in the financial, machine tool, and retail industries. He has completed a number of multinational full BI/DI solution life cycle implementation projects. With expertise in data modeling, Dmitry also has a background and business experience in multiple relational databases, OLAP systems, and NoSQL databases. He is also an active speaker at data conferences and helps people adopt cloud analytics.
- Sheypak Sergey: Sergey Sheypak started his so-called big data practice in 2010 as a Teradata PS consultant. He led the Teradata Master Data Management deployment at Sberbank, Russia (which has 110 million customers). Later Sergey switched to AsterData and Hadoop practices. Sergey joined the Research and Development team at MegaFon (one of the top three telecom companies in Russia, with 70 million customers) in 2012. While leading the Hadoop team at MegaFon, Sergey built ETL processes from the existing Oracle DWH to HDFS. Automated end-to-end tests and acceptance tests were introduced as a mandatory part of the Hadoop development process. Scoring and geospatial analysis systems based on specific telecom data were developed and launched. Sergey now works as an independent consultant in Sweden.
- Summary:
- Visualize and analyze your Hadoop data using Hunk.
- Key Features: Explore your data in Hadoop and NoSQL data stores; Create and optimize your reporting experience with advanced data visualizations and data analytics; A comprehensive developer's guide that helps you create outstanding analytical solutions efficiently.
- Book Description: Hunk is the big data analytics platform that lets you rapidly explore, analyze, and visualize data in Hadoop and NoSQL data stores. It provides a single, fluid user experience, designed to show you insights from your big data without the need for specialized skills, fixed schemas, or months of development. Hunk goes beyond typical data analysis methods and gives you the power to rapidly detect patterns and find anomalies across petabytes of raw data. This book focuses on exploring, analyzing, and visualizing big data in Hadoop and NoSQL data stores with this powerful full-featured big data analytics platform. You will begin by learning the Hunk architecture and Hunk Virtual Index before moving on to how to easily analyze and visualize data using Splunk Search Language (SPL). Next you will meet Hunk Apps, which can easily integrate with NoSQL data stores such as MongoDB or Sqrrl. You will also discover Hunk knowledge objects, build a semantic layer on top of Hadoop, and explore data using the friendly user interface of Hunk Pivot. You will connect MongoDB and explore data in the data store. Finally, you will go through report acceleration techniques and analyze data in the AWS Cloud.
- What you will learn: Deploy and configure Hunk on top of Cloudera Hadoop; Create and configure Virtual Indexes for datasets; Make your data presentable using the wide variety of data visualization components and knowledge objects; Design a data model using Hunk best practices; Add more flexibility to your analytics solution via extended SDK and custom visualizations; Discover data using MongoDB as a data source; Integrate Hunk with AWS Elastic MapReduce to improve scalability.
- Who this book is for: If you are a Hadoop developer who wants to build efficient real-time Operational Intelligence solutions based on Hadoop deployments or various NoSQL data stores using Hunk, this book is for you. Some familiarity with Splunk is assumed.
- Contents:
- Cover; Copyright; Credits; About the Authors; About the Reviewer; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Meet Hunk; Big data analytics; The big problem; The elegant solution; Supporting SPL; Intermediate results; Getting to know Hunk; Splunk versus Hunk; Hunk architecture; Connecting to Hadoop; Advance Hunk deployment; Native versus virtual indexes; Native indexes; Virtual index; External result provider; Computation models; Data streaming; Data reporting; Mixed mode; Hunk security; One Hunk user to one Hadoop user; Many Hunk users to one Hadoop user
- Hunk user(s) to the same Hadoop user with different queues; Setting up Hadoop; Starting and using a virtual machine with CDH5; SSH user; MySQL; Starting the VM and cluster in VirtualBox; Big data use case; Importing data from RDBMS to Hadoop using Sqoop; Telecommunications - SMS, Call, and Internet dataset from dandelion.eu; Milano grid map; CDR aggregated data import process; Periodical data import from MySQL using Sqoop and Oozie; Problems to solve; Summary; Chapter 2: Explore Hadoop Data with Hunk; Setting up Hunk; Extracting Hunk to a VM; Setting up Hunk variables and configuration files
- Running Hunk for the first time; Setting up a data provider and virtual index for CDR data; Setting up a connection to Hadoop; Setting up a virtual index for data stored in Hadoop; Accessing data through a virtual index; Exploring data; Creating reports; The top five browsers report; Top referrers; Site errors report; Creating alerts; Creating a dashboard; Controlling security with Hunk; The default Hadoop security; One Hunk user to one Hadoop user; Summary; Chapter 3: Meeting Hunk Features; Knowledge objects; Field aliases; Calculated fields; Field extractions; Tags; Event type
- Workflow actions; Macros; Data model; Add auto-extracting fields; Adding GeoIP attributes; Other ways to add attributes; Introducing Pivot; Summary; Chapter 4: Adding Speed to Reports; Big data performance issues; Hunk report acceleration; Creating a virtual index; Streaming mode; Creating an acceleration search; What's going on in Hadoop?; Report acceleration summaries; Reviewing summary details; Managing report accelerations; Hunk accelerations limits; Summary; Chapter 5: Customizing Hunk; What we are going to do with the Splunk SDK; Supported languages; Solving problems; REST API
- The implementation plan; The conclusion; Dashboard customization using Splunk Web Framework; Functionality; A description of time-series aggregated CDR data; Source data; Creating a virtual index for Milano CDR; Creating a virtual index for the Milano grid; Creating a virtual index using sample data; Implementation; Querying the visualization; Downloading the application; Custom Google Maps; Page layout; Linear gradients and bins for the activity value; Custom map components; Other components; The final result; Summary; Chapter 6: Discovering Hunk Integration Apps; What is Mongo?; Installation
- Installing the Mongo app
- Notes:
- Includes index.
- Description based on online resource; title from PDF title page (ebrary, viewed July 6, 2016).
- ISBN:
- 1-78528-302-2
- OCLC:
- 1477033364