Building Python real-time applications with Storm : learn to process massive real-time data streams using Storm and Python -- no Java required / Kartik Bhatnagar, Barry Hart.
- Format:
- Book
- Author/Creator:
- Bhatnagar, Kartik, author.
- Hart, Barry, author.
- Series:
- Community experience distilled.
- Language:
- English
- Subjects (All):
- Computer security.
- Python (Computer program language).
- Physical Description:
- 1 online resource (122 p.)
- Edition:
- 1st edition
- Place of Publication:
- Birmingham : Packt Publishing, 2015.
- System Details:
- text file
- Summary:
- Learn to process massive real-time data streams using Storm and Python - no Java required!
- About This Book: Learn to use Apache Storm and the Python Petrel library to build distributed applications that process large streams of data. Explore sample applications in real time and analyze them in the popular NoSQL databases MongoDB and Redis. Discover how to apply software development best practices to improve performance, productivity, and quality in your Storm projects.
- Who This Book Is For: This book is intended for Python developers who want to benefit from Storm's real-time data processing capabilities. If you are new to Python, you'll benefit from the attention to key supporting tools and techniques such as automated testing, virtual environments, and logging. If you're an experienced Python developer, you'll appreciate the thorough and detailed examples.
- What You Will Learn: Install Storm and learn about the prerequisites; get to know the components of a Storm topology and how to control the flow of data between them; ingest Twitter data directly into Storm; use Storm with MongoDB and Redis; build topologies and run them in Storm; use an interactive graphical debugger to debug your topology as it runs in Storm; test your topology components outside of Storm; configure your topology using YAML.
- In Detail: Big data is a trending concept that everyone wants to learn about. With its ability to process all kinds of data in real time, Storm is an important addition to your big data "bag of tricks." At the same time, Python is one of the fastest-growing programming languages today. It has become a top choice for both data science and everyday application development. Together, Storm and Python enable you to build and deploy real-time big data applications quickly and easily. You will begin with basic command tutorials to set up Storm and learn about its configuration in detail. You will then go through the requirements for creating a Storm cluster. Next, you'll be provided with an overview of Petrel, followed by an example Twitter topology and persistence using Redis and MongoDB. Finally, you will build a production-quality Storm topology using development best practices.
- Style and Approach: This book takes an easy-to-follow, practical approach to help you understand all the concepts related to Storm and Python.
- Contents:
- Cover
- Copyright
- Credits
- About the Authors
- About the Reviewers
- www.PacktPub.com
- Table of Contents
- Preface
- Chapter 1: Getting Acquainted with Storm
- Overview of Storm
- Before the Storm era
- Key features of Storm
- Storm cluster modes
- Developer mode
- Single-machine Storm cluster
- Multimachine Storm cluster
- The Storm client
- Prerequisites for a Storm installation
- Zookeeper installation
- Storm installation
- Enabling native (Netty only) dependency
- Netty configuration
- Starting daemons
- Playing with optional configurations
- Summary
- Chapter 2: The Storm Anatomy
- Storm processes
- Supervisor
- Zookeeper
- The Storm UI
- Storm-topology-specific terminologies
- The worker process, executor, and task
- Worker processes
- Executors
- Tasks
- Interprocess communication
- A physical view of a Storm cluster
- Stream grouping
- Fault tolerance in Storm
- Guaranteed tuple processing in Storm
- XOR magic in acking
- Tuning parallelism in Storm - scaling a distributed computation
- Chapter 3: Introducing Petrel
- What is Petrel?
- Building a topology
- Packaging a topology
- Logging events and errors
- Managing third-party dependencies
- Installing Petrel
- Creating your first topology
- Sentence spout
- Splitter bolt
- Word Counting Bolt
- Defining a topology
- Running the topology
- Troubleshooting
- Productivity tips with Petrel
- Improving startup performance
- Enabling and using logging
- Automatic logging of fatal errors
- Chapter 4: Example Topology - Twitter
- Twitter analysis
- Twitter's Streaming API
- Creating a Twitter app to use the Streaming API
- The topology configuration file
- The Twitter stream spout
- Rolling word count bolt
- The intermediate rankings bolt
- The total rankings bolt
- Defining the topology
- Chapter 5: Persistence Using Redis and MongoDB
- Finding the top n ranked topics using Redis
- The topology configuration file - the Redis case
- Rolling word count bolt - the Redis case
- Total rankings bolt - the Redis case
- Defining the topology - the Redis case
- Running the topology - the Redis case
- Finding the hourly count of tweets by city name using MongoDB
- Defining the topology - the MongoDB case
- Running the topology - the MongoDB case
- Chapter 6: Petrel in Practice
- Testing a bolt
- Example - testing SplitSentenceBolt
- Example - testing SplitSentenceBolt with WordCountBolt
- Debugging
- Installing Winpdb
- Add Winpdb breakpoint
- Launching and attaching the debugger
- Profiling your topology's performance
- Split sentence bolt log
- Word count bolt log
- Appendix: Managing Storm Using Supervisord
- Storm administration over a cluster
- Introducing supervisord
- Supervisord components
- Supervisord installation
- Index
- Notes:
- Includes index.
- Description based on online resource; title from PDF title page (ebrary, viewed January 12, 2016).
- ISBN:
- 9781784392871
- 1784392871
The Penn Libraries is committed to describing library materials using current, accurate, and responsible language.