Learning apache apex : real-time streaming applications with apex / Thomas Weise [and three others].

EBSCOhost Academic eBook Collection (North America) Available online

EBSCOhost Ebook Business Collection Available online

Ebook Central College Complete Available online

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:
Book
Author/Creator:
Weise, Thomas, author.
Language:
English
Subjects (All):
Apache Apex.
Application software--Development.
Application software.
Physical Description:
1 online resource (1 volume) : illustrations
Edition:
1st edition
Place of Publication:
Birmingham, [England] ; Mumbai, [India] : Packt Publishing, 2017.
System Details:
Mode of access: World Wide Web.
text file
Biography/History:
Gundabattula Ananth: Ananth is a senior application architect in the Decisioning and Advanced Analytics architecture team for Commonwealth Bank of Australia (CBA). Ananth holds a PhD in the domain of computer science security and is interested in all things data, including low-latency distributed processing systems, machine learning, and data engineering. He holds 3 patents granted by the USPTO and has one application pending. Prior to joining CBA, he was an architect at Threatmetrix and a member of the core team that scaled the Threatmetrix architecture to 100 million transactions per day at very low latencies using Cassandra, Zookeeper, and Kafka. He also migrated the Threatmetrix data warehouse to a next-generation architecture based on Hadoop and Impala. Prior to Threatmetrix, he worked for the IBM software labs and IBM CIO labs, enabling some of the first IBM CIO projects onboarding the HBase, Hadoop, and Mahout stack. Ananth is a committer for Apache Apex and is currently working on next-generation architectures for the CBA fraud platform and the Advanced Analytics Omnia platform at CBA.
Weise Thomas: Thomas Weise is the Apache Apex PMC Chair and cofounder at Atrato. Earlier, he worked at a number of other technology companies in the San Francisco Bay Area, including DataTorrent, where he was a cofounder of the Apex project. Thomas is also a committer to Apache Beam and has contributed to several other ecosystem projects. He has been working on distributed systems for 20 years and has been a speaker at international big data conferences. Thomas received the degree of Diplom-Informatiker (MSc in computer science) from TU Dresden, Germany. He can be reached on Twitter at @thweise.
V. Ramanath Munagala: Dr. Munagala V. Ramanath received his PhD in Computer Science from the University of Wisconsin, USA, and an MSc in Mathematics from Carleton University, Ottawa, Canada. After that, he taught Computer Science courses as Assistant/Associate Professor at the University of Western Ontario in Canada for a few years before transitioning to the corporate sphere. Since then, he has worked as a senior software engineer at a number of technology companies in California, including SeeBeyond, EMC, Sun Microsystems, DataTorrent, and Cloudera. He has published papers in peer-reviewed journals in several areas, including code optimization, graph theory, and image processing.
Yan David: David Yan is based in Silicon Valley, California. He is a senior software engineer at Google. Prior to Google, he worked at DataTorrent, Yahoo!, and the Jet Propulsion Laboratory. David holds a master of science in Computer Science from Stanford University and a bachelor of science in Electrical Engineering and Computer Science from the University of California at Berkeley.
Knowles Kenneth: Kenneth Knowles is a founding PMC member of Apache Beam. Kenn has been working on Google Cloud Dataflow, Google's Beam backend, since 2014. Prior to that, he built backends for startups such as Cityspan, Inkling, and Dimagi. Kenn holds a PhD in Programming Language Theory from the University of California, Santa Cruz.
Summary:
Designing and writing a real-time streaming application with Apache Apex
About This Book
Get a clear, practical approach to real-time data processing
Program Apache Apex streaming applications
This book shows you Apex integration with the open source Big Data ecosystem
Who This Book Is For
This book assumes knowledge of application development with Java and familiarity with distributed systems. Familiarity with other real-time streaming frameworks is not required, but some practical experience with other big data processing utilities might be helpful.
What You Will Learn
Put together a functioning Apex application from scratch
Scale an Apex application and configure it for optimal performance
Understand how to deal with failures via the fault tolerance features of the platform
Use Apex via other frameworks such as Beam
Understand the DevOps implications of deploying Apex
In Detail
Apache Apex is a next-generation stream processing framework designed to operate on data at large scale, with minimum latency, maximum reliability, and strict correctness guarantees. Half of the book consists of Apex applications, showing you key aspects of data processing pipelines such as connectors for sources and sinks, and common data transformations. The other half of the book is evenly split between explaining the Apex framework and tuning, testing, and scaling Apex applications. Much of our economic world depends on growing streams of data, such as social media feeds, financial records, data from mobile devices, sensors and machines (the Internet of Things - IoT). The projects in the book show how to process such streams to gain valuable, timely, and actionable insights. Traditional use cases, such as ETL, that currently consume a significant chunk of data engineering resources are also covered. The final chapter shows you future possibilities emerging in the streaming space, and how Apache Apex can contribute to them.
Style and approach
This book is divided into two major parts: first it explains what Apex is, what its relevant parts are, and how to write well-built Apex applications. The second part is entirely application-driven, walking you through Apex applications of increasing complexity.
Contents:
Cover
Title Page
Copyright
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: Introduction to Apex
Unbounded data and continuous processing
Stream processing
Stream processing systems
What is Apex and why is it important?
Use cases and case studies
Real-time insights for Advertising Tech (PubMatic)
Industrial IoT applications (GE)
Real-time threat detection (Capital One)
Silver Spring Networks (SSN)
Application Model and API
Directed Acyclic Graph (DAG)
Apex DAG Java API
High-level Stream Java API
SQL
JSON
Windowing and time
Value proposition of Apex
Low latency and stateful processing
Native streaming versus micro-batch
Performance
Where Apex excels
Where Apex is not suitable
Summary
Chapter 2: Getting Started with Application Development
Development process and methodology
Setting up the development environment
Creating a new Maven project
Application specifications
Custom operator development
The Apex operator model
CheckpointListener/CheckpointNotificationListener
ActivationListener
IdleTimeHandler
Application configuration
Testing in the IDE
Writing the integration test
Running the application on YARN
Execution layer components
Installing Apex Docker sandbox
Running the application
Working on the cluster
YARN web UI
Apex CLI
Logging
Dynamically adjusting logging levels
Chapter 3: The Apex Library
An overview of the library
Integrations
Apache Kafka
Kafka input
Kafka output
Other streaming integrations
JMS (ActiveMQ, SQS, and so on)
Kinesis streams
Files
File input
File splitter and block reader
File writer
Databases
JDBC input
JDBC output
Other databases
Transformations
Parser
Filter
Enrichment
Map transform
Custom functions
Windowed transformations
Windowing
Global Window
Time Windows
Sliding Time Windows
Session Windows
Window propagation
State
Accumulation
Accumulation Mode
State storage
Watermarks
Allowed lateness
Triggering
Merging of streams
The windowing example
Dedup
Join
State Management
Chapter 4: Scalability, Low Latency, and Performance
Partitioning and how it works
Elasticity
Partitioning toolkit
Configuring and triggering partitioning
StreamCodec
Unifier
Custom dynamic partitioning
Performance optimizations
Affinity and anti-affinity
Low-latency versus throughput
Sample application for dynamic partitioning
Performance - other aspects for custom operators
Chapter 5: Fault Tolerance and Reliability
Distributed systems need to be resilient
Fault-tolerance components and mechanism in Apex
Checkpointing
When to checkpoint
How to checkpoint
What to checkpoint
Incremental state saving
Incremental recovery
Processing guarantees
Example - exactly-once counting
The exactly-once output to JDBC
Chapter 6: Example Project - Real-Time Aggregation and Visualization
Streaming ETL and beyond
The application pattern in a real-world use case
Analyzing Twitter feed
Top Hashtags
TweetStats
Configuring Twitter API access
Enabling WebSocket output
The Pub/Sub server
Grafana visualization
Installing Grafana
Installing Grafana Simple JSON Datasource
The Grafana Pub/Sub adapter server
Setting up the dashboard
Chapter 7: Example Project - Real-Time Ride Service Data Processing
The goal
Datasource
The pipeline
Simulation of a real-time feed using historical data
Parsing the data
Looking up of the zip code and preparing for the windowing operation
Windowed operator configuration
Serving the data with WebSocket
Running the application on GCP Dataproc
Chapter 8: Example Project - ETL Using SQL
The application pipeline
Building and running the application
The application code
Partitioning
Application testing
Understanding application logs
Calcite integration
Chapter 9: Introduction to Apache Beam
Introduction to Apache Beam
Beam concepts
Pipelines, PTransforms, and PCollections
ParDo - elementwise computation
GroupByKey/CombinePerKey - aggregation across elements
Windowing, watermarks, and triggering in Beam
Windowing in Beam
Watermarks in Beam
Triggering in Beam
Advanced topic - stateful ParDo
WordCount in Apache Beam
Setting up your pipeline
Reading the works of Shakespeare in parallel
Splitting each line on spaces
Eliminating empty strings
Counting the occurrences of each word
Format your results
Writing to a sharded text file in parallel
Testing the pipeline at small scale with DirectRunner
Running Apache Beam WordCount on Apache Apex
Chapter 10: The Future of Stream Processing
Lower barrier for building streaming pipelines
Visual development tools
Streaming SQL
Better programming API
Bridging the gap between data science and engineering
Machine learning integration
State management
State query and data consistency
Containerized infrastructure
Management tools
Index.
Notes:
Includes bibliographical references and index.
Description based on online resource; title from PDF title page (EBC, viewed January 3, 2018).
ISBN:
9781788294119
1788294114
OCLC:
1019128795
