Grokking streaming systems : real-time event processing / Josh Fischer, Ning Wang.
- Format:
- Book
- Author/Creator:
- Fischer, Joshua, author.
- Wang, Ning, author.
- Language:
- English
- Subjects (All):
- Streaming technology (Telecommunications).
- Physical Description:
- 1 online resource (129 pages)
- Place of Publication:
- Shelter Island, New York : Manning Publications, [2021]
- Summary:
- Grokking Streaming Systems introduces real-time event streaming applications in clear, reader-friendly language. This engaging book illuminates core concepts like data parallelization, event windows, and backpressure without getting bogged down in framework-specific details. As you go, you'll build your own simple streaming tool from the ground up to make sure all the ideas and techniques stick. The helpful and entertaining illustrations make streaming systems come alive as you tackle relevant examples like real-time credit card fraud detection and monitoring IoT services.
- Contents:
- Intro
- inside front cover
- Grokking Streaming Systems
- Copyright
- brief contents
- contents
- front matter
- preface
- acknowledgments
- about this book
- about the authors
- Part 1. Getting started with streaming
- 1 Welcome to Grokking Streaming Systems
- What is stream processing?
- Streaming system examples
- Streaming systems and real time
- How a streaming system works
- Applications
- Backend services
- Inside a backend service
- Batch processing systems
- Inside a batch processing system
- Stream processing systems
- Inside a stream processing system
- The advantages of multi-stage architecture
- The multi-stage architecture in batch and stream processing systems
- Compare the systems
- A model stream processing system
- Summary
- Exercise
- 2 Hello, streaming systems!
- The chief needs a fancy tollbooth
- It started as HTTP requests, and it failed
- AJ and Miranda take time to reflect
- AJ ponders about streaming systems
- Comparing backend service and streaming
- How a streaming system could fit
- Queues: A foundational concept
- Data transfer via queues
- Our streaming framework (the start of it)
- The Streamwork framework overview
- Zooming in on the Streamwork engine
- Core streaming concepts
- More details of the concepts
- The streaming job execution flow
- Your first streaming job
- Executing the job
- Inspecting the job execution
- Look inside the engine
- Keep events moving
- The life of a data element
- Reviewing streaming concepts
- Exercises
- 3 Parallelization and data grouping
- The sensor is emitting more events
- Even in streaming, real time is hard
- New concepts: Parallelism is important
- New concepts: Data parallelism
- New concepts: Data execution independence
- New concepts: Task parallelism
- Data parallelism vs. task parallelism
- Parallelism and concurrency
- Parallelizing the job
- Parallelizing components
- Parallelizing sources
- Viewing job output
- Parallelizing operators
- Events and instances
- Event ordering
- Event grouping
- Shuffle grouping
- Shuffle grouping: Under the hood
- Fields grouping
- Fields grouping: Under the hood
- Event grouping execution
- Look inside the engine: Event dispatcher
- Applying fields grouping in your job
- Comparing grouping behaviors
- 4 Stream graph
- A credit card fraud detection system
- More about the credit card fraud detection system
- The fraud detection business
- Streaming isn't always a straight line
- Zoom into the system
- The fraud detection job in detail
- New concepts
- Upstream and downstream components
- Stream fan-out and fan-in
- Graph, directed graph, and DAG
- DAG in stream processing systems
- All new concepts in one page
- Stream fan-out to the analyzers
- There is a problem: Efficiency
- Stream fan-out with different streams
- Look inside the engine again
- Communication between the components via channels
- Multiple channels
- Stream fan-in to the score aggregator
- Stream fan-in in the engine
- A brief introduction to another stream fan-in: Join
- Look at the whole system
- Graph and streaming jobs
- The example systems
- 5 Delivery semantics
- The latency requirement of the fraud detection system
- Revisit the fraud detection job
- About accuracy
- Partial result
- A new streaming job to monitor system usage
- The new system usage job
- The requirements of the new system usage job
- New concepts: (The number of) times delivered and times processed
- New concept: Delivery semantics
- Choosing the right semantics
- At-most-once
- The fraud detection job
- At-least-once
- At-least-once with acknowledging
- Track events
- Handle event processing failures
- Track early out events
- Acknowledging code in components
- New concept: Checkpointing
- New concept: State
- Checkpointing in the system usage job for the at-least-once semantic
- Checkpointing and state manipulation functions
- State handling code in the transaction source component
- Exactly-once or effectively-once?
- Bonus concept: Idempotent operation
- Exactly-once, finally
- State handling code in the system usage analyzer component
- Comparing the delivery semantics again
- Up next ...
- 6 Streaming systems review and a glimpse ahead
- Streaming system pieces
- Parallelization and event grouping
- DAGs and streaming jobs
- Delivery semantics (guarantees)
- Delivery semantics used in the credit card fraud detection system
- Which way to go from here
- Windowed computations
- Joining data in real time
- Backpressure
- Stateless and stateful computations
- Part 2. Stepping up
- 7 Windowed computations
- Slicing up real-time data
- Breaking down the problem in detail
- Breaking down the problem in detail (continued)
- Two different contexts
- Windowing in the fraud detection job
- What exactly are windows?
- Looking closer into the window
- New concept: Windowing strategy
- Fixed windows
- Fixed windows in the windowed proximity analyzer
- Detecting fraud with a fixed time window
- Fixed windows: Time vs. count
- Sliding windows
- Sliding windows: Windowed proximity analyzer
- Detecting fraud with a sliding window
- Session windows
- Session windows (continued)
- Detecting fraud with session windows
- Summary of windowing strategies
- Slicing an event stream into data sets
- Windowing: Concept or implementation
- Another look
- Key-value store 101
- Implement the windowed proximity analyzer
- Event time and other times for events
- Windowing watermark
- Late events
- 8 Join operations
- Joining emission data on the fly
- The emissions job version 1
- The emission resolver
- Accuracy becomes an issue
- The enhanced emissions job
- Focusing on the join
- What is a join again?
- How the stream join works
- Stream join is a different kind of fan-in
- Vehicle events vs. temperature events
- Table: A materialized view of streaming
- Vehicle events are less efficient to be materialized
- Data integrity quickly became an issue
- What's the problem with this join operator?
- Inner join
- Outer join
- The inner join vs. outer join
- Different types of joins
- Outer joins in streaming systems
- A new issue: Weak connection
- Windowed joins
- Joining two tables instead of joining a stream and table
- Revisiting the materialized view
- 9 Backpressure
- Reliability is critical
- Review the system
- Streamlining streaming jobs
- New concepts: Capacity, utilization, and headroom
- More about utilization and headroom
- New concept: Backpressure
- Measure capacity utilization
- Backpressure in the Streamwork engine
- Backpressure in the Streamwork engine: Propagation
- Our streaming job during a backpressure
- Backpressure in distributed systems
- New concept: Backpressure watermarks
- Another approach to handle lagging instances: Dropping events
- Why do we want to drop events?
- Backpressure could be a symptom when the underlying issue is permanent
- Stopping and resuming may lead to thrashing if the issue is permanent
- Handle thrashing
- 10 Stateful computation
- The migration of the streaming jobs
- Stateful components in the system usage job
- Revisit: State
- The states in different components
- State data vs. temporary data
- Stateful vs. stateless components: The code
- The stateful source and operator in the system usage job
- States and checkpoints
- Checkpoint creation: Timing is hard
- Event-based timing
- Creating checkpoints with checkpoint events
- A checkpoint event is handled by instance executors
- A checkpoint event flowing through a job
- Creating checkpoints with checkpoint events at the instance level
- Checkpoint event synchronization
- Checkpoint loading and backward compatibility
- Checkpoint storage
- Stateful vs. stateless components
- Manually managed instance states
- Lambda architecture
- 11 Wrap-up: Advanced concepts in streaming systems
- Is this really the end?
- The major window types
- SQL vs. stream joins
- Inner joins vs. outer joins
- Unexpected things can happen in streaming systems
- Backpressure: Slow down sources or upstream components
- Backpressure can be a symptom when the underlying issue is permanent
- Stateful components with checkpoints
- You did it!
- Appendix. Key concepts covered in this book
- index.
- Notes:
- Description based on print version record.
- ISBN:
- 9781638356493
- 1638356491
- OCLC:
- 1308976923