Learning Apache Apex : Real-time streaming applications with Apex /
Designing and writing a real-time streaming publication with Apache Apex. About This Book: Get a clear, practical approach to real-time data processing. Program Apache Apex streaming applications. This book shows you Apex integration with the open source Big Data ecosystem. Who This Book Is For: This book...
Classification: | Electronic Book |
---|---|
Main authors: | , , , |
Format: | Electronic eBook |
Language: | English |
Published: | Birmingham, UK : Packt Publishing, 2017. |
Subjects: | |
Online access: | Full text |
Table of Contents:
- Cover
- Title Page
- Copyright
- Credits
- About the Authors
- About the Reviewer
- www.PacktPub.com
- Customer Feedback
- Table of Contents
- Preface
- Chapter 1: Introduction to Apex
- Unbounded data and continuous processing
- Stream processing
- Stream processing systems
- What is Apex and why is it important?
- Use cases and case studies
- Real-time insights for Advertising Tech (PubMatic)
- Industrial IoT applications (GE)
- Real-time threat detection (Capital One)
- Silver Spring Networks (SSN)
- Application Model and API
- Directed Acyclic Graph (DAG)
- Apex DAG Java API
- High-level Stream Java API
- SQL
- JSON
- Windowing and time
- Value proposition of Apex
- Low latency and stateful processing
- Native streaming versus micro-batch
- Performance
- Where Apex excels
- Where Apex is not suitable
- Summary
- Chapter 2: Getting Started with Application Development
- Development process and methodology
- Setting up the development environment
- Creating a new Maven project
- Application specifications
- Custom operator development
- The Apex operator model
- CheckpointListener/CheckpointNotificationListener
- ActivationListener
- IdleTimeHandler
- Application configuration
- Testing in the IDE
- Writing the integration test
- Running the application on YARN
- Execution layer components
- Installing Apex Docker sandbox
- Running the application
- Working on the cluster
- YARN web UI
- Apex CLI
- Logging
- Dynamically adjusting logging levels
- Summary
- Chapter 3: The Apex Library
- An overview of the library
- Integrations
- Apache Kafka
- Kafka input
- Kafka output
- Other streaming integrations
- JMS (ActiveMQ, SQS, and so on)
- Kinesis streams
- Files
- File input
- File splitter and block reader
- File writer
- Databases
- JDBC input
- JDBC output
- Other databases
- Transformations
- Parser
- Filter
- Enrichment
- Map transform
- Custom functions
- Windowed transformations
- Windowing
- Global Window
- Time Windows
- Sliding Time Windows
- Session Windows
- Window propagation
- State
- Accumulation
- Accumulation Mode
- State storage
- Watermarks
- Allowed lateness
- Triggering
- Merging of streams
- The windowing example
- Dedup
- Join
- State Management
- Summary
- Chapter 4: Scalability, Low Latency, and Performance
- Partitioning and how it works
- Elasticity
- Partitioning toolkit
- Configuring and triggering partitioning
- StreamCodec
- Unifier
- Custom dynamic partitioning
- Performance optimizations
- Affinity and anti-affinity
- Low-latency versus throughput
- Sample application for dynamic partitioning
- Performance - other aspects for custom operators
- Summary
- Chapter 5: Fault Tolerance and Reliability
- Distributed systems need to be resilient
- Fault-tolerance components and mechanism in Apex
- Checkpointing
- When to checkpoint
- How to checkpoint
- What to checkpoint
- Incremental state saving
- Incremental recovery
- Processing guarantees
- Example - exactly-once counting
- The exactly-once output to JDBC
- Summary
- Chapter 6: Example Project - Real-Time Aggregation and Visualization
- Streaming ETL and beyond
- The application pattern in a real-world use case
- Analyzing Twitter feed
- Top Hashtags
- TweetStats
- Running the application
- Configuring Twitter API access
- Enabling WebSocket output
- The Pub/Sub server
- Grafana visualization
- Installing Grafana
- Installing Grafana Simple JSON Datasource
- The Grafana Pub/Sub adapter server
- Setting up the dashboard
- Summary
- Chapter 7: Example Project - Real-Time Ride Service Data Processing
- The goal
- Datasource
- The pipeline
- Simulation of a real-time feed using historical data
- Parsing the data
- Looking up of the zip code and preparing for the windowing operation
- Windowed operator configuration
- Serving the data with WebSocket
- Running the application
- Running the application on GCP Dataproc
- Summary
- Chapter 8: Example Project - ETL Using SQL
- The application pipeline
- Building and running the application
- Application configuration
- The application code
- Partitioning
- Application testing
- Understanding application logs
- Calcite integration
- Summary
- Chapter 9: Introduction to Apache Beam
- Introduction to Apache Beam
- Beam concepts
- Pipelines, PTransforms, and PCollections
- ParDo - elementwise computation
- GroupByKey/CombinePerKey - aggregation across elements
- Windowing, watermarks, and triggering in Beam
- Windowing in Beam
- Watermarks in Beam
- Triggering in Beam
- Advanced topic - stateful ParDo
- WordCount in Apache Beam
- Setting up your pipeline
- Reading the works of Shakespeare in parallel
- Splitting each line on spaces
- Eliminating empty strings
- Counting the occurrences of each word
- Format your results
- Writing to a sharded text file in parallel
- Testing the pipeline at small scale with DirectRunner
- Running Apache Beam WordCount on Apache Apex
- Summary
- Chapter 10: The Future of Stream Processing
- Lower barrier for building streaming pipelines
- Visual development tools
- Streaming SQL
- Better programming API
- Bridging the gap between data science and engineering
- Machine learning integration
- State management
- State query and data consistency
- Containerized infrastructure
- Management tools
- Summary
- Index.