
Learning Apache Apex : Real-time streaming applications with Apex /

Designing and writing a real-time streaming publication with Apache Apex.
About This Book: Get a clear, practical approach to real-time data processing. Program Apache Apex streaming applications. This book shows you Apex integration with the open source Big Data ecosystem.
Who This Book Is For: This book...


Bibliographic Details
Classification:	Electronic Book
Main Authors:	Weise, Thomas (Author), Ramanath, Munagala V. (Author), Yan, David (Author), Knowles, Kenneth (Author)
Format:	Electronic eBook
Language:	English
Published:	Birmingham, UK : Packt Publishing, 2017.
Subjects:	Apache Apex. Data mining. Big data. Exploration de données (Informatique) Données volumineuses. COMPUTERS > Data Processing. COMPUTERS > Enterprise Applications > General. COMPUTERS > Systems Architecture > Distributed Systems & Computing.
Online Access:	Full text

Table of Contents:

Cover
Title Page
Copyright
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: Introduction to Apex
Unbounded data and continuous processing
Stream processing
Stream processing systems
What is Apex and why is it important?
Use cases and case studies
Real-time insights for Advertising Tech (PubMatic)
Industrial IoT applications (GE)
Real-time threat detection (Capital One)
Silver Spring Networks (SSN)
Application Model and API
Directed Acyclic Graph (DAG)
Apex DAG Java API
High-level Stream Java API
SQL
JSON
Windowing and time
Value proposition of Apex
Low latency and stateful processing
Native streaming versus micro-batch
Performance
Where Apex excels
Where Apex is not suitable
Summary
Chapter 2: Getting Started with Application Development
Development process and methodology
Setting up the development environment
Creating a new Maven project
Application specifications
Custom operator development
The Apex operator model
CheckpointListener/CheckpointNotificationListener
ActivationListener
IdleTimeHandler
Application configuration
Testing in the IDE
Writing the integration test
Running the application on YARN
Execution layer components
Installing Apex Docker sandbox
Running the application
Working on the cluster
YARN web UI
Apex CLI
Logging
Dynamically adjusting logging levels
Summary
Chapter 3: The Apex Library
An overview of the library
Integrations
Apache Kafka
Kafka input
Kafka output
Other streaming integrations
JMS (ActiveMQ, SQS, and so on)
Kinesis streams
Files
File input
File splitter and block reader
File writer
Databases
JDBC input
JDBC output
Other databases
Transformations
Parser
Filter
Enrichment
Map transform
Custom functions
Windowed transformations
Windowing
Global Window
Time Windows
Sliding Time Windows
Session Windows
Window propagation
State
Accumulation
Accumulation Mode
State storage
Watermarks
Allowed lateness
Triggering
Merging of streams
The windowing example
Dedup
Join
State Management
Summary
Chapter 4: Scalability, Low Latency, and Performance
Partitioning and how it works
Elasticity
Partitioning toolkit
Configuring and triggering partitioning
StreamCodec
Unifier
Custom dynamic partitioning
Performance optimizations
Affinity and anti-affinity
Low-latency versus throughput
Sample application for dynamic partitioning
Performance - other aspects for custom operators
Summary
Chapter 5: Fault Tolerance and Reliability
Distributed systems need to be resilient
Fault-tolerance components and mechanism in Apex
Checkpointing
When to checkpoint
How to checkpoint
What to checkpoint
Incremental state saving
Incremental recovery
Processing guarantees
Example - exactly-once counting
The exactly-once output to JDBC
Summary
Chapter 6: Example Project - Real-Time Aggregation and Visualization
Streaming ETL and beyond
The application pattern in a real-world use case
Analyzing Twitter feed
Top Hashtags
TweetStats
Running the application
Configuring Twitter API access
Enabling WebSocket output
The Pub/Sub server
Grafana visualization
Installing Grafana
Installing Grafana Simple JSON Datasource
The Grafana Pub/Sub adapter server
Setting up the dashboard
Summary
Chapter 7: Example Project - Real-Time Ride Service Data Processing
The goal
Datasource
The pipeline
Simulation of a real-time feed using historical data
Parsing the data
Looking up of the zip code and preparing for the windowing operation
Windowed operator configuration
Serving the data with WebSocket
Running the application
Running the application on GCP Dataproc
Summary
Chapter 8: Example Project - ETL Using SQL
The application pipeline
Building and running the application
Application configuration
The application code
Partitioning
Application testing
Understanding application logs
Calcite integration
Summary
Chapter 9: Introduction to Apache Beam
Introduction to Apache Beam
Beam concepts
Pipelines, PTransforms, and PCollections
ParDo - elementwise computation
GroupByKey/CombinePerKey - aggregation across elements
Windowing, watermarks, and triggering in Beam
Windowing in Beam
Watermarks in Beam
Triggering in Beam
Advanced topic - stateful ParDo
WordCount in Apache Beam
Setting up your pipeline
Reading the works of Shakespeare in parallel
Splitting each line on spaces
Eliminating empty strings
Counting the occurrences of each word
Format your results
Writing to a sharded text file in parallel
Testing the pipeline at small scale with DirectRunner
Running Apache Beam WordCount on Apache Apex
Summary
Chapter 10: The Future of Stream Processing
Lower barrier for building streaming pipelines
Visual development tools
Streaming SQL
Better programming API
Bridging the gap between data science and engineering
Machine learning integration
State management
State query and data consistency
Containerized infrastructure
Management tools
Summary
Index.
