
Learning Apache Apex: Real-time streaming applications with Apex

Designing and writing a real-time streaming publication with Apache Apex About This Book Get a clear, practical approach to real-time data processing Program Apache Apex streaming applications This book shows you Apex integration with the open source Big Data ecosystem Who This Book Is For This book...


Bibliographic Details
Classification: Electronic Book
Main Authors: Weise, Thomas (Author), Ramanath, Munagala V. (Author), Yan, David (Author), Knowles, Kenneth (Author)
Format: Electronic eBook
Language: English
Published: Birmingham, UK : Packt Publishing, 2017.
Subjects:
Online Access: Full text (requires prior registration with an institutional email address)

MARC

LEADER 00000cam a2200000 i 4500
001 OR_on1019128795
003 OCoLC
005 20231017213018.0
006 m o d
007 cr unu||||||||
008 180111s2017 enka o 000 0 eng d
040 |a UMI  |b eng  |e rda  |e pn  |c UMI  |d IDEBK  |d TOH  |d NLE  |d STF  |d CEF  |d OCLCF  |d KSU  |d DEBBG  |d UKMGB  |d G3B  |d LVT  |d S9I  |d UAB  |d UKAHL  |d N$T  |d QGK  |d OCLCQ  |d OCLCO  |d OCLCQ 
015 |a GBB820007  |2 bnb 
016 7 |a 018649654  |2 Uk 
020 |a 1788294114 
020 |a 1788296400 
020 |a 9781788296403 
020 |a 9781788294119  |q (electronic bk.) 
029 1 |a GBVCP  |b 1014940109 
029 1 |a UKMGB  |b 018649654 
035 |a (OCoLC)1019128795 
037 |a CL0500000927  |b Safari Books Online 
050 4 |a QA76.9.D343 
082 0 4 |a 005.1  |2 23 
049 |a UAMI 
100 1 |a Weise, Thomas,  |e author. 
245 1 0 |a Learning Apache Apex :  |b Real-time streaming applications with Apex /  |c Thomas Weise, Munagala V. Ramanath, David Yan, Kenneth Knowles. 
264 1 |a Birmingham, UK :  |b Packt Publishing,  |c 2017. 
300 |a 1 online resource (1 volume) :  |b illustrations 
336 |a text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
347 |a data file 
588 0 |a Online resource; title from title page (viewed January 9, 2018). 
505 0 |a Cover -- Title Page -- Copyright -- Credits -- About the Authors -- About the Reviewer -- www.PacktPub.com -- Customer Feedback -- Table of Contents -- Preface -- Chapter 1: Introduction to Apex -- Unbounded data and continuous processing -- Stream processing -- Stream processing systems -- What is Apex and why is it important? -- Use cases and case studies -- Real-time insights for Advertising Tech (PubMatic) -- Industrial IoT applications (GE) -- Real-time threat detection (Capital One) -- Silver Spring Networks (SSN) -- Application Model and API -- Directed Acyclic Graph (DAG) -- Apex DAG Java API -- High-level Stream Java API -- SQL -- JSON -- Windowing and time -- Value proposition of Apex -- Low latency and stateful processing -- Native streaming versus micro-batch -- Performance -- Where Apex excels -- Where Apex is not suitable -- Summary -- Chapter 2: Getting Started with Application Development -- Development process and methodology -- Setting up the development environment -- Creating a new Maven project -- Application specifications -- Custom operator development -- The Apex operator model -- CheckpointListener/CheckpointNotificationListener -- ActivationListener -- IdleTimeHandler -- Application configuration -- Testing in the IDE -- Writing the integration test -- Running the application on YARN -- Execution layer components -- Installing Apex Docker sandbox -- Running the application -- Working on the cluster -- YARN web UI -- Apex CLI -- Logging -- Dynamically adjusting logging levels -- Summary -- Chapter 3: The Apex Library -- An overview of the library -- Integrations -- Apache Kafka -- Kafka input -- Kafka output -- Other streaming integrations -- JMS (ActiveMQ, SQS, and so on) -- Kinesis streams -- Files -- File input -- File splitter and block reader -- File writer -- Databases -- JDBC input -- JDBC output -- Other databases. 
505 8 |a Transformations -- Parser -- Filter -- Enrichment -- Map transform -- Custom functions -- Windowed transformations -- Windowing -- Global Window -- Time Windows -- Sliding Time Windows -- Session Windows -- Window propagation -- State -- Accumulation -- Accumulation Mode -- State storage -- Watermarks -- Allowed lateness -- Triggering -- Merging of streams -- The windowing example -- Dedup -- Join -- State Management -- Summary -- Chapter 4: Scalability, Low Latency, and Performance -- Partitioning and how it works -- Elasticity -- Partitioning toolkit -- Configuring and triggering partitioning -- StreamCodec -- Unifier -- Custom dynamic partitioning -- Performance optimizations -- Affinity and anti-affinity -- Low-latency versus throughput -- Sample application for dynamic partitioning -- Performance -- other aspects for custom operators -- Summary -- Chapter 5: Fault Tolerance and Reliability -- Distributed systems need to be resilient -- Fault-tolerance components and mechanism in Apex -- Checkpointing -- When to checkpoint -- How to checkpoint -- What to checkpoint -- Incremental state saving -- Incremental recovery -- Processing guarantees -- Example -- exactly-once counting -- The exactly-once output to JDBC -- Summary -- Chapter 6: Example Project -- Real-Time Aggregation and Visualization -- Streaming ETL and beyond -- The application pattern in a real-world use case -- Analyzing Twitter feed -- Top Hashtags -- TweetStats -- Running the application -- Configuring Twitter API access -- Enabling WebSocket output -- The Pub/Sub server -- Grafana visualization -- Installing Grafana -- Installing Grafana Simple JSON Datasource -- The Grafana Pub/Sub adapter server -- Setting up the dashboard -- Summary -- Chapter 7: Example Project -- Real-Time Ride Service Data Processing -- The goal -- Datasource -- The pipeline. 
505 8 |a Simulation of a real-time feed using historical data -- Parsing the data -- Looking up of the zip code and preparing for the windowing operation -- Windowed operator configuration -- Serving the data with WebSocket -- Running the application -- Running the application on GCP Dataproc -- Summary -- Chapter 8: Example Project -- ETL Using SQL -- The application pipeline -- Building and running the application -- Application configuration -- The application code -- Partitioning -- Application testing -- Understanding application logs -- Calcite integration -- Summary -- Chapter 9: Introduction to Apache Beam -- Introduction to Apache Beam -- Beam concepts -- Pipelines, PTransforms, and PCollections -- ParDo -- elementwise computation -- GroupByKey/CombinePerKey -- aggregation across elements -- Windowing, watermarks, and triggering in Beam -- Windowing in Beam -- Watermarks in Beam -- Triggering in Beam -- Advanced topic -- stateful ParDo -- WordCount in Apache Beam -- Setting up your pipeline -- Reading the works of Shakespeare in parallel -- Splitting each line on spaces -- Eliminating empty strings -- Counting the occurrences of each word -- Format your results -- Writing to a sharded text file in parallel -- Testing the pipeline at small scale with DirectRunner -- Running Apache Beam WordCount on Apache Apex -- Summary -- Chapter 10: The Future of Stream Processing -- Lower barrier for building streaming pipelines -- Visual development tools -- Streaming SQL -- Better programming API -- Bridging the gap between data science and engineering -- Machine learning integration -- State management -- State query and data consistency -- Containerized infrastructure -- Management tools -- Summary -- Index. 
520 |a Designing and writing a real-time streaming publication with Apache Apex About This Book Get a clear, practical approach to real-time data processing Program Apache Apex streaming applications This book shows you Apex integration with the open source Big Data ecosystem Who This Book Is For This book assumes knowledge of application development with Java and familiarity with distributed systems. Familiarity with other real-time streaming frameworks is not required, but some practical experience with other big data processing utilities might be helpful. What You Will Learn Put together a functioning Apex application from scratch Scale an Apex application and configure it for optimal performance Understand how to deal with failures via the fault tolerance features of the platform Use Apex via other frameworks such as Beam Understand the DevOps implications of deploying Apex In Detail Apache Apex is a next-generation stream processing framework designed to operate on data at large scale, with minimum latency, maximum reliability, and strict correctness guarantees. Half of the book consists of Apex applications, showing you key aspects of data processing pipelines such as connectors for sources and sinks, and common data transformations. The other half of the book is evenly split into explaining the Apex framework, and tuning, testing, and scaling Apex applications. Much of our economic world depends on growing streams of data, such as social media feeds, financial records, data from mobile devices, sensors and machines (the Internet of Things - IoT). The projects in the book show how to process such streams to gain valuable, timely, and actionable insights. Traditional use cases, such as ETL, that currently consume a significant chunk of data engineering resources are also covered. The final chapter shows you future possibilities emerging in the streaming space, and how Apache Apex can contribute to it. 
Style and approach This book is divided into two major parts: first it explains what Apex is, what its relevant parts are, and how to write well-built Apex applications. The second part is entirely application-driven, walking you through Apex applications of increasing complexity. 
590 |a O'Reilly  |b O'Reilly Online Learning: Academic/Public Library Edition 
590 |a eBooks on EBSCOhost  |b EBSCO eBook Subscription Academic Collection - Worldwide 
630 0 0 |a Apache Apex. 
650 0 |a Data mining. 
650 0 |a Big data. 
650 6 |a Exploration de données (Informatique) 
650 6 |a Données volumineuses. 
650 7 |a COMPUTERS  |x Data Processing.  |2 bisacsh 
650 7 |a COMPUTERS  |x Enterprise Applications  |x General.  |2 bisacsh 
650 7 |a COMPUTERS  |x Systems Architecture  |x Distributed Systems & Computing.  |2 bisacsh 
650 7 |a Big data.  |2 fast  |0 (OCoLC)fst01892965 
650 7 |a Data mining.  |2 fast  |0 (OCoLC)fst00887946 
700 1 |a Ramanath, Munagala V.,  |e author. 
700 1 |a Yan, David,  |e author. 
700 1 |a Knowles, Kenneth,  |e author. 
856 4 0 |u https://learning.oreilly.com/library/view/~/9781788296403/?ar  |z Full text (requires prior registration with an institutional email address) 
938 |a Askews and Holts Library Services  |b ASKH  |n AH33734813 
938 |a EBSCOhost  |b EBSC  |n 1643015 
938 |a ProQuest MyiLibrary Digital eBook Collection  |b IDEB  |n cis39645395 
994 |a 92  |b IZTAP