Streaming systems : the what, where, when, and how of large-scale data processing

Streaming data is a big deal in big data these days. As more and more businesses seek to tame the massive unbounded data sets that pervade our world, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scienti...

Bibliographic Details
Classification: Electronic Book
Main Authors: Akidau, Tyler (Author), Chernyak, Slava (Author), Lax, Reuven (Author)
Format: Electronic eBook
Language: English
Published: Sebastopol, CA : O'Reilly Media, Inc., [2018]
Subjects:
Online Access: Full text (requires prior registration with an institutional email address)
Table of Contents:
  • The Beam model. Streaming 101
  • The what, where, when, and how of data processing
  • Watermarks
  • Advanced windowing
  • Exactly-once and side effects
  • Streams and tables. The practicalities of persistent state
  • Streaming SQL
  • Streaming joins
  • The evolution of large-scale data processing.
  • Intro; Copyright; Table of Contents; Preface Or: What Are You Getting Yourself Into Here?; Navigating This Book; Takeaways; Conventions Used in This Book; Online Resources; Figures; Code Snippets; O'Reilly Safari; How to Contact Us; Acknowledgments; Part I. The Beam Model; Chapter 1. Streaming 101; Terminology: What Is Streaming?; On the Greatly Exaggerated Limitations of Streaming; Event Time Versus Processing Time; Data Processing Patterns; Bounded Data; Unbounded Data: Batch; Unbounded Data: Streaming; Summary; Chapter 2. The What, Where, When, and How of Data Processing; Roadmap
  • Batch Foundations: What and Where; What: Transformations; Where: Windowing; Going Streaming: When and How; When: The Wonderful Thing About Triggers Is Triggers Are Wonderful Things!; When: Watermarks; When: Early/On-Time/Late Triggers FTW!; When: Allowed Lateness (i.e., Garbage Collection); How: Accumulation; Summary; Chapter 3. Watermarks; Definition; Source Watermark Creation; Perfect Watermark Creation; Heuristic Watermark Creation; Watermark Propagation; Understanding Watermark Propagation; Watermark Propagation and Output Timestamps; The Tricky Case of Overlapping Windows
  • Percentile Watermarks; Processing-Time Watermarks; Case Studies; Case Study: Watermarks in Google Cloud Dataflow; Case Study: Watermarks in Apache Flink; Case Study: Source Watermarks for Google Cloud Pub/Sub; Summary; Chapter 4. Advanced Windowing; When/Where: Processing-Time Windows; Event-Time Windowing; Processing-Time Windowing via Triggers; Processing-Time Windowing via Ingress Time; Where: Session Windows; Where: Custom Windowing; Variations on Fixed Windows; Variations on Session Windows; One Size Does Not Fit All; Summary; Chapter 5. Exactly-Once and Side Effects
  • Why Exactly Once Matters; Accuracy Versus Completeness; Side Effects; Problem Definition; Ensuring Exactly Once in Shuffle; Addressing Determinism; Performance; Graph Optimization; Bloom Filters; Garbage Collection; Exactly Once in Sources; Exactly Once in Sinks; Use Cases; Example Source: Cloud Pub/Sub; Example Sink: Files; Example Sink: Google BigQuery; Other Systems; Apache Spark Streaming; Apache Flink; Summary; Part II. Streams and Tables; Chapter 6. Streams and Tables; Stream-and-Table Basics Or: a Special Theory of Stream and Table Relativity
  • Toward a General Theory of Stream and Table Relativity; Batch Processing Versus Streams and Tables; A Streams and Tables Analysis of MapReduce; Reconciling with Batch Processing; What, Where, When, and How in a Streams and Tables World; What: Transformations; Where: Windowing; When: Triggers; How: Accumulation; A Holistic View of Streams and Tables in the Beam Model; A General Theory of Stream and Table Relativity; Summary; Chapter 7. The Practicalities of Persistent State; Motivation; The Inevitability of Failure; Correctness and Efficiency; Implicit State; Raw Grouping; Incremental Combining