Data Pipelines with Apache Airflow.

Data Pipelines with Apache Airflow teaches you how to build and maintain effective data pipelines. You'll explore the most common usage patterns, including aggregating multiple data sources, connecting to and from data lakes, and cloud deployment. Part reference and part tutorial, this practica...

Bibliographic Details
Classification: Electronic Book
Main Author: Ruiter, Julian de
Other Authors: Harenslak, Bas
Format: Electronic eBook
Language: English
Published: [Place of publication not identified] : Simon & Schuster : Manning, 2021.
Subjects:
Online Access: Full text (requires prior registration with an institutional email address)

MARC

LEADER 00000cam a22000003i 4500
001 OR_on1308508409
003 OCoLC
005 20231017213018.0
006 m o d
007 cr cnu|||unuuu
008 220401s2021 xx o 000 0 eng d
040 |a N$T  |b eng  |e rda  |e pn  |c N$T  |d YDX  |d EBLCP  |d TOH  |d AU@  |d VT2  |d OCLCF  |d DST  |d UKAHL  |d OCLCO  |d OCLCQ 
019 |a 1256713426  |a 1256804372  |a 1272923830  |a 1281717485 
020 |a 9781638356837  |q (electronic bk.) 
020 |a 1638356831  |q (electronic bk.) 
020 |z 9781617296901 
020 |z 1617296902 
024 8 |a 9781617296901 
029 1 |a AU@  |b 000069347134 
029 1 |a AU@  |b 000071968419 
035 |a (OCoLC)1308508409  |z (OCoLC)1256713426  |z (OCoLC)1256804372  |z (OCoLC)1272923830  |z (OCoLC)1281717485 
050 4 |a QA76.9.D343  |b .H374 2021 
082 0 4 |a 006.3/12  |2 23 
049 |a UAMI 
100 1 |a Ruiter, Julian de. 
245 1 0 |a Data Pipelines with Apache Airflow. 
264 1 |a [Place of publication not identified] :  |b Simon & Schuster :  |b Manning,  |c 2021. 
300 |a 1 online resource 
336 |a text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
347 |a text file 
588 0 |a Vendor-supplied metadata. 
505 0 |a Intro -- inside front cover -- Data Pipelines with Apache Airflow -- Copyright -- brief contents -- contents -- front matter -- preface -- acknowledgments -- Bas Harenslak -- Julian de Ruiter -- about this book -- Who should read this book -- How this book is organized: A road map -- About the code -- LiveBook discussion forum -- about the authors -- about the cover illustration -- Part 1. Getting started -- 1 Meet Apache Airflow -- 1.1 Introducing data pipelines -- 1.1.1 Data pipelines as graphs -- 1.1.2 Executing a pipeline graph -- 1.1.3 Pipeline graphs vs. sequential scripts 
505 8 |a 1.1.4 Running pipeline using workflow managers -- 1.2 Introducing Airflow -- 1.2.1 Defining pipelines flexibly in (Python) code -- 1.2.2 Scheduling and executing pipelines -- 1.2.3 Monitoring and handling failures -- 1.2.4 Incremental loading and backfilling -- 1.3 When to use Airflow -- 1.3.1 Reasons to choose Airflow -- 1.3.2 Reasons not to choose Airflow -- 1.4 The rest of this book -- Summary -- 2 Anatomy of an Airflow DAG -- 2.1 Collecting data from numerous sources -- 2.1.1 Exploring the data -- 2.2 Writing your first Airflow DAG -- 2.2.1 Tasks vs. operators 
505 8 |a 2.2.2 Running arbitrary Python code -- 2.3 Running a DAG in Airflow -- 2.3.1 Running Airflow in a Python environment -- 2.3.2 Running Airflow in Docker containers -- 2.3.3 Inspecting the Airflow UI -- 2.4 Running at regular intervals -- 2.5 Handling failing tasks -- Summary -- 3 Scheduling in Airflow -- 3.1 An example: Processing user events -- 3.2 Running at regular intervals -- 3.2.1 Defining scheduling intervals -- 3.2.2 Cron-based intervals -- 3.2.3 Frequency-based intervals -- 3.3 Processing data incrementally -- 3.3.1 Fetching events incrementally 
505 8 |a 3.3.2 Dynamic time references using execution dates -- 3.3.3 Partitioning your data -- 3.4 Understanding Airflow's execution dates -- 3.4.1 Executing work in fixed-length intervals -- 3.5 Using backfilling to fill in past gaps -- 3.5.1 Executing work back in time -- 3.6 Best practices for designing tasks -- 3.6.1 Atomicity -- 3.6.2 Idempotency -- Summary -- 4 Templating tasks using the Airflow context -- 4.1 Inspecting data for processing with Airflow -- 4.1.1 Determining how to load incremental data -- 4.2 Task context and Jinja templating -- 4.2.1 Templating operator arguments 
505 8 |a 4.2.2 What is available for templating? -- 4.2.3 Templating the PythonOperator -- 4.2.4 Providing variables to the PythonOperator -- 4.2.5 Inspecting templated arguments -- 4.3 Hooking up other systems -- Summary -- 5 Defining dependencies between tasks -- 5.1 Basic dependencies -- 5.1.1 Linear dependencies -- 5.1.2 Fan-in/-out dependencies -- 5.2 Branching -- 5.2.1 Branching within tasks -- 5.2.2 Branching within the DAG -- 5.3 Conditional tasks -- 5.3.1 Conditions within tasks -- 5.3.2 Making tasks conditional -- 5.3.3 Using built-in operators -- 5.4 More about trigger rules 
520 |a Data Pipelines with Apache Airflow teaches you how to build and maintain effective data pipelines. You'll explore the most common usage patterns, including aggregating multiple data sources, connecting to and from data lakes, and cloud deployment. Part reference and part tutorial, this practical guide covers every aspect of the directed acyclic graphs (DAGs) that power Airflow, and how to customize them for your pipeline's needs. 
542 |f © 2021 Manning Publications Co. All rights reserved.  |g 2021 
590 |a O'Reilly  |b O'Reilly Online Learning: Academic/Public Library Edition 
650 0 |a Data mining. 
650 0 |a Cloud computing. 
650 0 |a Programming languages (Electronic computers) 
650 0 |a Python (Computer program language) 
650 0 |a Big data. 
650 0 |a Machine learning. 
650 0 |a Electronic data processing. 
650 0 |a Information storage and retrieval systems  |x Scalability. 
650 0 |a Application program interfaces (Computer software) 
650 2 |a Data Mining 
650 6 |a Exploration de données (Informatique) 
650 6 |a Infonuagique. 
650 6 |a Python (Langage de programmation) 
650 6 |a Données volumineuses. 
650 6 |a Apprentissage automatique. 
650 6 |a Interfaces de programmation d'applications. 
650 7 |a APIs (interfaces)  |2 aat 
650 7 |a Application program interfaces (Computer software)  |2 fast  |0 (OCoLC)fst00811704 
650 7 |a Big data.  |2 fast  |0 (OCoLC)fst01892965 
650 7 |a Cloud computing.  |2 fast  |0 (OCoLC)fst01745899 
650 7 |a Data mining.  |2 fast  |0 (OCoLC)fst00887946 
650 7 |a Electronic data processing.  |2 fast  |0 (OCoLC)fst00906956 
650 7 |a Information storage and retrieval systems  |x Scalability.  |2 fast  |0 (OCoLC)fst01921149 
650 7 |a Machine learning.  |2 fast  |0 (OCoLC)fst01004795 
650 7 |a Programming languages (Electronic computers)  |2 fast  |0 (OCoLC)fst01078704 
650 7 |a Python (Computer program language)  |2 fast  |0 (OCoLC)fst01084736 
700 1 |a Harenslak, Bas. 
776 0 8 |i Print version:  |a Ruiter, Julian de.  |t Data Pipelines with Apache Airflow.  |d [Place of publication not identified] : Simon & Schuster : Manning, 2021  |z 9781617296901  |z 1617296902  |w (OCoLC)1249108869 
856 4 0 |u https://learning.oreilly.com/library/view/~/9781617296901/?ar  |z Full text (requires prior registration with an institutional email address)
938 |a Askews and Holts Library Services  |b ASKH  |n AH39609424 
938 |a ProQuest Ebook Central  |b EBLB  |n EBL6642618 
938 |a EBSCOhost  |b EBSC  |n 2949094 
938 |a YBP Library Services  |b YANK  |n 302273010 
994 |a 92  |b IZTAP