
Building ETL Pipelines with Python: Create and Deploy Enterprise-Ready ETL Pipelines by Employing Modern Methods

Develop production-ready ETL pipelines by leveraging Python libraries and deploying them for suitable use cases.

Key Features:
  • Understand how to set up a Python virtual environment with PyCharm
  • Learn functional and object-oriented approaches to create ETL pipelines
  • Create robust CI/CD processes for ET...


Bibliographic Details
Classification: Electronic Book
Main Authors: Pandey, Brij Kishore (Author), Schoof, Emily Ro (Author)
Format: Electronic eBook
Language: English
Published: Birmingham : Packt Publishing, Limited, 2023.
Edition: 1st edition.
Subjects:
Online Access: Full text (requires prior registration with an institutional email)
Table of Contents:
  • Cover
  • Title Page
  • Copyright
  • Dedication
  • Contributors
  • Table of Contents
  • Preface
  • Part 1: Introduction to ETL, Data Pipelines, and Design Principles
  • Chapter 1: A Primer on Python and the Development Environment
  • Introducing Python fundamentals
  • An overview of Python data structures
  • Python if...else conditions or conditional statements
  • Python looping techniques
  • Python functions
  • Object-oriented programming with Python
  • Working with files in Python
  • Establishing a development environment
  • Version control with Git tracking
  • Documenting environment dependencies with requirements.txt
  • Utilizing module management systems (MMSs)
  • Configuring a Pipenv environment in PyCharm
  • Summary
  • Chapter 2: Understanding the ETL Process and Data Pipelines
  • What is a data pipeline?
  • How do we create a robust pipeline?
  • Pre-work – understanding your data
  • Design planning – planning your workflow
  • Architecture development – developing your resources
  • Putting it all together – project diagrams
  • What is an ETL data pipeline?
  • Batch processing
  • Streaming method
  • Cloud-native
  • Automating ETL pipelines
  • Exploring use cases for ETL pipelines
  • Summary
  • References
  • Chapter 3: Design Principles for Creating Scalable and Resilient Pipelines
  • Technical requirements
  • Understanding the design patterns for ETL
  • Basic ETL design pattern
  • ETL-P design pattern
  • ETL-VP design pattern
  • ELT two-phase pattern
  • Preparing your local environment for installations
  • Open source Python libraries for ETL pipelines
  • Pandas
  • NumPy
  • Scaling for big data packages
  • Dask
  • Numba
  • Summary
  • References
  • Part 2: Designing ETL Pipelines with Python
  • Chapter 4: Sourcing Insightful Data and Data Extraction Strategies
  • Technical requirements
  • What is data sourcing?
  • Accessibility to data
  • Types of data sources
  • Getting started with data extraction
  • CSV and Excel data files
  • Parquet data files
  • API connections
  • Databases
  • Data from web pages
  • Creating a data extraction pipeline using Python
  • Data extraction
  • Logging
  • Summary
  • References
  • Chapter 5: Data Cleansing and Transformation
  • Technical requirements
  • Scrubbing your data
  • Data transformation
  • Data cleansing and transformation in ETL pipelines
  • Understanding the downstream applications of your data
  • Strategies for data cleansing and transformation in Python
  • Preliminary tasks – the importance of staging data
  • Transformation activities in Python
  • Creating data pipeline activity in Python
  • Summary
  • Chapter 6: Loading Transformed Data
  • Technical requirements
  • Introduction to data loading
  • Choosing the load destination
  • Types of load destinations
  • Best practices for data loading
  • Optimizing data loading activities by controlling the data import method
  • Creating demo data
  • Full data loads
  • Incremental data loads