Building ETL Pipelines with Python: Create and Deploy Enterprise-Ready ETL Pipelines by Employing Modern Methods
Develop production-ready ETL pipelines by leveraging Python libraries and deploying them for suitable use cases. Key Features: Understand how to set up a Python virtual environment with PyCharm; Learn functional and object-oriented approaches to create ETL pipelines; Create robust CI/CD processes for ET...
Classification: E-book
Main authors: ,
Format: Electronic eBook
Language: English
Published: Birmingham : Packt Publishing, Limited, 2023.
Edition: 1st edition.
Subjects:
Online access: Full text (requires prior registration with an institutional email)
Table of Contents:
- Cover
- Title Page
- Copyright
- Dedication
- Contributors
- Table of Contents
- Preface
- Part 1: Introduction to ETL, Data Pipelines, and Design Principles
- Chapter 1: A Primer on Python and the Development Environment
- Introducing Python fundamentals
- An overview of Python data structures
- Python if...else conditions or conditional statements
- Python looping techniques
- Python functions
- Object-oriented programming with Python
- Working with files in Python
- Establishing a development environment
- Version control with Git tracking
- Documenting environment dependencies with requirements.txt
- Utilizing module management systems (MMSs)
- Configuring a Pipenv environment in PyCharm
- Summary
- Chapter 2: Understanding the ETL Process and Data Pipelines
- What is a data pipeline?
- How do we create a robust pipeline?
- Pre-work – understanding your data
- Design planning – planning your workflow
- Architecture development – developing your resources
- Putting it all together – project diagrams
- What is an ETL data pipeline?
- Batch processing
- Streaming method
- Cloud-native
- Automating ETL pipelines
- Exploring use cases for ETL pipelines
- Summary
- References
- Chapter 3: Design Principles for Creating Scalable and Resilient Pipelines
- Technical requirements
- Understanding the design patterns for ETL
- Basic ETL design pattern
- ETL-P design pattern
- ETL-VP design pattern
- ELT two-phase pattern
- Preparing your local environment for installations
- Open source Python libraries for ETL pipelines
- Pandas
- NumPy
- Scaling for big data packages
- Dask
- Numba
- Summary
- References
- Part 2: Designing ETL Pipelines with Python
- Chapter 4: Sourcing Insightful Data and Data Extraction Strategies
- Technical requirements
- What is data sourcing?
- Accessibility to data
- Types of data sources
- Getting started with data extraction
- CSV and Excel data files
- Parquet data files
- API connections
- Databases
- Data from web pages
- Creating a data extraction pipeline using Python
- Data extraction
- Logging
- Summary
- References
- Chapter 5: Data Cleansing and Transformation
- Technical requirements
- Scrubbing your data
- Data transformation
- Data cleansing and transformation in ETL pipelines
- Understanding the downstream applications of your data
- Strategies for data cleansing and transformation in Python
- Preliminary tasks – the importance of staging data
- Transformation activities in Python
- Creating data pipeline activity in Python
- Summary
- Chapter 6: Loading Transformed Data
- Technical requirements
- Introduction to data loading
- Choosing the load destination
- Types of load destinations
- Best practices for data loading
- Optimizing data loading activities by controlling the data import method
- Creating demo data
- Full data loads
- Incremental data loads