
Modern Data Architectures with Python : a practical guide to building and deploying data pipelines, data warehouses, and data lakes with Python


Bibliographic Details
Classification: Electronic Book
Main Author: Lipp, Brian (Author)
Format: Electronic eBook
Language: English
Published: Birmingham, UK : Packt Publishing Ltd., 2023.
Edition: 1st edition.
Subjects:
Online Access: Full text (requires prior registration with an institutional email address)

MARC

LEADER 00000cam a22000007a 4500
001 OR_on1398279353
003 OCoLC
005 20231017213018.0
006 m o d
007 cr |n|||||||||
008 230921s2023 enk o 001 0 eng d
040 |a YDX  |b eng  |c YDX  |d OCLKB  |d EBLCP  |d OCLCO  |d OCLKB  |d OCLKB  |d ORMDA 
019 |a 1398243786 
020 |a 9781801076418  |q (electronic bk.) 
020 |a 1801076413  |q (electronic bk.) 
020 |z 1801070490 
020 |z 9781801070492 
035 |a (OCoLC)1398279353  |z (OCoLC)1398243786 
037 |a 9781801070492  |b O'Reilly Media 
050 4 |a QA76.73.P98 
082 0 4 |a 005.13/3  |2 23/eng/20231010 
049 |a UAMI 
100 1 |a Lipp, Brian,  |e author. 
245 1 0 |a MODERN DATA ARCHITECTURES WITH PYTHON  |h [electronic resource] :  |b a practical guide to building and deploying data pipelines, data warehouses, and data lakes with Python /  |c Brian Lipp. 
250 |a 1st edition. 
260 |a Birmingham, UK :  |b Packt Publishing Ltd.,  |c 2023. 
300 |a 1 online resource 
336 |a text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
500 |a Includes index. 
505 0 |a Cover -- Title Page -- Copyright and Credits -- Dedications -- Contributors -- Table of Contents -- Preface -- Part 1: Fundamental Data Knowledge -- Chapter 1: Modern Data Processing Architecture -- Technical requirements -- Databases, data warehouses, and data lakes -- OLTP -- OLAP -- Data lakes -- Event stores -- File formats -- Data platform architecture at a high level -- Comparing the Lambda and Kappa architectures -- Lambda architecture -- Kappa architecture -- Lakehouse and Delta architectures -- Lakehouses -- The seven central tenets 
505 8 |a The medallion data pattern and the Delta architecture -- Data mesh theory and practice -- Defining terms -- The four principles of data mesh -- Summary -- Practical lab -- Solution -- Chapter 2: Understanding Data Analytics -- Technical requirements -- Setting up your environment -- Python -- venv -- Graphviz -- Workflow initialization -- Cleaning and preparing your data -- Duplicate values -- Working with nulls -- Using RegEx -- Outlier identification -- Casting columns -- Fixing column names -- Complex data types -- Data documentation -- diagrams -- Data lineage graphs -- Data modeling patterns 
505 8 |a Relational -- Dimensional modeling -- Key terms -- OBT -- Practical lab -- Loading the problem data -- Solution -- Summary -- Part 2: Data Engineering Toolset -- Chapter 3: Apache Spark Deep Dive -- Technical requirements -- Setting up your environment -- Python, AWS, and Databricks -- Databricks CLI -- Cloud data storage -- Object storage -- Relational -- NoSQL -- Spark architecture -- Introduction to Apache Spark -- Key components -- Working with partitions -- Shuffling partitions -- Caching -- Broadcasting -- Job creation pipeline -- Delta Lake -- Transaction log 
505 8 |a Grouping tables with databases -- Table -- Adding speed with Z-ordering -- Bloom filters -- Practical lab -- Problem 1 -- Problem 2 -- Problem 3 -- Solution -- Summary -- Chapter 4: Batch and Stream Data Processing Using PySpark -- Technical requirements -- Setting up your environment -- Python, AWS, and Databricks -- Databricks CLI -- Batch processing -- Partitioning -- Data skew -- Reading data -- Spark schemas -- Making decisions -- Removing unwanted columns -- Working with data in groups -- The UDF -- Stream processing -- Reading from disk -- Debugging -- Writing to disk 
505 8 |a Batch stream hybrid -- Delta streaming -- Batch processing in a stream -- Practical lab -- Setup -- Creating fake data -- Problem 1 -- Problem 2 -- Problem 3 -- Solution -- Solution 1 -- Solution 2 -- Solution 3 -- Summary -- Chapter 5: Streaming Data with Kafka -- Technical requirements -- Setting up your environment -- Python, AWS, and Databricks -- Databricks CLI -- Confluent Kafka -- Signing up -- Kafka architecture -- Topics -- Partitions -- Brokers -- Producers -- Consumers -- Schema Registry -- Kafka Connect -- Spark and Kafka -- Practical lab -- Solution -- Summary 
520 |a Build scalable and reliable data ecosystems using Data Mesh, Databricks Spark, and Kafka Key Features Develop modern data skills used in emerging technologies Learn pragmatic design methodologies such as Data Mesh and data lakehouses Gain a deeper understanding of data governance Purchase of the print or Kindle book includes a free PDF eBook Book Description Modern Data Architectures with Python will teach you how to seamlessly incorporate your machine learning and data science work streams into your open data platforms. You'll learn how to take your data and create open lakehouses that work with any technology using tried-and-true techniques, including the medallion architecture and Delta Lake. Starting with the fundamentals, this book will help you build pipelines on Databricks, an open data platform, using SQL and Python. You'll gain an understanding of notebooks and applications written in Python using standard software engineering tools such as Git, pre-commit, Jenkins, and GitHub. Next, you'll delve into streaming and batch-based data processing using Apache Spark and Confluent Kafka. As you advance, you'll learn how to deploy your resources using infrastructure as code and how to automate your workflows and code development. Since any data platform's ability to handle and work with AI and ML is a vital component, you'll also explore the basics of ML and how to work with modern MLOps tooling. Finally, you'll get hands-on experience with Apache Spark, one of the key data technologies in today's market. By the end of this book, you'll have amassed a wealth of practical and theoretical knowledge to build, manage, orchestrate, and architect your data ecosystems. 
What you will learn Understand data patterns, including the Delta architecture Discover how to increase performance with Spark internals Find out how to design critical data diagrams Explore MLOps with tools such as AutoML and MLflow Get to grips with building data products in a data mesh Discover data governance and build confidence in your data Introduce data visualizations and dashboards into your data practice Who this book is for This book is for developers, analytics engineers, and managers looking to further develop a data ecosystem within their organization. While they're not prerequisites, basic knowledge of Python and prior experience with data will help you to read and follow along with the examples. 
590 |a O'Reilly  |b O'Reilly Online Learning: Academic/Public Library Edition 
650 0 |a Python (Computer program language) 
650 0 |a Data structures (Computer science) 
650 0 |a Big data. 
776 0 8 |i Print version:  |z 1801070490  |z 9781801070492  |w (OCoLC)1396693127 
856 4 0 |u https://learning.oreilly.com/library/view/~/9781801070492/?ar  |z Texto completo (Requiere registro previo con correo institucional) 
938 |a YBP Library Services  |b YANK  |n 20497915 
938 |b OCKB  |z pqebk.perpetual,1caf4064-7041-41d0-8c33-fd828b7909bc-emi 
994 |a 92  |b IZTAP