
Data Engineering on Azure

In Data Engineering on Azure you'll learn the skills you need to build and maintain big data platforms in massive enterprises. This invaluable guide includes clear, practical guidance for setting up infrastructure, orchestration, workloads, and governance. As you go, you'll set up efficien...


Bibliographic Details
Classification: Electronic Book
Main Author: Riscutia, Vlad (Author)
Format: Electronic eBook
Language: English
Published: Shelter Island, NY : Manning, [2021]
Subjects:
Online Access: Full text (requires prior registration with an institutional email address)
Table of Contents:
  • Intro
  • inside front cover
  • Data Platform Architecture
  • Data Engineering on Azure
  • Copyright
  • dedication
  • brief contents
  • contents
  • front matter
  • preface
  • acknowledgments
  • about this book
  • about the author
  • about the cover illustration
  • 1 Introduction
  • 1.1 What is data engineering?
  • 1.2 Who this book is for
  • 1.3 What is a data platform?
  • 1.3.1 Anatomy of a data platform
  • 1.3.2 Infrastructure as code, codeless infrastructure
  • 1.4 Building in the cloud
  • 1.4.1 IaaS, PaaS, SaaS
  • 1.4.2 Network, storage, compute
  • 1.4.3 Getting started with Azure
  • 1.4.4 Interacting with Azure
  • 1.5 Implementing an Azure data platform
  • Summary
  • Part 1 Infrastructure
  • 2 Storage
  • 2.1 Storing data in a data platform
  • 2.1.1 Storing data across multiple data fabrics
  • 2.1.2 Having a single source of truth
  • 2.2 Introducing Azure Data Explorer
  • 2.2.1 Deploying an Azure Data Explorer cluster
  • 2.2.2 Using Azure Data Explorer
  • 2.2.3 Working around query limits
  • 2.3 Introducing Azure Data Lake Storage
  • 2.3.1 Creating an Azure Data Lake Storage account
  • 2.3.2 Using Azure Data Lake Storage
  • 2.3.3 Integrating with Azure Data Explorer
  • 2.4 Ingesting data
  • 2.4.1 Ingestion frequency
  • 2.4.2 Load type
  • 2.4.3 Restatements and reloads
  • Summary
  • 3 DevOps
  • 3.1 What is DevOps?
  • 3.1.1 DevOps in data engineering
  • 3.2 Introducing Azure DevOps
  • 3.2.1 Using the az azure-devops extension
  • 3.3 Deploying infrastructure
  • 3.3.1 Exporting an Azure Resource Manager template
  • 3.3.2 Creating Azure DevOps service connections
  • 3.3.3 Deploying Azure Resource Manager templates
  • 3.3.4 Understanding Azure Pipelines
  • 3.4 Deploying analytics
  • 3.4.1 Using Azure DevOps marketplace extensions
  • 3.4.2 Storing everything in Git
  • 3.4.3 Deploying everything automatically
  • Summary
  • 4 Orchestration
  • 4.1 Ingesting the Bing COVID-19 open dataset
  • 4.2 Introducing Azure Data Factory
  • 4.2.1 Setting up the data source
  • 4.2.2 Setting up the data sink
  • 4.2.3 Setting up the pipeline
  • 4.2.4 Setting up a trigger
  • 4.2.5 Orchestrating with Azure Data Factory
  • 4.3 DevOps for Azure Data Factory
  • 4.3.1 Deploying Azure Data Factory from Git
  • 4.3.2 Setting up access control
  • 4.3.3 Deploying the production data factory
  • 4.3.4 DevOps for the Azure Data Factory recap
  • 4.4 Monitoring with Azure Monitor
  • Summary
  • Part 2 Workloads
  • 5 Processing
  • 5.1 Data modeling techniques
  • 5.1.1 Normalization and denormalization
  • 5.1.2 Data warehousing
  • 5.1.3 Semistructured data
  • 5.1.4 Data modeling recap
  • 5.2 Identity keyrings
  • 5.2.1 Building an identity keyring
  • 5.2.2 Understanding keyrings
  • 5.3 Timelines
  • 5.3.1 Building a timeline view
  • 5.3.2 Using timelines
  • 5.4 Continuous data processing
  • 5.4.1 Tracking processing functions in Git
  • 5.4.2 Keyring building in Azure Data Factory
  • 5.4.3 Scaling out
  • Summary
  • 6 Analytics
  • 6.1 Structuring storage
  • 6.1.1 Providing development data