BigQuery for data warehousing: managed data analysis in the Google cloud

Create a data warehouse, complete with reporting and dashboards using Google's BigQuery technology. This book takes you from the basic concepts of data warehousing through the design, build, load, and maintenance phases. You will build capabilities to capture data from the operational environme...

Bibliographic Details
Classification: Electronic Book
Main Author: Mucchetti, Mark
Format: Electronic eBook
Language: English
Published: Santa Monica, CA : APress, [2020]
Subjects:
Online Access: Full text (requires prior registration with an institutional email address)
Table of Contents:
  • Intro
  • Table of Contents
  • About the Author
  • About the Technical Reviewer
  • Acknowledgments
  • Introduction
  • Part I: Building a Warehouse
  • Chapter 1: Settling into BigQuery
  • Getting Started with GCP
  • Beginning with GCP
  • Using Google Cloud Platform
  • The Cloud Console
  • The Command-Line Interface
  • Programmatic Access
  • BigQuery in the Cloud Console
  • Querying
  • Tables
  • Aliasing
  • Commenting
  • SELECT
  • FROM
  • WHERE
  • GROUP BY
  • ORDER BY
  • LIMIT
  • Additional Things to Try
  • Shortcuts
  • Statement Batches
  • Query History
  • Saving Queries and Views
  • Scheduled Queries
  • Designing Your Warehouse
  • Google BigQuery As a Data Store
  • Row-Oriented Approach
  • Column-Oriented Approach
  • Google BigQuery As a Data Warehouse
  • Key Questions
  • Fundamentals
  • What Problem Am I Trying to Solve?
  • What Is the Scope of This Problem?
  • Who Will Be the Primary Users of Your Warehouse?
  • Are You Replacing Something That Exists Already?
  • Thinking About Scale
  • How Much Data Do I Have Today?
  • How Quickly Will My Data Increase in Size?
  • How Many Readers Am I Going to Have?
  • How Many Analysts Am I Going to Have?
  • What Is My Budget?
  • Do I Need to Account for Real-Time Data?
  • Data Normalization
  • Summary
  • Chapter 2: Starting Your Warehouse Project
  • Where to Start
  • Key Questions
  • What Are My Finite Resources?
  • What Is My Business Domain?
  • What Differentiates My Business from Others in Its Domain?
  • Who Knows What Data I Need?
  • Who Already Knows What Data They Need?
  • What Are My Key Entities?
  • What Are My Key Relationships?
  • What Role Does Time Play in My Model?
  • What Role Does Cost Play in My Model?
  • General Considerations
  • Making the Case
  • Interviewing Stakeholders
  • Resolving Conflicts
  • Compiling Documentation
  • Sources of Truth
  • Data Dictionary
  • The Charter
  • Understanding Business Acceptance
  • Recording Decisions
  • Choosing a Design
  • Transactional Store
  • Star/Snowflake Schemas
  • NoSQL
  • BigQuery
  • Understanding the BigQuery Model
  • Projects
  • Datasets
  • Tables
  • Normalization/Denormalization
  • Hierarchical Data Structure
  • Partitioning
  • Summary
  • Chapter 3: All My Data
  • The Data Model
  • Intake Rates
  • Value of Historical Data
  • Creating the Data Model
  • Making a Dataset
  • Creating Tables
  • Source
  • Empty
  • Google Cloud Storage
  • Upload
  • Drive
  • Google Cloud Bigtable
  • Format
  • CSV
  • JSONL
  • Avro
  • Parquet/ORC
  • Destination
  • A Little Aside on Naming Things
  • Schema
  • STRING
  • BYTES
  • INTEGER
  • FLOAT
  • NUMERIC
  • BOOLEAN
  • TIMESTAMP
  • DATE
  • TIME
  • GEOGRAPHY
  • ARRAY
  • STRUCT (RECORD)
  • Mode
  • Partition and Cluster Settings
  • Advanced Options
  • Partitioning
  • Partitioning by Integer
  • Clustering
  • Reading from BigQuery
  • BigQuery UI
  • bq Command Line
  • BigQuery API
  • BigQuery Storage API
  • Summary
  • Chapter 4: Managing BigQuery Costs
  • The BigQuery Model
  • BigQuery Cost Models
  • Storage Pricing