Data Lake development with Big Data : explore architectural approaches to building Data Lakes that ingest, index, manage, and analyze massive amounts of data using Big Data technologies /

Explore architectural approaches to building Data Lakes that ingest, index, manage, and analyze massive amounts of data using Big Data technologies. About This Book: Comprehend the intricacies of architecting a Data Lake and build a data strategy around your current data architecture. Efficiently manag...

Bibliographic Details
Classification:	Electronic Book
Main Authors:	Pasupuleti, Pradeep (Author), Purra, Beulah Salome (Author)
Format:	Electronic eBook
Language:	English
Published:	Birmingham : Packt Publishing, 2015.
Series:	Community experience distilled.
Subjects:	Electronic data processing > Distributed processing > Management. Big data. Information storage and retrieval systems. Données volumineuses. Systèmes d'information. information storage. information retrieval services. COMPUTERS > Data Processing.
Online Access:	Full text (requires prior registration with institutional email)

Table of Contents:

Cover; Copyright; Credits; About the Authors; Acknowledgement; About the Reviewer; www.PacktPub.com; Table of Contents; Preface; Chapter 1: The Need for Data Lake; Before the Data Lake; Need for a Data Lake; Defining Data Lake; Key benefits of Data Lake; Challenges in implementing a Data Lake; When to go for a Data Lake implementation; Data Lake architecture; Architectural considerations; Architectural composition; Architectural details; Understanding Data Lake layers; Understanding Data Lake tiers; Summary; Chapter 2: Data Intake; Understanding Intake tier zones
Source System Zone functionalities; Understanding connectivity processing; Understanding Intake Processing for data variety; Transient Landing Zone functionalities; File validation checks; Data Integrity checks; Raw Storage Zone functionalities; Data lineage processes; Deep Integrity checks; Security and governance; Information Lifecycle Management; Practical Data Ingestion scenarios; Architectural guidance; Structured data use cases; Semi-structured and Unstructured data use cases; Big Data tools and technologies; Ingestion of structured data; Ingestion of streaming data; Summary
Chapter 3: Data Integration, Quality, and Enrichment; Introduction to the Data Management Tier; Understanding Data Integration; Introduction to Data Integration; Prominent features of Data Integration; Practical Data Integration scenarios; The workings of Data Integration; Raw data discovery; Data quality assessment; Data cleansing; Data transformations; Data enrichment; Collect Metadata and track data lineage; Traditional data integration versus Data Lake; Data pipelines; Data partitioning; Scale on demand; Data ingest parallelism; Extensibility; Big Data tools and technologies; Syncsort
Use case scenarios for Syncsort; Talend; Use case scenarios for Talend; Pentaho; Use case scenarios for Pentaho; Summary; Chapter 4: Data Discovery and Consumption; Understanding the Data Consumption tier; Data Consumption - Traditional versus Data Lake
An introduction to Data Consumption; Practical Data Consumption scenarios; Data Discovery and metadata; Enabling Data Discovery; Data classification; Relation extraction; Indexing data; Performing Data Discovery; Semantic search; Faceted search; Fuzzy search; Data Provisioning and metadata; Data publication; Data subscription
Data Provisioning functionalities; Data formatting; Data selection; Data Provisioning approaches; Post-provisioning processes; Architectural guidance; Data discovery; Big Data tools and technologies; Data Provisioning; Big Data tools and technologies; Summary; Chapter 5: Data Governance; Understanding Data Governance; Introduction to Data Governance; The need for Data Governance; Governing Big Data in the Data Lake; Data Governance - traditional versus Data Lake
Practical Data Governance scenarios; Data Governance components; Metadata management and lineage tracking; Data security and privacy
