Data lakes

The concept of a data lake is less than 10 years old, yet data lakes are already widely implemented in large companies. Their goal is to efficiently handle ever-growing volumes of heterogeneous data while also meeting various sophisticated user needs. However, defining and building a data lake is s...

Bibliographic Details
Classification: Electronic book
Other Authors: Laurent, Anne, 1976-; Laurent, Dominique; Madera, Cédrine
Format: Electronic eBook
Language: English
Published: London : ISTE, Ltd. ; Hoboken : Wiley, 2020.
Series: Computer engineering series. Databases and big data set ; volume 2.
Subjects:
Online access: Full text (requires prior registration with an institutional email address)
Table of Contents:
  • Cover
  • Half-Title Page
  • Dedication
  • Title Page
  • Copyright Page
  • Contents
  • Preface
  • 1. Introduction to Data Lakes: Definitions and Discussions
  • 1.1. Introduction to data lakes
  • 1.2. Literature review and discussion
  • 1.3. The data lake challenges
  • 1.4. Data lakes versus decision-making systems
  • 1.5. Urbanization for data lakes
  • 1.6. Data lake functionalities
  • 1.7. Summary and concluding remarks
  • 2. Architecture of Data Lakes
  • 2.1. Introduction
  • 2.2. State of the art and practice
  • 2.2.1. Definition
  • 2.2.2. Architecture
  • 2.2.3. Metadata
  • 2.2.4. Data quality
  • 2.2.5. Schema-on-read
  • 2.3. System architecture
  • 2.3.1. Ingestion layer
  • 2.3.2. Storage layer
  • 2.3.3. Transformation layer
  • 2.3.4. Interaction layer
  • 2.4. Use case: the Constance system
  • 2.4.1. System overview
  • 2.4.2. Ingestion layer
  • 2.4.3. Maintenance layer
  • 2.4.4. Query layer
  • 2.4.5. Data quality control
  • 2.4.6. Extensibility and flexibility
  • 2.5. Concluding remarks
  • 3. Exploiting Software Product Lines and Formal Concept Analysis for the Design of Data Lake Architectures
  • 3.1. Our expectations
  • 3.2. Modeling data lake functionalities
  • 3.3. Building the knowledge base of industrial data lakes
  • 3.4. Our formalization approach
  • 3.5. Applying our approach
  • 3.6. Analysis of our first results
  • 3.7. Concluding remarks
  • 4. Metadata in Data Lake Ecosystems
  • 4.1. Definitions and concepts
  • 4.2. Classification of metadata by NISO
  • 4.2.1. Metadata schema
  • 4.2.2. Knowledge base and catalog
  • 4.3. Other categories of metadata
  • 4.3.1. Business metadata
  • 4.3.2. Navigational integration
  • 4.3.3. Operational metadata
  • 4.4. Sources of metadata
  • 4.5. Metadata classification
  • 4.6. Why metadata are needed
  • 4.6.1. Selection of information (re)sources
  • 4.6.2. Organization of information resources
  • 4.6.3. Interoperability and integration
  • 4.6.4. Unique digital identification
  • 4.6.5. Data archiving and preservation
  • 4.7. Business value of metadata
  • 4.8. Metadata architecture
  • 4.8.1. Architecture scenario 1: point-to-point metadata architecture
  • 4.8.2. Architecture scenario 2: hub and spoke metadata architecture
  • 4.8.3. Architecture scenario 3: tool of record metadata architecture
  • 4.8.4. Architecture scenario 4: hybrid metadata architecture
  • 4.8.5. Architecture scenario 5: federated metadata architecture
  • 4.9. Metadata management
  • 4.10. Metadata and data lakes
  • 4.10.1. Application and workload layer
  • 4.10.2. Data layer
  • 4.10.3. System layer
  • 4.10.4. Metadata types
  • 4.11. Metadata management in data lakes
  • 4.11.1. Metadata directory
  • 4.11.2. Metadata storage
  • 4.11.3. Metadata discovery
  • 4.11.4. Metadata lineage
  • 4.11.5. Metadata querying
  • 4.11.6. Data source selection
  • 4.12. Metadata and master data management
  • 4.13. Conclusion
  • 5. A Use Case of Data Lake Metadata Management
  • 5.1. Context