
Scalable Data Architecture with Java: Build Efficient Enterprise-Grade Data Architecting Solutions Using Java

Orchestrate data architecting solutions using Java and related technologies to evaluate, recommend, and present the most suitable solution to leadership and clients. Key Features: Learn how to adapt to the ever-evolving data architecture technology landscape. Understand how to choose the best suited tec...


Bibliographic Details
Classification: Electronic Book
Main author: Banerjee, Sinchan
Format: Electronic eBook
Language: English
Published: Birmingham: Packt Publishing, Limited, 2022.
Subjects:
Online access: Full text (requires prior registration with an institutional email)
Table of Contents:
  • Cover
  • Title Page
  • Copyright and Credits
  • Contributors
  • About the reviewers
  • Table of Contents
  • Preface
  • Section 1: Foundation of Data Systems
  • Chapter 1: Basics of Modern Data Architecture
  • Exploring the landscape of data engineering
  • What is data engineering?
  • Dimensions of data
  • Types of data engineering problems
  • Responsibilities and challenges of a Java data architect
  • Data architect versus data engineer
  • Challenges of a data architect
  • Techniques to mitigate those challenges
  • Summary
  • Chapter 2: Data Storage and Databases
  • Understanding data types, formats, and encodings
  • Data types
  • Data formats
  • Understanding file, block, and object storage
  • File storage
  • Block storage
  • Object storage
  • The data lake, data warehouse, and data mart
  • Data lake
  • Data warehouse
  • Data marts
  • Databases and their types
  • Relational database
  • NoSQL database
  • Data model design considerations
  • Summary
  • Chapter 3: Identifying the Right Data Platform
  • Technical requirements
  • Virtualization and containerization platforms
  • Benefits of virtualization
  • Containerization
  • Benefits of containerization
  • Kubernetes
  • Hadoop platforms
  • Hadoop architecture
  • Cloud platforms
  • Benefits of cloud computing
  • Choosing the correct platform
  • When to choose virtualization versus containerization
  • When to use big data
  • Choosing between on-premise versus cloud-based solutions
  • Choosing between various cloud vendors
  • Summary
  • Section 2: Building Data Processing Pipelines
  • Chapter 4: ETL Data Load - A Batch-Based Solution to Ingesting Data in a Data Warehouse
  • Technical requirements
  • Understanding the problem and source data
  • Problem statement
  • Understanding the source data
  • Building an effective data model
  • Relational data warehouse schemas
  • Evaluation of the schema design
  • Designing the solution
  • Implementing and unit testing the solution
  • Summary
  • Chapter 5: Architecting a Batch Processing Pipeline
  • Technical requirements
  • Developing the architecture and choosing the right tools
  • Problem statement
  • Analyzing the problem
  • Architecting the solution
  • Factors that affect your choice of storage
  • Determining storage based on cost
  • The cost factor in the processing layer
  • Implementing the solution
  • Profiling the source data
  • Writing the Spark application
  • Deploying and running the Spark application
  • Developing and testing a Lambda trigger
  • Performance tuning a Spark job
  • Querying the ODL using AWS Athena
  • Summary
  • Chapter 6: Architecting a Real-Time Processing Pipeline
  • Technical requirements
  • Understanding and analyzing the streaming problem
  • Problem statement
  • Analyzing the problem
  • Architecting the solution
  • Implementing and verifying the design
  • Setting up Apache Kafka on your local machine
  • Developing the Kafka streaming application
  • Unit testing a Kafka Streams application