
Data Warehousing in the Age of Big Data

"In conclusion, as you come to the end of this book, the concept of a Data Warehouse, with its primary goal of serving as the enterprise version of the truth and being the single platform for all the sources of information, will continue to remain intact and valid for many years to come. As we have discus...


Bibliographic Details
Classification: Electronic Book
Main Author: Krishnan, Krish
Format: Electronic eBook
Language: English
Published: Amsterdam : Morgan Kaufmann, an imprint of Elsevier, 2013.
Series: Morgan Kaufmann Series on Business Intelligence.
Online Access: Full text
Table of Contents:
  • Front Cover
  • Data Warehousing in the Age of Big Data
  • Copyright Page
  • Contents
  • Acknowledgments
  • About the Author
  • Introduction
  • Part 1: Big Data
  • Part 2: The Data Warehousing
  • Part 3: Building the Big Data – Data Warehouse
  • Appendixes
  • Companion website
  • 1 BIG DATA
  • 1 Introduction to Big Data
  • Introduction
  • Big Data
  • Defining Big Data
  • Why Big Data and why now?
  • Big Data example
  • Social Media posts
  • Survey data analysis
  • Survey data
  • Weather data
  • Twitter data
  • Integration and analysis
  • Additional data types
  • Summary
  • Further reading
  • 2 Working with Big Data
  • Introduction
  • Data explosion
  • Data volume
  • Machine data
  • Application log
  • Clickstream logs
  • External or third-party data
  • Emails
  • Contracts
  • Geographic information systems and geo-spatial data
  • Example: Funshots, Inc.
  • Data velocity
  • Amazon, Facebook, Yahoo, and Google
  • Sensor data
  • Mobile networks
  • Social media
  • Data variety
  • Summary
  • 3 Big Data Processing Architectures
  • Introduction
  • Data processing revisited
  • Data processing techniques
  • Data processing infrastructure challenges
  • Storage
  • Transportation
  • Processing
  • Speed or throughput
  • Shared-everything and shared-nothing architectures
  • Shared-everything architecture
  • Shared-nothing architecture
  • OLTP versus data warehousing
  • Big Data processing
  • Infrastructure explained
  • Data processing explained
  • Telco Big Data study
  • Infrastructure
  • Data processing
  • 4 Introducing Big Data Technologies
  • Introduction
  • Distributed data processing
  • Big Data processing requirements
  • Technologies for Big Data processing
  • Google file system
  • Hadoop
  • Hadoop core components
  • HDFS
  • HDFS architecture
  • NameNode
  • DataNodes
  • Image
  • Journal
  • Checkpoint
  • HDFS startup
  • Block allocation and storage in HDFS
  • HDFS client
  • Replication and recovery
  • Communication and management
  • Heartbeats
  • CheckpointNode and BackupNode
  • CheckpointNode
  • BackupNode
  • File system snapshots
  • JobTracker and TaskTracker
  • MapReduce
  • MapReduce programming model
  • MapReduce program design
  • MapReduce implementation architecture
  • MapReduce job processing and management
  • MapReduce limitations (Version 1, Hadoop MapReduce)
  • MapReduce v2 (YARN)
  • YARN scalability
  • Comparison between MapReduce v1 and v2
  • SQL/MapReduce
  • Zookeeper
  • Zookeeper features
  • Locks and processing
  • Failure and recovery
  • Pig
  • Programming with Pig Latin
  • Pig data types
  • Running Pig programs
  • Pig program flow
  • Common Pig commands
  • HBase
  • HBase architecture
  • HBase components
  • Write-ahead log
  • Hive
  • Hive architecture
  • Infrastructure
  • Execution: how does Hive process queries?
  • Hive data types
  • Hive query language (HiveQL)
  • Chukwa
  • Flume
  • Oozie
  • HCatalog
  • Sqoop
  • Sqoop1
  • Sqoop2
  • Hadoop summary
  • NoSQL
  • CAP theorem
  • Key-value pair: Voldemort
  • Column family store: Cassandra
  • Data model