Big Data Architect's Handbook : a Guide to Building Proficiency in Tools and Systems Used by Leading Big Data Experts.
The primary responsibility of any Big Data architect is to design an end-to-end Big Data solution that integrates data from different sources and analyzes it to find hidden business insights. This book will show you how to do just that, by leveraging the popular tools within the Hadoop ecosystem to...
Classification: | Electronic Book |
---|---|
Main author: | |
Format: | Electronic eBook |
Language: | English |
Published: | Birmingham : Packt Publishing Ltd, 2018. |
Subjects: | |
Online access: | Full text |
Table of Contents:
- Cover; Title Page; Copyright and Credits; Packt Upsell; Contributors; Table of Contents; Preface; Chapter 1: Why Big Data?; What is big data?; Characteristics of big data; Volume; Velocity; Variety; Veracity; Variability; Value; Solution-based approach for data; Data - the most valuable asset; Traditional approaches to data storage; Clustered computing; High availability; Resource pooling; Easy scalability; Big data - how does it make a difference?; Big data solutions - cloud versus on-premises infrastructure; Cost; Security; Current capabilities; Scalability; Big data glossary; Big data.
- Batch processing; Cluster computing; Data warehouse; Data lake; Data mining; ETL; Hadoop; In-memory computing; Machine learning; MapReduce; NoSQL; Stream processing; Summary; Chapter 2: Big Data Environment Setup; Oracle VM VirtualBox installation; Ubuntu installation; Hadoop prerequisite installation; Java installation; SSH installation and configuration; Hadoop system user; Apache Hadoop installation; Hadoop configuration; Path configuration for Hadoop commands; Hadoop server start and stop; Summary; Chapter 3: Hadoop Ecosystem; Apache Hadoop; Hadoop Distributed File System; HDFS hands-on.
- Creating a directory in HDFS; Copying files from a local file system to HDFS; Copying files from HDFS to a local file system; Deleting files and folders in HDFS; Hadoop MapReduce; Job Tracker and Task Tracker; The execution flow of MapReduce; Mapper; Shuffle and Sort; Reducer; Example program; Preparing the data file for analysis; Program code; Driver program; Mapper program; Reducer program; Observations and results; YARN; Resource Manager; Node Manager; Container; Application Master; Apache Projects related to big data; Apache Zookeeper; Apache Kafka; Apache Flume; Apache Cassandra.
- Apache HBase; Apache Spark; Summary; Chapter 4: NoSQL Database; What is NoSQL?; Benefits of NoSQL databases; NoSQL versus RDBMS; The CAP theorem; The ACID properties; Data models in NoSQL; Key-value data stores; Document store; Column stores; Graph stores; Apache Cassandra; Installation; Starting Cassandra; The Cassandra Query Language - CQL; The help command; Basic commands; Data manipulation; Creating, altering, and deleting a keyspace; Creating, altering, and deleting tables; Inserting, updating, and deleting data; The MongoDB database; Installing MongoDB; Starting MongoDB.
- Working on MongoDB; The help command; Basic commands; Data manipulation; Creating and deleting databases; Creating and deleting collections; The create, retrieve, update, delete operations; Neo4j database; Installing Neo4j; Starting Neo4j; The Cypher query language; Help; Basic operations in Cypher; Creating nodes, relationships, and properties; Updating nodes, relationships, and properties; Deleting nodes, relationships, and properties; Reading nodes, relationships, and properties; Summary; Chapter 5: Off-the-Shelf Commercial Tools; Microsoft Azure; Building a practical application.