Hadoop essentials : delve into the key concepts of Hadoop and get a thorough understanding of the Hadoop ecosystem
If you are a system or application developer interested in learning how to solve practical problems using the Hadoop framework, then this book is ideal for you. This book is also meant for Hadoop professionals who want to find solutions to the different challenges they come across in their Hadoop pr...
Classification: | Electronic Book |
---|---|
Main author: | |
Format: | Electronic eBook |
Language: | English |
Published: | Birmingham, UK : Packt Publishing, 2015. |
Series: | Community experience distilled. |
Subjects: | |
Online access: | Full text |
Table of Contents:
- Cover; Copyright; Credits; About the Author; Acknowledgments; About the Reviewers; www.PacktPub.com; Table of Contents; Preface
- Chapter 1: Introduction to Big Data and Hadoop; V's of big data; Volume; Velocity; Variety; Understanding big data; NoSQL; Types of NoSQL databases; Analytical database; Who is creating the big data?; Big data use cases; Big data use case patterns; Big data as a storage pattern; Big data as a data transformation pattern; Big data for a data analysis pattern; Big data for data in a real-time pattern; Big data for a low latency caching pattern; Hadoop; Hadoop history; Description; Advantages of Hadoop; Uses of Hadoop; Hadoop ecosystem; Apache Hadoop; Hadoop distributions; Pillars of Hadoop - HDFS, MapReduce, and YARN; Data access components - Hive and Pig; Data storage component - HBase; Data ingestion in Hadoop - Sqoop and Flume; Streaming and real-time analysis - Storm and Spark; Summary
- Chapter 2: Hadoop Ecosystem; Traditional systems; Database trend; Hadoop use cases; Hadoop basic data flow; Hadoop integration; The Hadoop ecosystem; Distributed filesystem; HDFS; Distributed programming; NoSQL databases; Apache HBase; Data ingestion; Service Programming; Apache YARN; Apache Zookeeper; Scheduling; Data analytics and machine learning; System management; Apache Ambari; Summary
- Chapter 3: Pillars of Hadoop - HDFS, MapReduce, and YARN; HDFS; Features of HDFS; HDFS Architecture; NameNode; DataNode; Checkpoint NameNode or Secondary NameNode; BackupNode; Data storage in HDFS; Read pipeline; Write pipeline; Rack awareness; Advantages of rack awareness in HDFS; HDFS Federation; Limitations of HDFS 1.0; The benefit of HDFS Federation; HDFS ports; HDFS commands; MapReduce; MapReduce architecture; JobTracker; TaskTracker; Serialization data types; Writable interface; WritableComparable interface; MapReduce example; The MapReduce process; Mapper; Shuffle and sorting; Reducer; Speculative execution; FileFormats; InputFormats; RecordReader; OutputFormats; RecordWriter; Writing a MapReduce program; Mapper code; Reducer code; Driver code; Auxiliary steps; Combiner; Partitioner; YARN; YARN Architecture; ResourceManager; NodeManager; ApplicationMaster; Applications powered by YARN; Summary
- Chapter 4: Data Access Components - Hive and Pig; Need of a data processing tool on Hadoop; Pig; Pig data types; Pig architecture; The logical plan; The physical plan; The MapReduce plan; Pig modes; Grunt shell; Input data; Loading data; Dump; Store; Filter; Group By; Limit; Aggregation; Cogroup; DESCRIBE; EXPLAIN; ILLUSTRATE; Hive; Hive architecture; Metastore; Query compiler; Execution engine; Data types and schemas; Installing Hive; Starting Hive Shell; HiveQL; DDL (Data Definition Language) operations; DML (Data Manipulation Language) operations; SQL operation; Built-in functions; Custom UDF (User Defined Functions); Managing tables (external versus managed); SerDe; Partitioning; Bucketing; Summary
- Chapter 5: Storage Component - HBase