Apache Hadoop 3 Quick Start Guide : Learn about Big Data Processing and Analytics.
Apache Hadoop is a widely used distributed data platform. It enables large datasets to be processed efficiently across clusters of machines rather than storing and processing the data on one large computer. This book will get you started with the Hadoop ecosystem and introduce you to the main technical topics, such as MapReduce...
Classification: Electronic Book
Main author: 
Format: Electronic eBook
Language: English
Published: Birmingham : Packt Publishing Ltd, 2018.
Subjects: 
Online access: Full text
Table of Contents:
- Cover; Title Page; Copyright and Credits; Dedication; Packt Upsell; Contributors; Table of Contents; Preface; Chapter 1: Hadoop 3.0
- Background and Introduction; How it all started; What Hadoop is and why it is important; How Apache Hadoop works; Resource Manager; Node Manager; YARN Timeline Service version 2; NameNode; DataNode; Hadoop 3.0 releases and new features; Choosing the right Hadoop distribution; Cloudera Hadoop distribution; Hortonworks Hadoop distribution; MapR Hadoop distribution; Summary; Chapter 2: Planning and Setting Up Hadoop Clusters; Technical requirements.
- Prerequisites for Hadoop setup; Preparing hardware for Hadoop; Readying your system; Installing the prerequisites; Working across nodes without passwords (keyless SSH); Downloading Hadoop; Running Hadoop in standalone mode; Setting up a pseudo Hadoop cluster; Planning and sizing clusters; Initial load of data; Organizational data growth; Workload and computational requirements; High availability and fault tolerance; Velocity of data and other factors; Setting up Hadoop in cluster mode; Installing and configuring HDFS in cluster mode; Setting up YARN in cluster mode.
- Diagnosing the Hadoop cluster; Working with log files; Cluster debugging and tuning tools; JPS (Java Virtual Machine Process Status); JStack; Summary; Chapter 3: Deep Dive into the Hadoop Distributed File System; Technical requirements; How HDFS works; Key features of HDFS; Achieving multi-tenancy in HDFS; Snapshots of HDFS; Safe mode; Hot swapping; Federation; Intra-DataNode balancer; Data flow patterns of HDFS; HDFS as primary storage with cache; HDFS as archival storage; HDFS as historical storage; HDFS as a backbone; HDFS configuration files; Hadoop filesystem CLIs.
- Working with HDFS user commands; Working with Hadoop shell commands; Working with data structures in HDFS; Understanding SequenceFile; MapFile and its variants; Summary; Chapter 4: Developing MapReduce Applications; Technical requirements; How MapReduce works; What is MapReduce?; An example of MapReduce; Configuring a MapReduce environment; Working with mapred-site.xml; Working with Job history server; RESTful APIs for Job history server; Understanding Hadoop APIs and packages; Setting up a MapReduce project; Setting up an Eclipse project; Deep diving into MapReduce APIs.
- Configuring MapReduce jobs; Understanding input formats; Understanding output formats; Working with Mapper APIs; Working with the Reducer API; Compiling and running MapReduce jobs; Triggering the job remotely; Using Tool and ToolRunner; Unit testing of MapReduce jobs; Failure handling in MapReduce; Streaming in MapReduce programming; Summary; Chapter 5: Building Rich YARN Applications; Technical requirements; Understanding YARN architecture; Key features of YARN; Resource models in YARN; YARN federation; RESTful APIs; Configuring the YARN environment in a cluster.