Pro Apache Hadoop
Pro Apache Hadoop, Second Edition brings you up to speed on Hadoop, the framework of big data. Revised to cover Hadoop 2.0, the book covers the very latest developments such as YARN (aka MapReduce 2.0), new HDFS high-availability features, and increased scalability in the form of HDFS Federations. A...
Classification: | Electronic Book |
---|---|
Main authors: | , |
Format: | Electronic eBook |
Language: | English |
Published: | Berkeley, CA: Apress, 2014. |
Edition: | Second edition. |
Series: | Expert's voice in big data. |
Subjects: | |
Online access: | Full text (requires prior registration with an institutional email address) |
Table of Contents:
- At a Glance; Introduction; Chapter 1: Motivation for Big Data; What Is Big Data?; Key Idea Behind Big Data Techniques; Data Is Distributed Across Several Nodes; Applications Are Moved to the Data; Data Is Processed Local to a Node; Sequential Reads Preferred Over Random Reads; An Example; Big Data Programming Models; Massively Parallel Processing (MPP) Database Systems; In-Memory Database Systems; MapReduce Systems; Bulk Synchronous Parallel (BSP) Systems; Big Data and Transactional Systems; How Much Can We Scale?; A Compute-Intensive Example; Amdahl's Law.
- Business Use-Cases for Big Data; Summary; Chapter 2: Hadoop Concepts; Introducing Hadoop; Introducing the MapReduce Model; Components of Hadoop; Hadoop Distributed File System (HDFS); Block Storage Nature of Hadoop Files; File Metadata and NameNode; Mechanics of an HDFS Write; Mechanics of an HDFS Read; Mechanics of an HDFS Delete; Ensuring HDFS Reliability; Secondary NameNode; TaskTracker; JobTracker; Hadoop 2.0; Components of YARN; Container; Node Manager; Resource Manager; Application Master; Anatomy of a YARN Request; HDFS High Availability; Summary.
- Chapter 3: Getting Started with the Hadoop Framework; Types of Installation; Stand-Alone Mode; Pseudo-Distributed Cluster; Multinode Node Cluster Installation; Preinstalled Using Amazon Elastic MapReduce; Setting up a Development Environment with a Cloudera Virtual Machine; Components of a MapReduce program; Your First Hadoop Program; Prerequisites to Run Programs in Local Mode; WordCount Using the Old API; Building the Application; Running WordCount in Cluster Mode; WordCount Using the New API; Building the Application; Running WordCount in Cluster Mode; Third-Party Libraries in Hadoop Jobs.
- Allocation File Format and Configurations; Determine Dominant Resource Share in drf Policy; Slaves File; Rack Awareness; Providing Hadoop with Network Topology; Cluster Administration Utilities; Check the HDFS; Command-Line HDFS Administration; HDFS Cluster Health Report; Add/Remove Nodes; Placing the HDFS in Safemode; Rebalancing HDFS Data; Copying Large Amounts of Data from the HDFS; Summary; Chapter 5: Basics of MapReduce Development; Hadoop and Data Processing; Reviewing the Airline Dataset; Preparing the Development Environment; Preparing the Hadoop System; MapReduce Programming Patterns.