Optimizing Hadoop for MapReduce.

In Detail: MapReduce is the distribution system that the Hadoop MapReduce engine uses to distribute work around a cluster by processing smaller data sets in parallel. It is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal...
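The description above summarizes the MapReduce model: split the input into smaller data sets, map them in parallel, group the intermediate results by key, then reduce each group. The following is a minimal single-process Python sketch of that model for word count. It is an illustration only, not Hadoop's Java API: the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical, and a thread pool stands in for the cluster nodes that would run map tasks.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_words(split: List[str]) -> List[Tuple[str, int]]:
    """Map phase (hypothetical helper): emit (word, 1) for each word in one split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle phase: group every intermediate value under its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: combine all values that share one key."""
    return key, sum(values)


def word_count(lines: List[str], num_splits: int = 2) -> Dict[str, int]:
    # Cut the input into contiguous smaller data sets (a stand-in for input splits).
    size = max(1, -(-len(lines) // num_splits))  # ceiling division
    splits = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Threads stand in for cluster nodes running map tasks in parallel.
    # (In CPython this shows the structure, not real CPU parallelism.)
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        mapped = list(pool.map(map_words, splits))
    grouped = shuffle(mapped)
    # Reduce each key group; sorting mimics the sorted keys a reducer receives.
    return dict(reduce_counts(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(text))
    # prints {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

In a real Hadoop job, the framework itself performs the splitting, shuffling, and sorting across the cluster; user code supplies only the map and reduce logic.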


Bibliographic Details
Classification:	E-book
Main author:	Tannir, Khaled
Format:	Electronic eBook
Language:	English
Published:	Packt Publishing, 2014.
Subjects:	Apache Hadoop; MapReduce (Computer file); Electronic data processing > Distributed processing; Cluster analysis > Data processing; Open source software
Online access:	Full text

Table of Contents:

Cover; Copyright; Credits; About the Author; Acknowledgments; About the Reviewers; www.PacktPub.com; Table of Contents; Preface
Chapter 1: Understanding Hadoop MapReduce; The MapReduce model; Overview of Hadoop MapReduce; Hadoop MapReduce internals; Factors affecting the performance of MapReduce; Summary
Chapter 2: An Overview of the Hadoop Parameters; Investigating the Hadoop parameters; The mapred-site.xml configuration file; The CPU-related parameters; The disk I/O related parameters; The memory-related parameters; The network-related parameters; The hdfs-site.xml configuration file; The core-site.xml configuration file; Hadoop MapReduce metrics; Performance monitoring tools; Using Chukwa to monitor Hadoop; Using Ganglia to monitor Hadoop; Using Nagios to monitor Hadoop; Using Apache Ambari to monitor Hadoop; Summary
Chapter 3: Detecting System Bottlenecks; Performance tuning; Creating a performance baseline; Identifying resource bottlenecks; Identifying RAM bottlenecks; Identifying CPU bottlenecks; Identifying storage bottlenecks; Identifying network bandwidth bottlenecks; Summary
Chapter 4: Identifying Resource Weaknesses; Identifying cluster weakness; Checking the Hadoop cluster node's health; Checking the input data size; Checking massive I/O and network traffic; Checking for insufficient concurrent tasks; Checking for CPU contention; Sizing your Hadoop cluster; Configuring your cluster correctly; Summary
Chapter 5: Enhancement of Map and Reduce Tasks; Enhancing Map tasks; Input data and block size impact; Dealing with small and unsplittable files; Reducing spilled records during the Map phase; Calculating map tasks' throughput; Enhancing Reduce tasks; Calculating reduce task throughput; Improving Reduce execution phase; Tuning map and reduce parameters; Summary
Chapter 6: Optimizing MapReduce Tasks; Using Combiners; Using compression; Using appropriate Writable types; Reusing types smartly; Optimizing mappers and reducers code; Summary
Chapter 7: Best Practices and Recommendations; Hardware tuning and OS recommendations; Hadoop cluster checklists; The BIOS tuning checklist; OS configuration recommendations; Hadoop best practices and recommendations; Deploying Hadoop; Hadoop tuning recommendations; Using a MapReduce template class code; Summary
Index
