Optimizing Hadoop for MapReduce: learn how to configure your Hadoop cluster to run optimal MapReduce jobs

This book is an example-based tutorial on optimizing Hadoop for MapReduce job performance. It is aimed at Hadoop administrators, developers, MapReduce users, and beginners who wish to optimize their clusters and applications. Having prior knowledge of...

Bibliographic Details
Classification: Electronic Book
Main Author: Tannir, Khaled
Format: Electronic eBook
Language: English
Published: Birmingham, UK : Packt Pub., 2014.
Series: Community experience distilled.
Subjects:
Online Access: Full text
Table of Contents:
  • Cover; Copyright; Credits; About the Author; Acknowledgments; About the Reviewers; www.PacktPub.com; Table of Contents; Preface
  • Chapter 1: Understanding Hadoop MapReduce; The MapReduce model; Overview of Hadoop MapReduce; Hadoop MapReduce internals; Factors affecting the performance of MapReduce; Summary
  • Chapter 2: An Overview of the Hadoop Parameters; Investigating the Hadoop parameters; The mapred-site.xml configuration file; The CPU-related parameters; The disk I/O related parameters; The memory-related parameters; The network-related parameters; The hdfs-site.xml configuration file; The core-site.xml configuration file; Hadoop MapReduce metrics; Performance monitoring tools; Using Chukwa to monitor Hadoop; Using Ganglia to monitor Hadoop; Using Nagios to monitor Hadoop; Using Apache Ambari to monitor Hadoop; Summary
  • Chapter 3: Detecting System Bottlenecks; Performance tuning; Creating a performance baseline; Identifying resource bottlenecks; Identifying RAM bottlenecks; Identifying CPU bottlenecks; Identifying storage bottlenecks; Identifying network bandwidth bottlenecks; Summary
  • Chapter 4: Identifying Resource Weaknesses; Identifying cluster weakness; Checking the Hadoop cluster node's health; Checking the input data size; Checking massive I/O and network traffic; Checking for insufficient concurrent tasks; Checking for CPU contention; Sizing your Hadoop cluster; Configuring your cluster correctly; Summary
  • Chapter 5: Enhancement of Map and Reduce Tasks; Enhancing Map tasks; Input data and block size impact; Dealing with small and unsplittable files; Reducing spilled records during the Map phase; Calculating map tasks' throughput; Enhancing Reduce tasks; Calculating reduce task throughput; Improving Reduce execution phase; Tuning map and reduce parameters; Summary
  • Chapter 6: Optimizing MapReduce Tasks; Using Combiners; Using compression; Using appropriate Writable types; Reusing types smartly; Optimizing mappers and reducers code; Summary
  • Chapter 7: Best Practices and Recommendations; Hardware tuning and OS recommendations; Hadoop cluster checklists; The Bios tuning checklist; OS configuration recommendations; Hadoop best practices and recommendations; Deploying Hadoop; Hadoop tuning recommendations; Using a MapReduce template class code; Summary
  • Index