
Hadoop MapReduce cookbook : recipes for analyzing large and complex datasets with Hadoop MapReduce /

Individual, self-contained code recipes. Solve specific problems using individual recipes, or work through the book to develop your capabilities. If you are a big data enthusiast striving to use Hadoop to solve your problems, this book is for you. It is aimed at Java programmers with some knowledge of...


Bibliographic Details
Classification: Electronic book
Main author: Perera, Srinath
Other authors: Gunarathne, Thilina
Format: Electronic eBook
Language: English
Published: Birmingham : Packt Pub., 2013.
Series: Community experience distilled.
Subjects:
Online access: Full text
Table of Contents:
  • Cover; Copyright; Credits; About the Authors; About the Reviewers; www.PacktPub.com; Table of Contents; Preface
  • Chapter 1: Getting Hadoop Up and Running in a Cluster: Introduction; Setting up Hadoop in your machine; Writing a WordCount MapReduce sample, bundling it, and running it using standalone Hadoop; Adding the combiner step to the WordCount MapReduce program; Setting up HDFS; Using HDFS monitoring UI; HDFS basic command-line file operations; Setting Hadoop in a distributed cluster environment; Running WordCount program in a distributed cluster environment; Using MapReduce monitoring UI
  • Chapter 2: Advanced HDFS: Introduction; Benchmarking HDFS; Adding a new DataNode; Decommissioning DataNodes; Using multiple disks/volumes and limiting HDFS disk usage; Setting HDFS block size; Setting the file replication factor; Using HDFS Java API; Using HDFS C API (libhdfs); Mounting HDFS (Fuse-DFS); Merging files in HDFS
  • Chapter 3: Advanced Hadoop MapReduce Administration: Introduction; Tuning Hadoop configurations for cluster deployments; Running benchmarks to verify the Hadoop installation; Reusing Java VMs to improve the performance; Fault tolerance and speculative execution; Debug scripts - analyzing task failures; Setting failure percentages and skipping bad records; Shared-user Hadoop clusters - using fair and other schedulers; Hadoop security - integrating with Kerberos; Using the Hadoop Tool interface
  • Chapter 4: Developing Complex Hadoop MapReduce Applications: Introduction; Choosing appropriate Hadoop data types; Implementing a custom Hadoop Writable data type; Implementing a custom Hadoop key type; Emitting data of different value types from a mapper; Choosing a suitable Hadoop InputFormat for your input data format; Adding support for new input data formats - implementing a custom InputFormat; Formatting the results of MapReduce computations - using Hadoop OutputFormats; Hadoop intermediate (map to reduce) data partitioning; Broadcasting and distributing shared resources to tasks in a MapReduce job - Hadoop DistributedCache; Using Hadoop with legacy applications - Hadoop Streaming; Adding dependencies between MapReduce jobs; Hadoop counters for reporting custom metrics
  • Chapter 5: Hadoop Ecosystem: Introduction; Installing HBase; Data random access using Java client APIs; Running MapReduce jobs on HBase (table input/output); Installing Pig; Running your first Pig command; Set operations (join, union) and sorting with Pig; Installing Hive; Running SQL-style query with Hive; Performing a join with Hive; Installing Mahout; Running K-means with Mahout; Visualizing K-means results
  • Chapter 6: Analytics: Introduction; Simple analytics using MapReduce; Performing Group-By using MapReduce; Calculating frequency distributions and sorting using MapReduce; Plotting the Hadoop results using GNU Plot; Calculating histograms using MapReduce
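The table of contents opens with the book's canonical WordCount recipe. As an illustrative sketch only (plain Python standing in for the Hadoop Java API; the function names below are invented for illustration, not part of Hadoop), the map-shuffle-reduce flow behind WordCount looks like this:

```python
from collections import defaultdict

# Plain-Python simulation of the MapReduce WordCount data flow.
# This is NOT the Hadoop API; it only mirrors the three phases.

def map_phase(line):
    """Mapper: emit a (word, 1) pair for each word in an input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle/sort: group all values by key, as Hadoop does between
    the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum the per-word counts."""
    return key, sum(values)

lines = ["Hadoop MapReduce cookbook", "Hadoop MapReduce recipes"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
# {'hadoop': 2, 'mapreduce': 2, 'cookbook': 1, 'recipes': 1}
```

In real Hadoop, the same roles are played by a `Mapper` and `Reducer` subclass, and the shuffle is performed by the framework; the combiner step mentioned in Chapter 1 is essentially this reducer run early, on each mapper's local output.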