Hadoop MapReduce cookbook : recipes for analyzing large and complex datasets with Hadoop MapReduce
Individual self-contained code recipes. Solve specific problems using individual recipes, or work through the book to develop your capabilities. If you are a big data enthusiast and striving to use Hadoop to solve your problems, this book is for you. Aimed at Java programmers with some knowledge of...
Classification: | Electronic Book |
---|---|
Main Author: | |
Other Authors: | |
Format: | Electronic (eBook) |
Language: | English |
Published: | Birmingham : Packt Pub., 2013. |
Series: | Community experience distilled. |
Subjects: | |
Online Access: | Full text |
Table of Contents:
- Cover; Copyright; Credits; About the Authors; About the Reviewers; www.PacktPub.com; Table of Contents; Preface
- Chapter 1: Getting Hadoop Up and Running in a Cluster; Introduction; Setting up Hadoop on your machine; Writing a WordCount MapReduce sample, bundling it, and running it using standalone Hadoop; Adding the combiner step to the WordCount MapReduce program; Setting up HDFS; Using HDFS monitoring UI; HDFS basic command-line file operations; Setting Hadoop in a distributed cluster environment; Running WordCount program in a distributed cluster environment; Using MapReduce monitoring UI
- Chapter 2: Advanced HDFS; Introduction; Benchmarking HDFS; Adding a new DataNode; Decommissioning DataNodes; Using multiple disks/volumes and limiting HDFS disk usage; Setting HDFS block size; Setting the file replication factor; Using HDFS Java API; Using HDFS C API (libhdfs); Mounting HDFS (Fuse-DFS); Merging files in HDFS
- Chapter 3: Advanced Hadoop MapReduce Administration; Introduction; Tuning Hadoop configurations for cluster deployments; Running benchmarks to verify the Hadoop installation; Reusing Java VMs to improve the performance; Fault tolerance and speculative execution; Debug scripts - analyzing task failures; Setting failure percentages and skipping bad records; Shared-user Hadoop clusters - using fair and other schedulers; Hadoop security - integrating with Kerberos; Using the Hadoop Tool interface
- Chapter 4: Developing Complex Hadoop MapReduce Applications; Introduction; Choosing appropriate Hadoop data types; Implementing a custom Hadoop Writable data type; Implementing a custom Hadoop key type; Emitting data of different value types from a mapper; Choosing a suitable Hadoop InputFormat for your input data format; Adding support for new input data formats - implementing a custom InputFormat; Formatting the results of MapReduce computations - using Hadoop OutputFormats; Hadoop intermediate (map to reduce) data partitioning; Broadcasting and distributing shared resources to tasks in a MapReduce job - Hadoop DistributedCache; Using Hadoop with legacy applications - Hadoop Streaming; Adding dependencies between MapReduce jobs; Hadoop counters for reporting custom metrics
- Chapter 5: Hadoop Ecosystem; Introduction; Installing HBase; Data random access using Java client APIs; Running MapReduce jobs on HBase (table input/output); Installing Pig; Running your first Pig command; Set operations (join, union) and sorting with Pig; Installing Hive; Running SQL-style query with Hive; Performing a join with Hive; Installing Mahout; Running K-means with Mahout; Visualizing K-means results
- Chapter 6: Analytics; Introduction; Simple analytics using MapReduce; Performing Group-By using MapReduce; Calculating frequency distributions and sorting using MapReduce; Plotting the Hadoop results using GNU Plot; Calculating histograms using MapReduce