Hadoop real-world solutions cookbook
Over 90 hands-on recipes to help you learn and master the intricacies of Apache Hadoop 2.X, YARN, Hive, Pig, Oozie, Flume, Sqoop, Apache Spark, and Mahout. About This Book: Implement outstanding Machine Learning use cases on your own analytics models and processes. Solutions to common problems when wor...
Classification: | Electronic Book |
---|---|
Main author: | |
Format: | Electronic eBook |
Language: | English |
Published: | Packt Publishing, 2016. |
Edition: | Second edition. |
Subjects: | |
Online access: | Full text |
Table of Contents:
- Cover; Copyright; Credits; About the Author; Acknowledgements; About the Reviewer; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Getting Started with Hadoop 2.X; Introduction; Installing a Single Node Hadoop Cluster; Installing a multi-node Hadoop cluster; Adding new nodes to existing Hadoop clusters; Executing balancer command for uniform data distribution; Entering and exiting from the safe mode in a Hadoop cluster; Decommissioning DataNodes; Performing benchmarking on a Hadoop cluster; Chapter 2: Exploring HDFS; Introduction; Loading data from a local machine to HDFS.
- Exporting data from HDFS to local machine; Changing the replication factor of an existing file in HDFS; Setting the HDFS block size for all the files in a cluster; Setting the HDFS block size for a specific file in a cluster; Enabling transparent encryption for HDFS; Importing data from another Hadoop cluster; Recycling deleted data from trash to HDFS; Saving compressed data in HDFS; Chapter 3: Mastering Map Reduce Programs; Introduction; Writing the Map Reduce program in Java to analyze web log data; Executing the Map Reduce program in a Hadoop cluster.
- Adding support for a new writable data type in Hadoop; Implementing a user-defined counter in a Map Reduce program; Map Reduce program to find the top X; Map Reduce program to find distinct values; Map Reduce program to partition data using a custom partitioner; Writing Map Reduce results to multiple output files; Performing Reduce side Joins using Map Reduce; Unit testing the Map Reduce code using MRUnit; Chapter 4: Data Analysis Using Hive, Pig, and Hbase; Introduction; Storing and processing Hive data in a sequential file format; Storing and processing Hive data in the ORC file format.
- Storing and processing Hive data in the Parquet file format; Performing FILTER By queries in Pig; Performing Group By queries in Pig; Performing Order By queries in Pig; Performing JOINS in Pig; Writing a user-defined function in Pig; Analyzing web log data using Pig; Performing the Hbase operation in CLI; Performing Hbase operations in Java; Executing the MapReduce programming with an Hbase Table; Chapter 5: Advanced Data Analysis Using Hive; Introduction; Processing JSON data using Hive JSON SerDe.
- Processing XML data using Hive XML SerDe; Processing Hive data in AVRO format; Writing User Defined functions in Hive; Performing table joins in Hive; Executing map side joins in Hive; Performing context Ngram in Hive; Call Data Record Analytics using Hive; Twitter sentiment analysis using Hive; Implementing Change Data Capture using Hive; Multiple table inserting using Hive; Chapter 6: Data Import/Export Using Sqoop and Flume; Introduction; Importing data from RDBMS to HDFS using Sqoop; Exporting data from HDFS to RDBMS; Using query operator in Sqoop import.