Learning Hadoop 2 : design and implement data processing, lifecycle management, and analytic workflows with the cutting-edge toolbox of Hadoop 2 /

If you are a system or application developer interested in learning how to solve practical problems using the Hadoop framework, then this book is ideal for you. You are expected to be familiar with the Unix/Linux command-line interface and have some experience with the Java programming language. Fam...

Bibliographic Details
Classification:	Electronic Book
Main Authors:	Turkington, Garry (Author), Modena, Gabriele (Author)
Format:	Electronic eBook
Language:	English
Published:	Birmingham, UK : Packt Publishing, 2015.
Series:	Community experience distilled.
Subjects:	Apache Hadoop. Electronic data processing > Distributed processing. Big data. Traitement réparti. Données volumineuses. COMPUTERS > Computer Literacy. COMPUTERS > Computer Science. COMPUTERS > Data Processing. COMPUTERS > Hardware > General. COMPUTERS > Information Technology. COMPUTERS > Machine Theory. COMPUTERS > Reference.
Online Access:	Full text

Table of Contents:

Cover; Copyright; Credits; About the Authors; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Introduction; A note on versioning; The background of Hadoop; Components of Hadoop; Common building blocks; Storage; Computation; Better together; Hadoop 2 - what's the big deal?; Storage in Hadoop 2; Computation in Hadoop 2; Distributions of Apache Hadoop; A dual approach; AWS - infrastructure on demand from Amazon; Simple Storage Service (S3); Elastic MapReduce (EMR); Getting started; Cloudera QuickStart VM; Amazon EMR; Creating an AWS account
Signing up for the necessary services; Using Elastic MapReduce; Getting Hadoop up and running; How to use EMR; AWS credentials; The AWS command-line interface; Running the examples; Data processing with Hadoop; Why Twitter?; Building our first dataset; One service, multiple APIs; Anatomy of a Tweet; Twitter credentials; Programmatic access with Python; Summary; Chapter 2: Storage; The inner workings of HDFS; Cluster startup; NameNode startup; DataNode startup; Block replication; Command-line access to the HDFS filesystem; Exploring the HDFS filesystem; Protecting the filesystem metadata
Secondary NameNode not to the rescue; Hadoop 2 NameNode HA; Keeping the HA NameNodes in sync; Client configuration; How a failover works; Apache ZooKeeper - a different type of filesystem; Implementing a distributed lock with sequential ZNodes; Implementing group membership and leader election using ephemeral ZNodes; Java API; Building blocks; Further reading; Automatic NameNode failover; HDFS snapshots; Hadoop filesystems; Hadoop interfaces; Java FileSystem API; Libhdfs; Thrift; Managing and serializing data; The Writable interface; Introducing the wrapper classes; Array wrapper classes
The Comparable and WritableComparable interfaces; Storing data; Serialization and Containers; Compression; General-purpose file formats; Column-oriented data formats; RCFile; ORC; Parquet; Avro; Using the Java API; Summary; Chapter 3: Processing - MapReduce and Beyond; MapReduce; Java API to MapReduce; The Mapper class; The Reducer class; The Driver class; Combiner; Partitioning; The optional partition function; Hadoop-provided mapper and reducer implementations; Sharing reference data; Writing MapReduce programs; Getting started; Running the examples; Local cluster; Elastic MapReduce
WordCount, the Hello World of MapReduce; Word co-occurrences; Trending topics; The Top N pattern; Sentiment of hashtags; Text cleanup using chain mapper; Walking through a run of a MapReduce job; Startup; Splitting the input; Task assignment; Task startup; Ongoing JobTracker monitoring; Mapper input; Mapper execution; Mapper output and reducer input; Reducer input; Reducer execution; Reducer output; Shutdown; Input/Output; InputFormat and RecordReader; Hadoop-provided InputFormat; Hadoop-provided RecordReader; OutputFormat and RecordWriter; Hadoop-provided OutputFormat; Sequence files; YARN
