
Programming MapReduce with Scalding: a practical guide to designing, testing, and implementing complex MapReduce applications in Scala

This book is an easy-to-understand, practical guide to designing, testing, and implementing complex MapReduce applications in Scala using the Scalding framework. It is packed with examples featuring log-processing, ad-targeting, and machine learning. This book is for developers who are willing to di...


Bibliographic Details
Classification: Electronic Book
Main Author: Chalkiopoulos, Antonios
Format: Electronic eBook
Language: English
Published: Birmingham : Packt Publishing, 2014.
Series: Community experience distilled.
Subjects:
Online Access: Full text
Table of Contents:
  • Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Introduction to MapReduce; The Hadoop platform; MapReduce; A MapReduce example; MapReduce abstractions; Introducing Cascading; What happens inside a pipe; Pipe assemblies; Cascading extensions; Summary; Get Ready for Scalding; Why Scala?; Scala basics; Scala build tools; Hello World in Scala; Development editors; Installing Hadoop in five minutes; Running our first Scalding job; Submit a Scalding job into Hadoop; Summary; Scalding by Example; Reading and writing files
  • Best practices for reading and writing files; TextLine parsing; Executing in the local and Hadoop modes; Understanding the core capabilities of Scalding; Map-like operations; Join operations; Pipe operations; Grouping/reducing functions; Operations on groups; Composite operations; A simple example; Typed API; Summary; Intermediate Examples; Logfile analysis; Completing the implementation; Exploring ad targeting; Calculating daily points; Calculating historic points; Generating targeted ads; Summary; Scalding Design Patterns; The external operations pattern; The Dependency Injection pattern
  • The Late Bound Dependency pattern; Summary; Testing and TDD; Introduction to testing; MapReduce testing challenges; Development lifecycle with testing strategy; TDD for Scalding developers; Implementing the TDD methodology; Decomposing the algorithm; Defining acceptance tests; Implementing integration tests; Implementing unit tests; Implementing the MapReduce logic; Defining and performing system tests; Black box testing; Summary; Running Scalding in Production; Executing Scalding in a Hadoop cluster; Scheduling execution; Coordinating job execution; Configuring using a property file
  • Configuring using Hadoop parameters; Monitoring Scalding jobs; Using slim JAR files; Scalding execution throttling; Summary; Using External Data Stores; Interacting with external systems; SQL databases; NoSQL databases; Understanding HBase; Reading from HBase; Writing in HBase; Using advanced HBase features; Search platforms; Elastic Search; Summary; Matrix Calculations and Machine Learning; Text similarity using TF-IDF; Setting a similarity using the Jaccard index; K-Means using Mahout; Other libraries; Summary; Index