Big data made easy: a working guide to the complete Hadoop toolset
Many corporations are finding that the size of their data sets is outgrowing the capability of their systems to store and process them. The data is becoming too big to manage and use with traditional tools. The solution: implementing a big data system. As Big Data Made Easy: A Working Guide to the...
Classification: | Electronic Book |
---|---|
Main author: | |
Format: | Electronic eBook |
Language: | English |
Published: | [Berkeley, CA] : Apress, [2015] |
Series: | Expert's voice in big data. |
Subjects: | |
Online access: | Full text (Requires prior registration with institutional email) |
Table of Contents:
- At a Glance; Introduction; Chapter 1: The Problem with Data; A Definition of "Big Data"; The Potentials and Difficulties of Big Data; Requirements for a Big Data System; How Hadoop Tools Can Help; My Approach; Overview of the Big Data System; Big Data Flow and Storage; Benefits of Big Data Systems; What's in This Book; Storage: Chapter 2; Data Collection: Chapter 3; Processing: Chapter 4; Scheduling: Chapter 5; Data Movement: Chapter 6; Monitoring: Chapter 7; Cluster Management: Chapter 8; Analysis: Chapter 9; ETL: Chapter 10; Reports: Chapter 11; Summary.
- Chapter 2: Storing and Configuring Data with Hadoop, YARN, and ZooKeeper; An Overview of Hadoop; The Hadoop V1 Architecture; The Differences in Hadoop V2; The Hadoop Stack; Environment Management; Hadoop V1 Installation; Hadoop 1.2.1 Single-Node Installation; 1. Set up Bash shell file for hadoop HOME/.bashrc; 2. Set up conf/hadoop-env.sh; 3. Create Hadoop temporary directory; 4. Set up conf/core-site.xml; 5. Set up conf/mapred-site.xml; 6. Set up file conf/hdfs-site.xml; 7. Format the file system; Setting up the Cluster; Running a Map Reduce Job Check; Hadoop User Interfaces.
- Hadoop V2 Installation; ZooKeeper Installation; Manually Accessing the ZooKeeper Servers; The ZooKeeper Client; Hadoop MRv2 and YARN; Running Another Map Reduce Job Test; Hadoop Commands; Hadoop Shell Commands; Hadoop User Commands; Hadoop Administration Commands; Summary; Chapter 3: Collecting Data with Nutch and Solr; The Environment; Stopping the Servers; Changing the Environment Scripts; Starting the Servers; Architecture 1: Nutch 1.x; Nutch Installation; Solr Installation; Running Nutch with Hadoop 1.8; Architecture 2: Nutch 2.x; Nutch and Solr Configuration; HBase Installation.
- Gora Configuration; Running the Nutch Crawl; Potential Errors; A Brief Comparison; Summary; Chapter 4: Processing Data with Map Reduce; An Overview of the Word-Count Algorithm; Map Reduce Native; Java Word-Count Example 1; Describing the Example 1 Code; Running the Example 1 Code; Java Word-Count Example 2; Describing the Example 2 Code; Running the Example 2 Code; Comparing the Examples; Map Reduce with Pig; Installing Pig; Running Pig; Pig User-Defined Functions; Map Reduce with Hive; Installing Hive; Hive Word-Count Example; Map Reduce with Perl; Summary; Chapter 5: Scheduling and Workflow.
- An Overview of Scheduling; The Capacity Scheduler; The Fair Scheduler; Scheduling in Hadoop V1; V1 Capacity Scheduler; V1 Fair Scheduler; Scheduling in Hadoop V2; V2 Capacity Scheduler; V2 Fair Scheduler; Using Oozie for Workflow; Installing Oozie; The Mechanics of the Oozie Workflow; Oozie Workflow Control Nodes; Oozie Workflow Actions; Creating an Oozie Workflow; The Workflow Configuration File; Running an Oozie Workflow; Scheduling an Oozie Workflow; Summary; Chapter 6: Moving Data; Moving File System Data; The Cat Command; The CopyFromLocal Command; The CopyToLocal Command; The Cp Command.