Hadoop backup and recovery solutions: learn the best strategies for data recovery from Hadoop backup clusters and troubleshoot problems
If you are a Hadoop administrator and you want to get a good grounding in how to back up large amounts of data and manage Hadoop clusters, then this book is for you.
Classification: Electronic book (eBook)
Format: Electronic eBook
Language: English
Published: Birmingham, UK: Packt Publishing, 2015
Series: Community experience distilled
Online access: Full text
Table of Contents:
- Cover
- Copyright
- Credits
- About the Authors
- About the Reviewers
- www.PacktPub.com
- Table of Contents
- Preface
- Chapter 1: Knowing Hadoop and Clustering Basics
- Understanding the need for Hadoop
- Apache Hive
- Apache Pig
- Apache HBase
- Apache HCatalog
- Understanding HDFS design
- Getting familiar with HDFS daemons
- Scenario 1 - writing data to the HDFS cluster
- Scenario 2 - reading data from the HDFS cluster
- Understanding the basics of a Hadoop cluster
- Summary
- Chapter 2: Understanding Hadoop Backup and Recovery Needs
- Understanding the backup and recovery philosophies
- Replication of data using DistCp
- Updating and overwriting using DistCp
- The backup philosophy
- Changes since the last backup
- The rate of new data arrival
- The size of the cluster
- Priority of the datasets
- Selecting the datasets or parts of datasets
- The timelines of data backups
- Reducing the window of possible data loss
- Backup consistency
- Avoiding invalid backups
- The recovery philosophy
- Knowing the necessity of backing up Hadoop
- Determining backup areas - what should I back up?
- Datasets
- Block size - a large file divided into blocks
- Replication factor
- A list of all the blocks of a file
- A list of DataNodes for each block - sorted by distance
- The ACK packet
- The checksums
- The number of under-replicated blocks
- The secondary NameNode
- Active and passive nodes in second generation Hadoop
- Hardware failure
- Software failure
- Applications
- Configurations
- Is taking backup enough?
- Understanding the disaster recovery principle
- Knowing a disaster
- The need for recovery
- Understanding recovery areas
- Summary
- Chapter 3: Determining Backup Strategies
- Knowing the areas to be protected
- Understanding the common failure types
- Hardware failure
- Host failure
- Using commodity hardware
- Hardware failures may lead to loss of data
- User application failure
- Software causing task failure
- Failure of slow-running tasks
- Hadoop's handling of failing tasks
- Task failure due to data
- Bad data handling - through code
- Hadoop's skip mode
- Learning a way to define the backup strategy
- Why do I need a strategy?
- What should be considered in a strategy?
- Filesystem check (fsck)
- Filesystem balancer
- Upgrading your Hadoop cluster
- Designing network layout and rack awareness
- Most important areas to consider while defining a backup strategy
- Understanding the need for backing up Hive metadata
- What is Hive?
- Hive replication
- Summary
- Chapter 4: Backing Up Hadoop
- Data backup in Hadoop
- Distributed copy