Hadoop backup and recovery solutions : learn the best strategies for data recovery from Hadoop backup clusters and troubleshoot problems

If you are a Hadoop administrator and you want to get a good grounding in how to back up large amounts of data and manage Hadoop clusters, then this book is for you.

Bibliographic Details
Classification: Electronic book
Main Authors: Barot, Gaurav (Author), Patel, Amij (Author), Mehta, Chintan (Author)
Format: Electronic eBook
Language: English
Published: Birmingham, UK : Packt Publishing, 2015.
Series: Community experience distilled.
Subjects:
Online Access: Full text
Table of Contents:
  • Cover
  • Copyright
  • Credits
  • About the Authors
  • About the Reviewers
  • www.PacktPub.com
  • Table of Contents
  • Preface
  • Chapter 1: Knowing Hadoop and Clustering Basics
  • Understanding the need for Hadoop
  • Apache Hive
  • Apache Pig
  • Apache HBase
  • Apache HCatalog
  • Understanding HDFS design
  • Getting familiar with HDFS daemons
  • Scenario 1 – writing data to the HDFS cluster
  • Scenario 2 – reading data from the HDFS cluster
  • Understanding the basics of Hadoop cluster
  • Summary
  • Chapter 2: Understanding Hadoop Backup and Recovery Needs
  • Understanding the backup and recovery philosophies
  • Replication of data using DistCp
  • Updating and overwriting using DistCp
  • The backup philosophy
  • Changes since the last backup
  • The rate of new data arrival
  • The size of the cluster
  • Priority of the datasets
  • Selecting the datasets or parts of datasets
  • The timelines of data backups
  • Reducing the window of possible data loss
  • Backup consistency
  • Avoiding invalid backups
  • The recovery philosophy
  • Knowing the necessity of backing up Hadoop
  • Determining backup areas – what should I back up?
  • Datasets
  • Block size – a large file divided into blocks
  • Replication factor
  • A list of all the blocks of a file
  • A list of DataNodes for each block – sorted by distance
  • The ACK package
  • The checksums
  • The number of under-replicated blocks
  • The secondary NameNode
  • Active and passive nodes in second generation Hadoop
  • Hardware failure
  • Software failure
  • Applications
  • Configurations
  • Is taking backup enough?
  • Understanding the disaster recovery principle
  • Knowing a disaster
  • The need for recovery
  • Understanding recovery areas
  • Summary
  • Chapter 3: Determining Backup Strategies
  • Knowing the areas to be protected
  • Understanding the common failure types
  • Hardware failure
  • Host failure
  • Using commodity hardware
  • Hardware failures may lead to loss of data
  • User application failure
  • Software causing task failure
  • Failure of slow-running tasks
  • Hadoop's handling of failing tasks
  • Task failure due to data
  • Bad data handling – through code
  • Hadoop's skip mode
  • Learning a way to define the backup strategy
  • Why do I need a strategy?
  • What should be considered in a strategy?
  • Filesystem check (fsck)
  • Filesystem balancer
  • Upgrading your Hadoop cluster
  • Designing network layout and rack awareness
  • Most important areas to consider while defining a backup strategy
  • Understanding the need for backing up Hive metadata
  • What is Hive?
  • Hive replication
  • Summary
  • Chapter 4: Backing Up Hadoop
  • Data backup in Hadoop
  • Distributed copy