Fault-Tolerance Techniques for High-Performance Computing
This timely text/reference presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correcti...
Classification: | Electronic Book |
---|---|
Format: | Electronic (eBook) |
Language: | English |
Published: | Cham : Springer International Publishing : Imprint: Springer, 2015. |
Edition: | 1st ed. 2015. |
Series: | Computer Communications and Networks |
Online Access: | Full Text |
Table of Contents:
- Part I: General Overview
- Fault-Tolerance Techniques for High-Performance Computing
- Part II: Technical Contributions
- Errors and Faults
- Fault-Tolerant MPI
- Using Replication for Resilience on Exascale Systems
- Energy-Aware Checkpointing Strategies