|
|
|
|
LEADER |
00000nam a22000005i 4500 |
001 |
978-3-319-20943-2 |
003 |
DE-He213 |
005 |
20220118230051.0 |
007 |
cr nn 008mamaa |
008 |
150701s2015 sz | s |||| 0|eng d |
020 |
|
|
|a 9783319209432
|9 978-3-319-20943-2
|
024 |
7 |
|
|a 10.1007/978-3-319-20943-2
|2 doi
|
050 |
|
4 |
|a QA76.9.E94
|
072 |
|
7 |
|a UYD
|2 bicssc
|
072 |
|
7 |
|a COM074000
|2 bisacsh
|
072 |
|
7 |
|a UYD
|2 thema
|
082 |
0 |
4 |
|a 004.24
|2 23
|
245 |
1 |
0 |
|a Fault-Tolerance Techniques for High-Performance Computing
|h [electronic resource] /
|c edited by Thomas Herault, Yves Robert.
|
250 |
|
|
|a 1st ed. 2015.
|
264 |
|
1 |
|a Cham :
|b Springer International Publishing :
|b Imprint: Springer,
|c 2015.
|
300 |
|
|
|a IX, 320 p. 113 illus.
|b online resource.
|
336 |
|
|
|a text
|b txt
|2 rdacontent
|
337 |
|
|
|a computer
|b c
|2 rdamedia
|
338 |
|
|
|a online resource
|b cr
|2 rdacarrier
|
347 |
|
|
|a text file
|b PDF
|2 rda
|
490 |
1 |
|
|a Computer Communications and Networks,
|x 2197-8433
|
505 |
0 |
|
|a Part I: General Overview -- Fault-Tolerance Techniques for High-Performance Computing -- Part II: Technical Contributions -- Errors and Faults -- Fault-Tolerant MPI -- Using Replication for Resilience on Exascale Systems -- Energy-Aware Checkpointing Strategies.
|
520 |
|
|
|a This timely text/reference presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Topics and features: includes self-contained contributions from an international selection of preeminent experts; provides a survey of resilience methods and performance models; examines the various sources for errors and faults in large-scale systems, detailing their characteristics, with a focus on modeling, detection and prediction; reviews the spectrum of techniques that can be applied to design a fault-tolerant message passing interface; investigates different approaches to replication, comparing these to the traditional checkpoint-recovery approach; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems, proposing a methodology to estimate such energy consumption. This authoritative volume is essential reading for all researchers and graduate students involved in high-performance computing. Dr. Thomas Herault is a Research Scientist in the Innovative Computing Laboratory (ICL) at the University of Tennessee, Knoxville, TN, USA. Dr. Yves Robert is a Professor in the Laboratory of Parallel Computing at the Ecole Normale Supérieure de Lyon, France, and a Visiting Research Scholar in the ICL.
|
650 |
|
0 |
|a Electronic digital computers-Evaluation.
|
650 |
|
0 |
|a Computers.
|
650 |
|
0 |
|a Numerical analysis.
|
650 |
1 |
4 |
|a System Performance and Evaluation.
|
650 |
2 |
4 |
|a Hardware Performance and Reliability.
|
650 |
2 |
4 |
|a Numerical Analysis.
|
700 |
1 |
|
|a Herault, Thomas.
|e editor.
|4 edt
|4 http://id.loc.gov/vocabulary/relators/edt
|
700 |
1 |
|
|a Robert, Yves.
|e editor.
|4 edt
|4 http://id.loc.gov/vocabulary/relators/edt
|
710 |
2 |
|
|a SpringerLink (Online service)
|
773 |
0 |
|
|t Springer Nature eBook
|
776 |
0 |
8 |
|i Printed edition:
|z 9783319209449
|
776 |
0 |
8 |
|i Printed edition:
|z 9783319209425
|
776 |
0 |
8 |
|i Printed edition:
|z 9783319355603
|
830 |
|
0 |
|a Computer Communications and Networks,
|x 2197-8433
|
856 |
4 |
0 |
|u https://doi.uam.elogim.com/10.1007/978-3-319-20943-2
|z Full text
|
912 |
|
|
|a ZDB-2-SCS
|
912 |
|
|
|a ZDB-2-SXCS
|
950 |
|
|
|a Computer Science (SpringerNature-11645)
|
950 |
|
|
|a Computer Science (R0) (SpringerNature-43710)
|