Incident Metrics in SRE

Site reliability engineers often use MTTx metrics to evaluate improvements or track trends. But is either MTTR (mean time to recovery) or MTTM (mean time to mitigation) ideal for decision making or trend analysis when it comes to production incidents? This report not only demonstrates how and wh...

Bibliographic Details
Main author: Davidovič, Štěpán (Author)
Corporate author: Safari, an O'Reilly Media Company
Format: Electronic eBook
Language: English
Published: O'Reilly Media, Inc., 2021.
Edition: 1st edition.
Online access: Full text (requires prior registration with an institutional email)

MARC

LEADER 00000cam a22000007a 4500
001 OR_on1246197237
003 OCoLC
005 20231017213018.0
006 m o d
007 cr cnu||||||||
008 190321s2021 xx o 000 0 eng
040 |a AU@  |b eng  |c AU@  |d TAC  |d OCLCQ  |d TOH  |d OCLCQ 
019 |a 1302704154  |a 1355685088 
020 |z 9781098103156 
024 8 |a 9781098103163 
029 0 |a AU@  |b 000068941572 
029 1 |a AU@  |b 000073549210 
035 |a (OCoLC)1246197237  |z (OCoLC)1302704154  |z (OCoLC)1355685088 
049 |a UAMI 
100 1 |a Davidovič, Štěpán,  |e author. 
245 1 0 |a Incident Metrics in SRE  |h [electronic resource] /  |c Davidovič, Štěpán. 
250 |a 1st edition. 
264 1 |b O'Reilly Media, Inc.,  |c 2021. 
300 |a 1 online resource (34 pages) 
336 |a text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
347 |a text file 
520 |a Site reliability engineers often use MTTx metrics to evaluate improvements or track trends. But is either MTTR (mean time to recovery) or MTTM (mean time to mitigation) ideal for decision making or trend analysis when it comes to production incidents? This report not only demonstrates how and why MTTx metrics come up short but also proposes ways to think about metrics differently to get the answers you want. Google SRE Štěpán Davidovič uses a Monte Carlo simulation to show you how poorly MTTx metrics perform with production incidents. Applying these metrics is trickier than it seems and can be dangerously misleading in many practical scenarios. With this report, you'll explore alternative methods for achieving these measurements. Work with a simple model of the incident lifecycle and timings using empirical datasets. Use an analytical approach to get a clear picture of what your incident durations look like. Focus on narrow questions of the incident lifecycle rather than analyze incident statistics using MTTx. Explore alternative methods for achieving your measurements. 
542 |f Copyright © O'Reilly Media, Inc. 
550 |a Made available through: Safari, an O'Reilly Media Company. 
588 |a Online resource; Title from title page (viewed March 25, 2021) 
590 |a O'Reilly  |b O'Reilly Online Learning: Academic/Public Library Edition 
710 2 |a Safari, an O'Reilly Media Company. 
856 4 0 |u https://learning.oreilly.com/library/view/~/9781098103163/?ar  |z Full text (requires prior registration with an institutional email) 
936 |a BATCHLOAD 
994 |a 92  |b IZTAP