observability/

Observability

Metrics, traces, logging, SLIs/SLOs, and SRE operational practices.

5 topics in this section
observability/incident-response/
Incident Response
Incident severity classification, on-call rotation, postmortem culture, and runbook automation.
observability/logs/
Logging
Structured logging, log levels, correlation IDs, log aggregation (ELK/OpenSearch), and retention policies.
observability/metrics/
Metrics
RED and USE methodologies, custom metrics, Prometheus and Grafana, and metric cardinality management.
observability/sli-slo/
SLI/SLO/SLA
Defining SLIs, setting SLO targets, error budget policy, and aligning SLOs with contractual SLAs.
observability/traces/
Distributed Tracing
OpenTelemetry instrumentation, trace sampling strategies, Jaeger/Zipkin, and trace-to-log correlation.