Incidents

The incidents page is your primary operational view. It shows active incidents (not yet resolved) and recent incidents from the past 7 days.
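The selection rule above can be sketched as a small predicate. This is only an illustration under stated assumptions: the `Incident` class, the `triggered_at` field, and the choice to measure the 7-day window from the trigger time are hypothetical; only `resolved_at` appears elsewhere on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: "recent" means triggered within the last 7 days.
RECENT_WINDOW = timedelta(days=7)


@dataclass
class Incident:
    # Hypothetical shape, for illustration only.
    service: str
    triggered_at: datetime
    resolved_at: Optional[datetime] = None


def is_listed(incident: Incident, now: datetime) -> bool:
    """Return True if the incident would appear on the incidents page."""
    # Active incidents (no resolved_at) are always listed.
    if incident.resolved_at is None:
        return True
    # Resolved incidents are listed only while they are still recent.
    return now - incident.triggered_at <= RECENT_WINDOW


now = datetime(2024, 1, 15, tzinfo=timezone.utc)
old_active = Incident("checkout-api", now - timedelta(days=30))
recent_resolved = Incident(
    "checkout-api", now - timedelta(days=2), now - timedelta(days=1)
)
old_resolved = Incident(
    "checkout-api", now - timedelta(days=10), now - timedelta(days=9)
)
```

Under these assumptions, `old_active` and `recent_resolved` are listed and `old_resolved` is not.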

Incident card

Each incident card shows:

Field               Description
Confidence          How confident the LLM is in the hypothesis (0–100%)
Service             The service the incident is associated with
Time                When the incident was triggered (relative; hover for exact timestamp)
Hypothesis          The LLM-generated explanation of what caused the incident
Recommended action  The suggested remediation step

Confirming or rejecting a diagnosis

Each active incident has two feedback buttons:

  • Confirm: marks the diagnosis as correct. This stores the causal pattern in the vector store to improve future correlations.
  • Wrong: marks the diagnosis as incorrect. This feedback is used as a signal for improving model accuracy over time.

Incident detail

Click Details on any incident card to see the full context:

  • Complete evidence trail (all ContextEvents that contributed)
  • Raw signal details (SHA, entity, old/new values)
  • LLM reasoning
  • Timeline view

Filtering

Use the ?service=service-name query parameter to filter incidents by service, replacing service-name with the name of the service. The Services page links to this filtered view automatically.
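As a rough sketch, a filtered link can be built with Python's standard library. The base URL and the service name below are placeholders, not values from this product; only the `service` query parameter comes from the text above.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def incidents_url(base_url: str, service: str) -> str:
    """Return base_url with a ?service=<name> filter applied."""
    parts = urlsplit(base_url)
    # urlencode escapes characters that are not safe in a query string.
    query = urlencode({"service": service})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


# Placeholder host and service name, for illustration only.
url = incidents_url("https://example.invalid/incidents", "checkout-api")
```

Here `url` is `https://example.invalid/incidents?service=checkout-api`. Building the query with `urlencode` rather than string concatenation keeps service names with unusual characters from producing a malformed link.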

Status indicators

Badge         Meaning
Active        Not yet resolved; engineer action may be needed
Resolved      resolved_at timestamp set, either manually or via PagerDuty webhook
Confirmed     Engineer marked the diagnosis correct
Marked wrong  Engineer marked the diagnosis incorrect
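One way to read the badges is as two independent pieces of state: whether `resolved_at` is set, and what feedback an engineer gave. The sketch below encodes that reading. The `IncidentState` class and the `feedback` field with its `"confirm"` and `"wrong"` values are hypothetical, and whether the page actually shows a resolution badge and a feedback badge together is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class IncidentState:
    # resolved_at is named in the table above; feedback is a hypothetical field.
    resolved_at: Optional[datetime] = None
    feedback: Optional[str] = None  # assumed values: "confirm", "wrong", or None


def badges(state: IncidentState) -> List[str]:
    """Map incident state to the badge labels from the table above."""
    # Resolution badge: driven only by whether resolved_at is set.
    result = ["Resolved" if state.resolved_at is not None else "Active"]
    # Feedback badge: present only once an engineer has given feedback.
    if state.feedback == "confirm":
        result.append("Confirmed")
    elif state.feedback == "wrong":
        result.append("Marked wrong")
    return result


untouched = badges(IncidentState())
confirmed_and_resolved = badges(
    IncidentState(
        resolved_at=datetime(2024, 1, 15, tzinfo=timezone.utc),
        feedback="confirm",
    )
)
```

With these assumptions, `untouched` is `["Active"]` and `confirmed_and_resolved` is `["Resolved", "Confirmed"]`.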