Incidents
The incidents page is your primary operational view. It shows active incidents (not yet resolved) and recent incidents from the past 7 days.
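The selection rule above can be sketched as a small filter. This is only an illustration: the `Incident` class, the `triggered_at` field, and the `visible_incidents` helper are hypothetical names, and it assumes "recent" means triggered within the past 7 days. Only `resolved_at` is named on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Assumption: "recent" is measured from the trigger time.
RECENT_WINDOW = timedelta(days=7)


@dataclass
class Incident:
    # Hypothetical record; only resolved_at is named in the docs.
    id: str
    triggered_at: datetime
    resolved_at: Optional[datetime] = None


def visible_incidents(incidents: List[Incident], now: datetime) -> List[Incident]:
    """Keep incidents that are still active or were triggered in the last 7 days."""
    cutoff = now - RECENT_WINDOW
    return [
        i for i in incidents
        if i.resolved_at is None or i.triggered_at >= cutoff
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        Incident("old-active", now - timedelta(days=30)),
        Incident("old-resolved", now - timedelta(days=30), now - timedelta(days=29)),
        Incident("recent-resolved", now - timedelta(days=2), now - timedelta(days=1)),
    ]
    for incident in visible_incidents(sample, now):
        print(incident.id)
```

Under these assumptions, an unresolved incident stays visible regardless of age, while a resolved one drops off once it is older than the window.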
Incident card
Each incident card shows:
| Field | Description |
|---|---|
| Confidence | How confident the LLM is in the hypothesis (0–100%) |
| Service | The service the incident is associated with |
| Time | When the incident was triggered, shown as a relative time (hover for the exact timestamp) |
| Hypothesis | The LLM-generated explanation of what caused the incident |
| Recommended action | The suggested remediation step |
Confirming or rejecting a diagnosis
Each active incident has two feedback buttons:
- Confirm — marks the diagnosis as correct. This stores the causal pattern in the vector store to improve future correlations.
- Wrong — marks the diagnosis as incorrect. This feedback is a useful signal for improving model accuracy over time.
Incident detail
Click Details on any incident card to see the full context:
- Complete evidence trail (all ContextEvents that contributed)
- Raw signal details (SHA, entity, old/new values)
- LLM reasoning
- Timeline view
Filtering
Use the ?service=service-name query parameter to filter incidents by service. The Services page links to this filtered view automatically.
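As a minimal sketch, a filtered link can be built with Python's standard library. The base URL and the `incidents_url` helper are hypothetical; only the `service` query parameter comes from this page.

```python
from urllib.parse import urlencode


def incidents_url(base_url: str, service: str) -> str:
    """Return the incidents page URL filtered to one service."""
    # urlencode escapes characters that are not safe in a query string.
    return f"{base_url}?{urlencode({'service': service})}"


if __name__ == "__main__":
    # Hypothetical host and path, for illustration only.
    print(incidents_url("https://example.invalid/incidents", "checkout-api"))
```

Encoding the value rather than concatenating it keeps the link valid for service names that contain spaces or reserved characters.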
Status indicators
| Badge | Meaning |
|---|---|
| Active | Not yet resolved — engineer action may be needed |
| Resolved | The resolved_at timestamp is set, either manually or via a PagerDuty webhook |
| Confirmed | Engineer marked the diagnosis correct |
| Marked wrong | Engineer marked the diagnosis incorrect |
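The badge rules in the table can be sketched as a small function. This is an assumption-laden illustration: the `feedback` values (`"confirm"`, `"wrong"`) and the `badges` helper are hypothetical, and it assumes a feedback badge can appear alongside a resolution badge, which this page does not state.

```python
from datetime import datetime, timezone
from typing import List, Optional


def badges(resolved_at: Optional[datetime], feedback: Optional[str]) -> List[str]:
    """Derive display badges from an incident's resolution time and feedback."""
    # Resolved if the resolved_at timestamp is set, otherwise Active.
    result = ["Resolved" if resolved_at is not None else "Active"]
    # Hypothetical feedback values recorded by the Confirm / Wrong buttons.
    if feedback == "confirm":
        result.append("Confirmed")
    elif feedback == "wrong":
        result.append("Marked wrong")
    return result


if __name__ == "__main__":
    print(badges(None, None))
    print(badges(datetime(2024, 1, 10, tzinfo=timezone.utc), "confirm"))
```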