Your First Incident
Once the agent is running and events are flowing, this is what the full lifecycle of an incident looks like.
1. Events arrive
The agent processes your OTel log streams and emits a ContextEvent for each signal it detects. In the dashboard under Agents, you can see each agent's event stream:
02:10:41 deploy argocd payments-api sha=a4f2c1 confidence=0.97
02:12:03 restart kubernetes auth-service OOMKilled confidence=0.90
02:14:33 saturation postgres payments.public deadlock confidence=0.88
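As a rough illustration, each dashboard line can be read as six fields: time of day, event kind, emitting source, subject, detail, and confidence. The sketch below parses that layout into a record. The `ContextEvent` field names and the `parse_event_line` helper are hypothetical, inferred only from the three sample lines above rather than from Noctuary's actual event schema.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class ContextEvent:
    # Hypothetical fields inferred from the dashboard lines, not Noctuary's real schema.
    at: time          # time of day the signal was observed
    kind: str         # e.g. "deploy", "restart", "saturation"
    source: str       # emitting system, e.g. "argocd", "kubernetes", "postgres"
    subject: str      # service or resource, e.g. "payments-api"
    detail: str       # remaining free-form detail, e.g. "sha=a4f2c1" or "OOMKilled"
    confidence: float


def parse_event_line(line: str) -> ContextEvent:
    """Parse '<HH:MM:SS> <kind> <source> <subject> [detail ...] confidence=<float>'."""
    parts = line.split()
    if len(parts) < 5 or not parts[-1].startswith("confidence="):
        raise ValueError(f"unrecognised event line: {line!r}")
    return ContextEvent(
        at=time.fromisoformat(parts[0]),
        kind=parts[1],
        source=parts[2],
        subject=parts[3],
        detail=" ".join(parts[4:-1]),
        confidence=float(parts[-1].split("=", 1)[1]),
    )


if __name__ == "__main__":
    for raw in [
        "02:10:41 deploy argocd payments-api sha=a4f2c1 confidence=0.97",
        "02:12:03 restart kubernetes auth-service OOMKilled confidence=0.90",
        "02:14:33 saturation postgres payments.public deadlock confidence=0.88",
    ]:
        print(parse_event_line(raw))
```

Everything between the subject and the trailing `confidence=` token is kept as a single detail string, so lines with multi-word details would still parse under this assumed layout.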
2. State window builds
Noctuary maintains a rolling state window for each service: a time-ordered list of recent events, each with a TTL. When several events correlate (for example, a deploy followed by OOMKills followed by a deadlock), the pattern is stored so it is ready to explain the next alert.
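A minimal sketch of a per-service rolling window, assuming a single uniform TTL, events that arrive in timestamp order, and timestamps expressed in seconds. `StateWindows`, `record`, and `window_for` are hypothetical names for illustration; Noctuary's actual storage and TTL policy may differ.

```python
from collections import defaultdict, deque


class StateWindows:
    """Hypothetical per-service rolling window: events kept in time order, dropped once older than the TTL."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        # service name -> deque of (timestamp_seconds, event_text), oldest first
        self._windows = defaultdict(deque)

    def record(self, service: str, ts: float, event: str) -> None:
        # Assumes events arrive in timestamp order, so appending keeps each deque sorted.
        window = self._windows[service]
        window.append((ts, event))
        self._expire(window, ts)

    def window_for(self, service: str, now: float) -> list:
        # .get() avoids creating an empty window for a service we have never seen.
        window = self._windows.get(service)
        if window is None:
            return []
        self._expire(window, now)
        return [event for _, event in window]

    def _expire(self, window: deque, now: float) -> None:
        # Pop from the old end until the oldest surviving event is within the TTL.
        while window and now - window[0][0] > self.ttl:
            window.popleft()


if __name__ == "__main__":
    windows = StateWindows(ttl_seconds=600)  # assumed 10-minute TTL
    windows.record("payments-api", 7841, "deploy sha=a4f2c1")          # 02:10:41
    windows.record("payments-api", 8073, "deadlock payments.public")   # 02:14:33
    print(windows.window_for("payments-api", now=8100))
```

Because events are appended in order, expiry only ever needs to look at the oldest entry, so each check is cheap. With the assumed 10-minute TTL, the deploy and the deadlock (232 seconds apart) sit in the same window, which is what lets them be correlated.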
3. PagerDuty fires
Your existing alert fires as normal. Noctuary receives the webhook from PagerDuty and:
- Identifies which service the alert is for
- Fetches the recent state window for that service
- Calls the LLM with only the pre-correlated context (not raw logs)
- Posts the enriched note back to the PagerDuty incident
Total time from alert to hypothesis: under 10 seconds.
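The four webhook steps can be sketched as one function with its dependencies injected. This assumes a simplified alert dict with hypothetical `service` and `incident_id` keys (a real PagerDuty webhook has its own payload schema), and uses stand-in callables for the state-window lookup, the LLM call, and posting the note. None of these names are Noctuary or PagerDuty APIs.

```python
from typing import Callable, List


def enrich_alert(
    alert: dict,
    window_for: Callable[[str], List[str]],
    ask_llm: Callable[[str], str],
    post_note: Callable[[str, str], None],
) -> str:
    """Hypothetical enrichment flow; every dependency is injected so nothing here is a real API."""
    # 1. Identify the service (the "service" key is an assumed, simplified payload field).
    service = alert["service"]
    # 2. Fetch the recent state window for that service.
    events = window_for(service)
    # 3. Prompt the LLM with only the pre-correlated events, never raw log lines.
    prompt = "Service: " + service + "\nCorrelated events:\n" + "\n".join("- " + e for e in events)
    hypothesis = ask_llm(prompt)
    # 4. Post the enriched note back to the incident ("incident_id" is also assumed).
    note = "Noctuary Context\n" + hypothesis
    post_note(alert["incident_id"], note)
    return note


if __name__ == "__main__":
    posted = {}
    enrich_alert(
        {"service": "payments-api", "incident_id": "INCIDENT-1"},  # hypothetical alert
        window_for=lambda s: ["02:10:41 deploy a4f2c1 (argocd)"],
        ask_llm=lambda prompt: "Hypothesis: recent deploy",
        post_note=posted.__setitem__,
    )
    print(posted)
```

Injecting the dependencies keeps the flow testable with fakes, and makes the key design point visible: the prompt is built from a handful of correlated events rather than raw logs.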
4. Engineer investigates
The engineer opens the PagerDuty incident and sees:
Noctuary Context
Hypothesis: Deploy sha=a4f2c1 to payments-api rolled out 4 minutes before onset
Confidence: 94%
Timeline:
- 02:10:41 — deploy a4f2c1 (argocd)
- 02:12:03 — OOMKill on auth-service (kubernetes)
- 02:14:33 — deadlock on payments.public (postgres)
Recommended action: Roll back to sha=f8b3e22
Instead of 10–15 minutes of log hunting, the engineer has a specific hypothesis within seconds and can either act on it immediately or rule it out and narrow the investigation.
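The "4 minutes before onset" figure in the note is simple arithmetic over the timeline: 02:14:33 minus 02:10:41 is 3 minutes 52 seconds, which rounds to 4. The sketch below reproduces the hypothesis and confidence lines, assuming onset is the last correlated event (the 02:14:33 deadlock) and that all timestamps fall on the same day. The helper names are hypothetical.

```python
from datetime import datetime

TIME_FMT = "%H:%M:%S"


def lead_time_minutes(deploy_at: str, onset_at: str) -> int:
    # Assumes both timestamps fall on the same day; no midnight rollover handling.
    delta = datetime.strptime(onset_at, TIME_FMT) - datetime.strptime(deploy_at, TIME_FMT)
    return round(delta.total_seconds() / 60)


def hypothesis_line(sha: str, service: str, deploy_at: str, onset_at: str) -> str:
    minutes = lead_time_minutes(deploy_at, onset_at)
    unit = "minute" if minutes == 1 else "minutes"
    return f"Hypothesis: Deploy sha={sha} to {service} rolled out {minutes} {unit} before onset"


def confidence_line(score: float) -> str:
    # The ":.0%" format spec multiplies by 100 and rounds, so 0.94 becomes "94%".
    return f"Confidence: {score:.0%}"


if __name__ == "__main__":
    # Onset is assumed to be the last correlated event (the 02:14:33 deadlock).
    print(hypothesis_line("a4f2c1", "payments-api", "02:10:41", "02:14:33"))
    print(confidence_line(0.94))
```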
5. Confirm or reject
After investigating, the engineer marks the diagnosis as Confirmed or Wrong in the Noctuary dashboard. Confirmed diagnoses are stored in the causal vector store and improve future correlations for similar patterns.
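One way such a feedback loop could work, sketched under clearly stated assumptions: each event pattern is embedded as a bag of `kind:source` tokens, patterns are compared by cosine similarity, and only Confirmed verdicts are kept. The token scheme, the 0.5 similarity threshold, and the `ConfirmedDiagnoses` class are all hypothetical; a real causal vector store would use learned embeddings and persistent storage.

```python
import math
from collections import Counter


def embed(pattern: list) -> Counter:
    # Hypothetical embedding: a bag of "kind:source" tokens. A real store would use learned vectors.
    return Counter(pattern)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ConfirmedDiagnoses:
    """Hypothetical in-memory stand-in for the causal vector store."""

    def __init__(self, min_score: float = 0.5):
        self.min_score = min_score  # assumed similarity threshold
        self._items = []  # list of (vector, hypothesis)

    def record_feedback(self, pattern: list, hypothesis: str, verdict: str) -> None:
        # Only "Confirmed" verdicts are stored; "Wrong" verdicts are simply dropped here.
        if verdict == "Confirmed":
            self._items.append((embed(pattern), hypothesis))

    def most_similar(self, pattern: list):
        """Return the best-matching confirmed hypothesis, or None if nothing clears the threshold."""
        query = embed(pattern)
        scored = [(cosine(query, vec), hyp) for vec, hyp in self._items]
        if not scored:
            return None
        score, hypothesis = max(scored, key=lambda pair: pair[0])
        return hypothesis if score >= self.min_score else None


if __name__ == "__main__":
    store = ConfirmedDiagnoses()
    store.record_feedback(
        ["deploy:argocd", "restart:kubernetes", "saturation:postgres"],
        "Deploy a4f2c1 to payments-api",
        "Confirmed",
    )
    print(store.most_similar(["deploy:argocd", "saturation:postgres"]))
```

Here a later incident that shares only the deploy and the database saturation still scores about 0.82 against the confirmed pattern, so the earlier diagnosis is surfaced as a candidate explanation.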