Your First Incident

Once the agent is running and events are flowing, an incident moves through the following lifecycle.

1. Events arrive

The agent processes your OTel log streams and emits a ContextEvent for each signal it detects. In the dashboard, under Agents, you can view the event stream for each agent:

02:10:41  deploy      argocd     payments-api    sha=a4f2c1    confidence=0.97
02:12:03  restart     kubernetes auth-service    OOMKilled     confidence=0.90
02:14:33  saturation  postgres   payments.public deadlock      confidence=0.88
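
To make the columns in the stream above concrete, here is a minimal sketch that parses one dashboard line into a record. The `ContextEvent` field names and the `parse_event_line` helper are assumptions made for illustration; the actual ContextEvent schema is not shown on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextEvent:
    # Field names are illustrative guesses, not Noctuary's actual schema.
    time: str
    kind: str
    source: str
    target: str
    detail: str
    confidence: float


def parse_event_line(line: str) -> ContextEvent:
    """Split one whitespace-separated dashboard line into its six columns."""
    time, kind, source, target, detail, conf = line.split()
    return ContextEvent(
        time, kind, source, target, detail,
        float(conf.removeprefix("confidence=")),  # requires Python 3.9+
    )


event = parse_event_line(
    "02:12:03  restart     kubernetes auth-service    OOMKilled     confidence=0.90"
)
print(event.kind, event.target, event.confidence)
```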

2. State window builds

Noctuary maintains a rolling state window for each service: a time-ordered list of recent events, each with a TTL. When multiple events correlate (for example, a deploy followed by OOMKills followed by a deadlock), the pattern is stored so that it is ready to explain the next alert.
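
The idea of a rolling window with TTLs can be sketched as follows. This is not Noctuary's implementation: `WindowEntry`, `StateWindow`, `contains_sequence`, the 900-second TTL, and the use of seconds after 02:00:00 as timestamps are all assumptions chosen to mirror the example events.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WindowEntry:
    timestamp: float  # seconds on any consistent clock
    kind: str         # e.g. "deploy", "restart", "saturation"
    ttl: float        # seconds the entry stays in the window


@dataclass
class StateWindow:
    """Hypothetical per-service window; entries are added in time order."""
    entries: list[WindowEntry] = field(default_factory=list)

    def add(self, entry: WindowEntry) -> None:
        self.entries.append(entry)

    def active_kinds(self, now: float) -> list[str]:
        """Kinds of the entries whose TTL has not yet expired at `now`."""
        return [e.kind for e in self.entries if e.timestamp + e.ttl > now]


def contains_sequence(kinds: list[str], pattern: list[str]) -> bool:
    """True if `pattern` occurs in `kinds` in order (gaps are allowed)."""
    remaining = iter(kinds)
    # `step in remaining` consumes the iterator, which enforces ordering.
    return all(step in remaining for step in pattern)


# Timestamps are seconds after 02:00:00, matching the example stream.
window = StateWindow()
for ts, kind in [(641, "deploy"), (723, "restart"), (873, "saturation")]:
    window.add(WindowEntry(ts, kind, ttl=900))

PATTERN = ["deploy", "restart", "saturation"]
print(contains_sequence(window.active_kinds(now=900), PATTERN))   # True
print(contains_sequence(window.active_kinds(now=1600), PATTERN))  # False
```

In this sketch the second check fails because the deploy entry has expired by then, which is the behavior a TTL is meant to provide: stale events stop contributing to a correlation.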

3. PagerDuty fires

Your existing alert fires as normal. Noctuary receives the webhook from PagerDuty and:

- Identifies which service the alert is for
- Fetches the recent state window for that service
- Calls the LLM with only the pre-correlated context (not raw logs)
- Posts the enriched note back to the PagerDuty incident

Total time from alert to hypothesis: under 10 seconds.
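
The four steps above can be sketched as a single handler. Everything here is an assumption made for illustration: the payload keys (`service`, `incident_id`) are not PagerDuty's actual webhook schema, and `fetch_window`, `ask_llm`, and `post_note` are hypothetical callables standing in for the state store, the LLM client, and the PagerDuty client.

```python
from typing import Callable


def handle_alert(
    payload: dict,
    fetch_window: Callable[[str], list[str]],
    ask_llm: Callable[[str], str],
    post_note: Callable[[str, str], None],
) -> str:
    # 1. Identify the service (hypothetical payload key).
    service = payload["service"]
    # 2. Fetch the recent state window for that service.
    events = fetch_window(service)
    # 3. Call the LLM with pre-correlated context only, never raw logs.
    prompt = f"Alert on {service}. Recent correlated events:\n" + "\n".join(events)
    note = ask_llm(prompt)
    # 4. Post the enriched note back to the incident.
    post_note(payload["incident_id"], note)
    return note


# Stubs make the flow runnable without any external service.
prompts: list[str] = []
posted: dict[str, str] = {}


def stub_llm(prompt: str) -> str:
    prompts.append(prompt)
    return "stub hypothesis"


note = handle_alert(
    {"service": "payments-api", "incident_id": "INC-1"},
    fetch_window=lambda service: ["02:10:41 deploy a4f2c1 (argocd)"],
    ask_llm=stub_llm,
    post_note=lambda incident_id, text: posted.update({incident_id: text}),
)
print(posted)
```

Passing the three dependencies in as callables keeps the sketch testable and makes the point of step 3 visible: the prompt is built from a handful of correlated events, so its size does not depend on log volume.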

4. Engineer investigates

The engineer opens the PagerDuty incident and sees:

Noctuary Context

Hypothesis: Deploy sha=a4f2c1 to payments-api rolled out 4 minutes before onset

Confidence: 94%

Timeline:
- 02:10:41 - deploy a4f2c1 (argocd)
- 02:12:03 - OOMKill on auth-service (kubernetes)
- 02:14:33 - deadlock on payments.public (postgres)

Recommended action: Roll back to sha=f8b3e22

Instead of 10–15 minutes of log hunting, the engineer has a specific hypothesis in seconds — and can either act on it immediately or rule it out and narrow the investigation.

5. Confirm or reject

After investigating, the engineer marks the diagnosis as Confirmed or Wrong in the Noctuary dashboard. Confirmed diagnoses are stored in the causal vector store and improve future correlations for similar patterns.
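
One way to picture how confirmed diagnoses could inform later correlations is a nearest-neighbor lookup over stored patterns. This is a rough sketch only: the internals of the causal vector store are not described here, so the event-count encoding, the cosine similarity measure, the `CausalStore` class, and the choice to discard diagnoses marked Wrong are all assumptions.

```python
import math
from collections import Counter

VOCAB = ["deploy", "restart", "saturation"]  # illustrative event kinds


def pattern_vector(kinds: list[str]) -> list[float]:
    """Encode a pattern as per-kind event counts (an assumed encoding)."""
    counts = Counter(kinds)
    return [float(counts[k]) for k in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class CausalStore:
    """Hypothetical in-memory stand-in for the causal vector store."""

    def __init__(self) -> None:
        self.rows: list[tuple[list[float], str]] = []

    def record(self, kinds: list[str], diagnosis: str, verdict: str) -> None:
        # Only confirmed diagnoses are stored; dropping the others is an
        # assumption of this sketch.
        if verdict == "confirmed":
            self.rows.append((pattern_vector(kinds), diagnosis))

    def nearest(self, kinds: list[str]) -> "str | None":
        """Diagnosis of the most similar stored pattern, if any exists."""
        if not self.rows:
            return None
        query = pattern_vector(kinds)
        return max(self.rows, key=lambda row: cosine(query, row[0]))[1]


store = CausalStore()
store.record(["deploy", "restart", "saturation"], "bad deploy", "confirmed")
store.record(["restart"], "node pressure", "wrong")
print(store.nearest(["deploy", "restart"]))
```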