May 19, 2026 · 8 min

Devin auto-triage and the always-on oncall

Devin's auto-triage monitors Slack, Sentry, and PagerDuty around the clock and investigates incidents on its own. Here's what the architecture reveals.

Cognition announced auto-triage yesterday, and it's the most complete production example of a proactive agent I've seen built specifically for incident response. Not a chatbot you ask about errors. Not a cron job that summarizes your alerts every morning. An always-on agent that watches your Slack channels, your Sentry dashboard, your PagerDuty incidents, and your Linear backlog simultaneously, investigates what it finds, and either opens a pull request or posts an investigation summary with next steps.

I've been writing about what it takes to make agents proactive for weeks now. Most of the products I've covered have only one or two of the three primitives (a clock, a listener, and an inbox) wired up. Pulse has a clock but barely a listener. CodeRabbit has a strong listener surface but runs event detection on a thirty-minute cron. Auto-triage claims all three, and the architecture underneath is worth taking apart.

Figure: Seven signal sources, one investigation pipeline. Slack, Linear, GitHub, Sentry, Datadog, PagerDuty, and custom webhooks feed a single triage pipeline that ends in either a PR or an investigation summary.

The broadest listener surface on a coding agent

The signal surface is what caught my attention first. Auto-triage connects to Slack, Linear, GitHub, Sentry, Datadog, PagerDuty, and custom webhooks. When a bug report lands in a Slack channel, a Sentry alert fires, or a Linear ticket gets a "Bug" label, auto-triage picks it up and starts investigating.

That's seven input sources, each with its own payload format, its own authentication model, its own delivery semantics. We wrote about the webhook tax earlier in this series: adding one webhook provider to a proactive agent takes roughly a sprint. Four providers takes most of a quarter. Seven is a serious infrastructure investment, and it explains why most agent products settle for one or two integrations at launch.
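
To see where that tax comes from, here's a minimal sketch of the first thing every source needs: an adapter that turns its payload into one common shape the investigation pipeline can consume. The `Signal` type, the adapter names, and the payload fields (`title`, `project`, `summary`, `service`, `text`) are simplified assumptions for illustration; real Sentry, PagerDuty, and Slack payloads are far richer, and authentication, retries, and delivery guarantees (where most of the sprint actually goes) are left out entirely. Only three of the seven sources are shown.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Signal:
    """Hypothetical common shape every source is normalized into."""
    source: str
    title: str
    service: str


# Each adapter maps one provider's payload to a Signal.
# NOTE: the payload field names are hypothetical simplifications,
# not the providers' real webhook schemas.
def from_sentry(payload: dict) -> Signal:
    return Signal("sentry", payload["title"], payload.get("project", "unknown"))


def from_pagerduty(payload: dict) -> Signal:
    return Signal("pagerduty", payload["summary"], payload.get("service", "unknown"))


def from_slack(payload: dict) -> Signal:
    # A chat message carries no service field; triage has to infer it later.
    return Signal("slack", payload["text"], "unknown")


ADAPTERS: dict[str, Callable[[dict], Signal]] = {
    "sentry": from_sentry,
    "pagerduty": from_pagerduty,
    "slack": from_slack,
}


def normalize(source: str, payload: dict) -> Signal:
    """Route a raw payload to its adapter; unregistered sources are rejected."""
    try:
        adapter = ADAPTERS[source]
    except KeyError:
        raise ValueError(f"no adapter registered for {source!r}") from None
    return adapter(payload)
```

Adding a source means writing one adapter and registering it; everything downstream only ever sees `Signal`.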

The investigation flow, based on Cognition's documentation, works something like this: a bug report comes in ("Pro users seeing 'undefined' on the billing page since Friday's deploy"), and auto-triage pulls error logs from Datadog, queries the read replica to verify data state, traces the breaking commit in git history, writes a fix with a regression test, and opens a PR. When the fix isn't obvious, it posts an investigation summary instead.

The listener here is genuinely always-on, not scheduled. Cognition explicitly contrasts this with webhook-triggered automation (an implicit comparison to Cursor's recently launched automations), positioning auto-triage as a persistent agent that maintains context rather than a stateless function that fires and forgets.
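
Concretely, the difference is whether state outlives a single event. Here's a minimal sketch with Python's `asyncio`, assuming an in-process queue stands in for the webhook receivers and a plain dict stands in for the agent's context. One long-lived consumer handles each signal as it arrives and keeps what it learned, so a later Slack report about billing lands next to the earlier Sentry error. A stateless webhook handler would see each one in isolation. This illustrates the contrast only; it isn't how Cognition builds its listener.

```python
import asyncio


async def listener(queue: asyncio.Queue, context: dict[str, list[str]]) -> None:
    """Long-lived consumer: handles each signal as soon as it arrives and
    keeps per-service context across signals."""
    while True:
        item = await queue.get()
        try:
            if item is None:  # sentinel so the demo can shut down cleanly
                return
            service, message = item
            # State survives between events; a stateless handler would lose it.
            context.setdefault(service, []).append(message)
        finally:
            queue.task_done()


async def main() -> dict[str, list[str]]:
    queue: asyncio.Queue = asyncio.Queue()
    context: dict[str, list[str]] = {}
    consumer = asyncio.create_task(listener(queue, context))
    # Hypothetical (service, message) signals from different sources.
    for item in [
        ("billing", "sentry: TypeError on /billing"),
        ("checkout", "slack: checkout feels slow"),
        ("billing", "slack: Pro users see 'undefined'"),
    ]:
        await queue.put(item)
    await queue.put(None)
    await consumer
    return context


if __name__ == "__main__":
    print(asyncio.run(main()))
```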

Figure: A manager agent holds context while sub-agents investigate in parallel. The manager keeps memory and coordinates once; sub-1, sub-2, and sub-3 each run in their own sandbox.

The manager and the fleet

The architecture underneath auto-triage uses what Cognition calls a manager-agent pattern. One agent maintains long-running context across all investigations: what incidents have been seen before, which ones are related, how the team typically handles certain categories of bugs, who owns which services. When a new signal arrives, the manager agent can spin up sub-Devins to investigate in parallel, each working in its own sandboxed environment.

This is a different model from a single-agent loop. A single agent handling triage would have to serialize investigations, context-switching between incidents and holding everything in one increasingly long conversation. The manager pattern separates coordination from execution. The manager remembers; the sub-agents do the digging.
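
The split between coordination and execution can be sketched as a fan-out/fan-in. In this minimal sketch, `Manager`, `sub_agent`, and the string findings are hypothetical. `asyncio.gather` runs one stand-in sub-agent per incident concurrently, and only the manager writes results into its memory. A real sub-Devin would run in its own sandbox rather than as a coroutine.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Manager:
    """Coordinator: owns long-running memory and delegates the digging."""
    memory: dict[str, str] = field(default_factory=dict)

    async def sub_agent(self, incident: str) -> str:
        # Hypothetical stand-in for a sandboxed sub-agent investigation.
        await asyncio.sleep(0.01)
        return f"finding for {incident}"

    async def triage(self, incidents: list[str]) -> dict[str, str]:
        # Fan out: one sub-agent per incident, all running concurrently.
        findings = await asyncio.gather(*(self.sub_agent(i) for i in incidents))
        # Fan in: only the manager writes to memory, so context stays in one place.
        for incident, finding in zip(incidents, findings):
            self.memory[incident] = finding
        return self.memory


if __name__ == "__main__":
    manager = Manager()
    print(asyncio.run(manager.triage(["INC-1", "INC-2", "INC-3"])))
```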

The long-term memory is the piece that interests me most. Auto-triage remembers prior findings, recurring issues, and how your team handled specific bug categories. When the same class of error appears again, the agent already knows where to look and who to assign. Over time, the routing should get better as the system learns team preferences.
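
As a rough illustration of "already knows where to look and who to assign", here's a minimal sketch of a memory keyed by error class. The fingerprint (service plus error type), the `TriageMemory` class, and the routing rule (suggest whoever resolved this class most often) are all hypothetical; Cognition hasn't described how its memory is keyed or queried.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Optional


def fingerprint(service: str, error_type: str) -> str:
    """Hypothetical key: the same error class in the same service."""
    return f"{service}:{error_type}"


@dataclass
class TriageMemory:
    # fingerprint -> how many times each owner resolved it
    owners: dict = field(default_factory=lambda: defaultdict(Counter))
    # fingerprint -> files that turned out to matter
    files: dict = field(default_factory=lambda: defaultdict(set))

    def record(self, fp: str, owner: str, touched: list[str]) -> None:
        """Store the outcome of a resolved investigation."""
        self.owners[fp][owner] += 1
        self.files[fp].update(touched)

    def recall(self, fp: str) -> tuple[Optional[str], list[str]]:
        """Suggest the most frequent past owner and where to look first."""
        if fp not in self.owners:
            return None, []  # cold start: nothing remembered for this class yet
        owner, _count = self.owners[fp].most_common(1)[0]
        return owner, sorted(self.files[fp])
```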

This memory architecture addresses something I wrote about in what makes proactive agents hard: the cold-start problem across runs. Most agents forget everything between sessions. They wake up, do their work, and lose all context. An agent that accumulates institutional knowledge about your incidents, your codebase, and your team's preferences is qualitatively different from one that cold-starts every time a Sentry alert fires.

Whether the memory actually works well at scale is a different question. The gap between "remembers prior findings" in a product description and "reliably correlates a Tuesday Slack thread with a Thursday Sentry alert about the same root cause" in production is wide. But the architectural commitment is right.
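
The correlation step itself is easy to state; the hard part is everything upstream of it. Here's a minimal sketch, assuming each signal already carries a hypothetical root-cause `fingerprint` and a timestamp: signals that share a fingerprint and arrive within a time window of each other are grouped into one incident. In production, deriving that fingerprint from a free-text Slack message and a Sentry stack trace is exactly where the gap lives.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TaggedSignal:
    source: str       # e.g. "slack", "sentry"
    fingerprint: str  # hypothetical root-cause key, assumed already assigned
    at: datetime


def correlate(signals: list[TaggedSignal], window: timedelta) -> list[list[TaggedSignal]]:
    """Group same-fingerprint signals that arrive within `window` of the
    previous signal in their group; anything later starts a new incident."""
    incidents: list[list[TaggedSignal]] = []
    latest: dict[str, list[TaggedSignal]] = {}  # fingerprint -> its most recent group
    for s in sorted(signals, key=lambda sig: sig.at):
        group = latest.get(s.fingerprint)
        if group is not None and s.at - group[-1].at <= window:
            group.append(s)
        else:
            group = [s]
            incidents.append(group)
            latest[s.fingerprint] = group
    return incidents
```

With a three-day window, a Tuesday Slack thread and a Thursday Sentry alert that share a fingerprint merge into one incident; with a one-day window they stay separate. Choosing that window, like choosing the fingerprint, is a judgment call.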

Figure: A triage playbook encodes investigation steps, not just a bug label. The playbook (1. search codebase, 2. check git history, 3. pull error logs, 4. verify data state, 5. trace breaking commit, 6. write fix + test) bounds the agent's judgment, and each run ends in a PR, an assignment, or a report.

Playbooks as a judgment mechanism

The detail that surprised me is how teams teach auto-triage what to do. In the Linear integration, you don't just point the agent at a bug label and say "fix it." You write a triage playbook: a concrete sequence of investigation steps that describes how a human engineer would approach the problem. Search the codebase for relevant files, check git history for recent changes, look at error patterns in the monitoring tool, verify the data state, and so on.
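
A playbook read this way is an ordered list of steps with a stopping rule. The minimal sketch below uses hypothetical step functions, ticket IDs, and a made-up commit hash as stand-ins for real tool calls. It runs the steps in order, opens a PR as soon as a step produces a confident fix, and falls back to posting a summary otherwise. It illustrates the shape of the idea, not how Cognition represents or executes playbooks.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Investigation:
    ticket: str
    notes: list[str] = field(default_factory=list)
    fix: Optional[str] = None  # set by a step that finds a confident fix


Step = Callable[[Investigation], None]


# Hypothetical steps; real ones would call search, git, and monitoring tools.
def search_codebase(inv: Investigation) -> None:
    inv.notes.append("relevant files: billing/view.py")


def check_git_history(inv: Investigation) -> None:
    inv.notes.append("recent change: a1b2c3 (hypothetical Friday deploy)")


def trace_breaking_commit(inv: Investigation) -> None:
    # Pretend the trace pinpointed the cause.
    inv.fix = "revert null handling from a1b2c3 + regression test"


def run_playbook(ticket: str, steps: list[Step]) -> tuple[str, Investigation]:
    """Run steps in order; open a PR if a fix emerges, else post a summary."""
    inv = Investigation(ticket)
    for step in steps:
        step(inv)
        if inv.fix is not None:
            return "open_pr", inv
    return "post_summary", inv
```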

The playbook isn't a system prompt. It's an operational runbook, the kind of document that experienced oncall engineers keep in their heads or buried in a Confluence page. By making it explicit and machine-readable, teams are encoding their institutional triage knowledge into a format the agent can follow.

From the documentation, the workflow uses edge detection on Linear label transitions, so it only fires on newly triaged tickets rather than retroactively processing the entire backlog. Playbooks can be chained: a "Clear Fix" label can trigger a separate fix playbook. The investigation results sync back to the Linear ticket as structured output.
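
Edge detection here means firing at the moment a label appears, not on its continued presence. Here's a minimal sketch, assuming the agent sees a ticket's label set before and after each update (a hypothetical event shape, not Linear's webhook payload) and a hypothetical label-to-playbook map that includes the "Clear Fix" chain.

```python
# Hypothetical routing: "Bug" starts triage; "Clear Fix" chains to a fix playbook.
PLAYBOOKS = {"Bug": "triage_playbook", "Clear Fix": "fix_playbook"}


def newly_added(before: set[str], after: set[str]) -> set[str]:
    """Rising edge: labels present now that were absent before the update."""
    return after - before


def playbooks_to_run(before: set[str], after: set[str]) -> list[str]:
    """Fire only for labels that just appeared, so a ticket that has carried
    "Bug" for months is not re-triaged on every unrelated edit."""
    return sorted(
        PLAYBOOKS[label]
        for label in newly_added(before, after)
        if label in PLAYBOOKS
    )
```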

The playbook pattern also explains why Cognition positions auto-triage as learning over time. The playbooks are a starting point, but the memory layer accumulates which steps actually led to successful resolutions and which were dead ends. In theory, the investigation gets sharper with each incident.
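
One simple way a memory layer could learn which steps pay off is to count, for each step, how often it ran in investigations that ended in a resolution, and then try the historically productive ones first. This is a hypothetical illustration, not a description of Cognition's memory. The credit assignment is deliberately crude (every step in a resolved run gets credit), and reordering only makes sense among steps that don't depend on each other, such as which monitoring tool to query first.

```python
from collections import defaultdict


class StepStats:
    """Tracks how often each playbook step was part of a resolved investigation."""

    def __init__(self) -> None:
        self.runs: dict[str, int] = defaultdict(int)
        self.wins: dict[str, int] = defaultdict(int)

    def record(self, steps_run: list[str], resolved: bool) -> None:
        for step in steps_run:
            self.runs[step] += 1
            if resolved:
                self.wins[step] += 1

    def success_rate(self, step: str) -> float:
        # Laplace smoothing: an unseen step starts at 0.5 rather than 0 or 1.
        return (self.wins[step] + 1) / (self.runs[step] + 2)

    def reorder(self, independent_steps: list[str]) -> list[str]:
        """Try historically productive steps first; ties keep playbook order."""
        return sorted(independent_steps, key=self.success_rate, reverse=True)
```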

The cost question

Always-on monitoring means always spending. Devin's pricing starts at $20/month for the Pro tier, which includes a usage quota measured in ACUs (Agentic Computing Units, roughly 15 minutes of active autonomous work per unit). Pay-as-you-go rates run $2.00–2.25 per ACU depending on plan. Multiple reviewers report that actual monthly costs climb to $300–500 once the agent is actively investigating incidents.

We covered the economics of always-on agents in what proactive agents actually cost. The pattern is consistent: the headline price gets you in the door, and the metered usage is where the real bill lives. For auto-triage specifically, every investigation spins up compute, model inference, and networking. A team with high alert volume could burn through the included quota in a few days.
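
To make the quota math concrete, here's a back-of-envelope calculator using the figures above: roughly 15 minutes of work per ACU, and $2.00–2.25 per ACU pay-as-you-go. Investigation counts, minutes per investigation, and the size of the included quota are hypothetical inputs, since they depend on your alert volume and plan.

```python
MINUTES_PER_ACU = 15  # approximate figure quoted above


def acus_for(minutes: float) -> float:
    """Convert minutes of active agent work into ACUs."""
    return minutes / MINUTES_PER_ACU


def metered_cost(investigations: int, minutes_each: float, rate_per_acu: float) -> float:
    """Pay-as-you-go cost of a batch of investigations (excludes the base plan fee)."""
    return acus_for(investigations * minutes_each) * rate_per_acu


def days_to_exhaust(quota_acus: float, investigations_per_day: float,
                    minutes_each: float) -> float:
    """How many days an included ACU quota lasts at a steady investigation rate."""
    return quota_acus / acus_for(investigations_per_day * minutes_each)
```

For instance, ten investigations a day at 30 minutes each is 20 ACUs a day, or $40–45 a day at pay-as-you-go rates.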

There's also what developers are calling the "babysitting tax." Reviews of Devin generally report 10–20 minutes of overhead per task for prompt crafting, session monitoring, and reviewing output. The code Devin generates has a reported defect rate 1.5–2x higher than senior-developer-authored code. Automation Atlas rates Devin 7.5 out of 10, strong for well-scoped repetitive tasks but weaker for ambiguous product work.

For triage specifically, the economics could work differently than for general coding. A well-scoped investigation is closer to Devin's sweet spot than an open-ended feature build. If the playbook bounds the investigation steps and the agent consistently surfaces useful summaries, the cost of automated first-response could compare favorably to the cost of pulling a senior engineer off their current work to look at every alert.

Where auto-triage sits in the landscape

Here's how auto-triage compares with CodeRabbit's agent and Sentry's Junior across the three primitives, plus memory and scope:

| | Auto-triage | CodeRabbit Agent | Junior (Sentry) |
|---|---|---|---|
| Clock | Always-on, continuous | 30-min cron | On @-mention |
| Listener | Slack, Linear, GitHub, Sentry, Datadog, PagerDuty, webhooks | Slack + 12 tool integrations | Slack + MCP plugins |
| Inbox | Slack threads, Linear tickets, GitHub PRs | Slack threads, PR comments | Slack threads |
| Memory | Persistent across investigations | Per-repo learnings | Markdown skills (no persistent memory) |
| Scope | Incident triage + fix | Code review + expanding | General-purpose |

Auto-triage has the most complete primitive coverage of any coding agent I've analyzed in this series. The combination of continuous monitoring, broad signal surface, persistent memory, and bounded investigation through playbooks is architecturally sound. CodeRabbit's agent has breadth across tools but still runs detection on a scheduled loop. Junior is composable and open-source but reactive by design, responding to @-mentions rather than watching for incidents on its own.

The real test is whether the memory and deduplication hold up under production alert volume. An oncall engineer processing fifty alerts a week builds up contextual knowledge that's difficult to replicate in a language model: which services are flaky, which alerts are noise, which error patterns indicate a real regression versus a transient blip. Auto-triage's long-term memory is an attempt to accumulate that knowledge programmatically. The architecture is pointed in the right direction. Whether it gets there is something we'll only know from teams running it for months, not days.

Russell Kaplan, Cognition's co-founder, framed auto-triage as part of a broader shift: "Having to manually prompt your coding agent to do work will soon feel like a UX bug." I think he's right about the direction. The harder question is whether any specific agent has accumulated enough context about your systems, your team, and your incident patterns to earn that always-on responsibility. Auto-triage is the most serious attempt I've seen so far. Ask me again after a team has run it through a full quarter of production alerts.

Posted May 19, 2026 · AgentWorkforce

Issues, PRs, and arguments welcome on GitHub. Or email [email protected].