Open SWE and the architecture everyone built twice

Open SWE is an open-source framework for building background coding agents, created by LangChain and built on LangGraph and Deep Agents. Launched in March 2026, it has crossed 10,000 stars and 1,100 forks on GitHub. MIT license, actively maintained, with over 900 commits. It's the most popular open-source background agent by a wide margin.

The project exists because three of the most respected engineering organizations in fintech built internal coding agents in the past year. Stripe built Minions. Ramp built Inspect. Coinbase built Cloudbot. They worked independently, shared nothing, and arrived at almost exactly the same architecture: cloud sandboxes, curated toolsets, Slack-first invocation, and subagent orchestration. Open SWE extracts that convergence into a framework anyone can fork.

The interesting part is which architectural decisions open-swe chose to encode, and which it deliberately left out.

Composition over forking: the open-swe layer stack.↗

Composing, not building from scratch

The first interesting decision is how open-swe relates to its underlying tools. It doesn't fork an existing agent CLI the way Stripe forked Goose (Block's open-source agent framework), and it doesn't start from raw model calls. Instead it composes on LangGraph for durable execution and Deep Agents for the agent harness.

The practical implication: when Deep Agents improves its context management or token efficiency, open-swe inherits those improvements without a merge conflict. When LangGraph ships better streaming or cancellation, open-swe gets that too. Ramp's Inspect team described the same benefit when they built on OpenCode rather than starting fresh.

The alternative is what Coinbase did: build from scratch. That gives you total control but locks you into maintaining everything from file I/O tools to token window management to conversation compaction. Open-swe's bet is that the foundational agent loop is commodity infrastructure. The value is in what you build on top: the triggers, the tools, the middleware, the safety boundaries.

This is the same architectural split we covered in the background agents essay: the runtime layer is converging, and the differentiation is moving up the stack toward orchestration, verification, and proactive triggers.

Every task runs in its own isolated cloud sandbox.↗

The sandbox-first principle

Every task in open-swe runs inside an isolated cloud sandbox: a remote Linux environment with full shell access, a cloned copy of the repository, and no connection to production infrastructure. The blast radius of any mistake stays inside the sandbox walls.

This is the decision all three internal agents converged on independently. Stripe runs agents in pre-warmed EC2 devboxes. Ramp uses Modal containers. Coinbase built their sandbox in-house. Open-swe makes the sandbox backend pluggable and ships adapters for Modal, Daytona, Runloop, and LangSmith. Swap one line of configuration and the agent runs in a different cloud.

Each conversation thread gets its own persistent sandbox. If you send a follow-up message in Slack, the agent picks it up in the same environment with all the files and state from the previous turn still in place. If a sandbox becomes unreachable, the system recreates it automatically and re-clones the repo.

The parallel execution story matters too. Multiple tasks run simultaneously, each in its own sandbox. There's no shared queue that serializes work. A team can assign ten issues in Linear and the agent spins up ten sandboxes in parallel, working on all of them at once.

Open-swe ships the base layer. Production agents need the rest.↗

What open-swe leaves to you

The architecture open-swe encodes is real and well-chosen. But the framework is explicitly a foundation, not a finished product. The README says as much: "meant to be forked." The gaps between open-swe and the internal agents it models are where the hardest engineering lives.

Invocation is reactive. Open-swe responds to @mentions in Slack, Linear comments, and GitHub PR comments. There's no channel watching, no board monitoring, no scheduled wake-ups. Everything starts with a human explicitly pinging the bot. The proactive trigger layer with its clock, listener, and inbox primitives isn't part of the framework.

Validation is prompt-driven. The agent is instructed to run linters, formatters, and tests before committing. But there's no CI feedback loop where test failures route back to the agent for a retry. There's no verification gate that blocks a PR until checks pass. Stripe's Minions run three layers of validation: local checks, full CI, and one automatic retry on failure. Open-swe relies on the model following instructions, which works most of the time and fails silently when it doesn't.

No batch management. When ten issues land at once, open-swe spins up ten independent sandboxes. There's no triage layer that reads all ten, identifies dependencies between them, prioritizes by urgency, or detects that three of them describe the same underlying bug. Each task is an island.

No bidirectional clarification. The middleware can inject follow-up messages from a human into the agent loop. But the agent can't reach out proactively when it gets stuck. It can't ask a clarifying question in Slack and pause until the human responds. The communication is one-directional: human pushes context to the agent, but the agent doesn't pull context from the human.

These aren't oversights. They're scope choices. Open-swe ships the 60% of the architecture that's well-understood and lets teams build the remaining 40% specific to their organization, their risk tolerance, and their definition of "done."

Fifteen tools, not five hundred

Stripe's internal coding agents have access to roughly 500 tools. Open-swe ships about 15. The gap is deliberate. A smaller, well-tested toolset is easier to maintain, easier for the model to reason about, and produces more predictable results.

The core set is tight: execute for shell commands, fetch_url and http_request for web access, linear_comment and slack_thread_reply for communication, and the Deep Agents built-ins for file operations and subagent spawning. GitHub operations run through gh CLI inside the sandbox rather than through a dedicated tool, which keeps the tool surface small while still giving the agent full git access.

Optional add-ons include Datadog integration for observability and Corridor for plan analysis, both running server-side so that credentials never enter the sandbox. The security boundary is clear: the sandbox has network egress and repository access; observability keys and admin secrets stay in the server process.

The curation question scales differently depending on team size. Fifteen tools work well for a framework that ships to thousands of forks. Five hundred tools work for an internal agent at Stripe where a dedicated platform team maintains and tests each one. Open-swe's bet is that most teams are closer to fifteen than five hundred, and that adding tools is easier than removing them.

Context engineering through AGENTS.md

Open-swe's context model has two layers. The first is an AGENTS.md file at the repo root, analogous to the rule files that Stripe uses to encode coding conventions, testing requirements, and architectural constraints. If the file exists, it's read from the sandbox and injected into the system prompt. The second is source context: the full Linear issue, Slack thread history, or GitHub PR conversation, assembled and passed to the agent so it starts with rich context rather than discovering everything through tool calls.

The middleware layer adds a third dimension. check_message_queue_before_model runs before every model call and injects any follow-up messages that arrived while the agent was working. You can send a Slack reply to the agent mid-task ("actually, target the v2 branch instead") and it picks up the correction at its next step. This is a meaningful difference from fire-and-forget agents that ignore everything after the initial prompt.

The combination gives you something closer to a real engineering conversation than a one-shot prompt: the agent has institutional context from AGENTS.md, task-specific context from the ticket, and live corrections from the thread. Whether that's enough depends on how much unwritten context your codebase requires.

Where open-swe sits in the open-source landscape

Open-swe isn't the only open-source project in this space. Runtime (YC S26) takes a different approach: an agent-agnostic control plane that wraps Claude Code, Cursor, Codex, and others inside governed sandboxes. Runtime ships proactive Slack channel-watching, spend caps, and approval gates. Its AGPLv3 license on the API layer creates friction for enterprise forks that MIT doesn't.

Mistle (also YC-backed) focuses on credential brokering and identity attribution for agent work. Deputies builds a control plane with swappable sandbox providers. ColeMurray's Open-Inspect is the most architecturally similar to open-swe, running a cloud-native stack on Cloudflare Workers with Modal sandboxes, cron triggers, and a triage classifier.

Project	Stars	License	Triggers	Sandbox	Differentiator
open-swe	10k	MIT	Reactive (@mention)	Modal, Daytona, Runloop	LangChain ecosystem, composition
Open-Inspect	2k	MIT	Reactive + cron + Sentry	Modal, Daytona, Vercel	Broadest trigger surface
Runtime	YC S26	AGPL	Proactive Slack watch	Agent-agnostic (wraps any CLI)	Governance and spend caps
Mistle	66	MIT	Reactive + cron	Configurable	Credential brokering

Each one ships the base layer and leaves the proactive, verification, and orchestration work to the team deploying it.

None of them ship a complete factory model yet. The agent that watches your board, triages incoming work, runs tasks through verification, and handles failures gracefully, all without anyone @mentioning it first, remains something teams build in-house. Open-swe gives you the strongest open-source starting point for that build. How much of the remaining architecture you need to customize depends on how close your workflow is to the Slack-plus-Linear-plus-GitHub pattern that open-swe optimizes for.

✦ Newsletter

Liked this essay?

Get the next one in your inbox. One email per essay, no spam.

Posted June 19, 2026 · AgentWorkforce

Issues, PRs, and arguments welcome on GitHub. Or email [email protected].