I keep asking people what they actually want from an AI agent. When you press on it, the answer is usually some version of the same thing: an amazing intern. Someone who shows up, remembers what you mentioned at lunch yesterday, checks your calendar before asking when you're free, and drafts the follow-up email before you think to ask. The feeling — someone proactively handling the things that drain your day — is what people are reaching for when they describe the agent they wish they had.
Most teams try to create that feeling with a better system prompt. But a prompt only shapes how the agent responds once you've already shown up to ask.
The magical intern sees the world before you describe it.
The intern analogy, made concrete
The magical intern isn't smarter than you. They're just watching when you're not. They notice when something changes and they act before you have to ask. Here's what that looks like across three domains most teams already work in.
Communication. A Gmail agent that drafts replies to messages that have gone unanswered for 48 hours. A Slack agent that watches channels you own and surfaces questions needing your response in a daily DM digest. A Calendar agent that spots a conflict between two meetings and proposes which one to move — before either organizer has to ping you.
Development. A GitHub agent that detects when a PR has been sitting in review for three days and nudges the reviewer with context about what changed. A Linear agent that sees a ticket marked "blocked" and checks whether the blocking ticket was resolved yesterday.
Business. A CRM agent that notices a deal has gone quiet for a week and drafts a check-in email with the relevant context from the last call. A Notion agent that watches your meeting notes database and extracts action items into your task board the same afternoon.
Each of these is technically possible with today's APIs and today's models. GPT-4-class reasoning can handle every judgment call on this list. The reason most of them don't exist is not capability.
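To make that concrete, here is a rough sketch of the Gmail example's agent-specific logic in Python. The `thread` shape and the `draft_with_model` call are placeholders I've invented for illustration; everything that fetches the thread and files the draft is the infrastructure the rest of this post is about.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=48)  # the 48-hour threshold from the Gmail example

def maybe_draft_reply(thread, now=None):
    """Decide whether a thread needs a drafted reply, and draft it if so.

    `thread` is a hypothetical object with `last_inbound` (the most recent
    message from someone else) and `last_outbound` (our most recent reply),
    each carrying a `sent_at` timestamp and a `body`. Producing it from the
    Gmail API is infrastructure, not shown here.
    """
    now = now or datetime.now(timezone.utc)
    inbound = thread.last_inbound
    if inbound is None:
        return None  # nothing to answer
    if thread.last_outbound and thread.last_outbound.sent_at > inbound.sent_at:
        return None  # already answered
    if now - inbound.sent_at < STALE_AFTER:
        return None  # not stale yet
    # The part a GPT-4-class model handles comfortably. `draft_with_model`
    # stands in for whichever model API you actually call.
    return draft_with_model(
        instructions="Draft a brief, polite reply that addresses the open questions.",
        context=inbound.body,
    )
```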
Every magical example maps to the same three primitives.
The infrastructure under the magic
Here's the exercise that made the pattern obvious for us. Take each example above and ask three questions: How does the agent know when to wake up? What change does it react to? Where does it deliver the result?
| Agent | Clock | Listener | Inbox |
|---|---|---|---|
| Gmail reply drafter | 48h timer | New reply events | Draft in Gmail |
| Slack question surfacer | Daily digest | Channel messages | DM to owner |
| PR review nudger | 3-day threshold | PR status changes | Comment on PR |
| CRM deal checker | Weekly scan | Deal stage changes | Draft email |
| Meeting notes extractor | Post-meeting | New page in DB | Task board items |
The table is monotonous on purpose. Every row needs the same three primitives. The agent-specific logic for each of these is small — a handler that makes a judgment call. The engineering investment goes into the infrastructure underneath: the change-event pipeline from each provider, the scheduling that fires reliably, the delivery channel that puts results where people actually look.
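One way to see the monotony: every row of the table can be written as the same small record, with only the values changing. This is a sketch, not a real framework; the field names, event strings, and handler stubs are illustrative, not actual provider event names.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ProactiveAgent:
    clock: str                     # when to wake up: an interval or cron-like rule
    listener: str                  # which provider change events to watch
    inbox: str                     # where the result gets delivered
    handler: Callable[[Any], Any]  # the small, agent-specific judgment call

def nudge_if_stale(pr, days=3):
    # hypothetical judgment call: has this PR been quiet past the threshold?
    ...

def draft_checkin_if_quiet(deal, days=7):
    # hypothetical judgment call: has this deal gone silent for a week?
    ...

# Two rows of the table, written as data.
pr_review_nudger = ProactiveAgent(
    clock="every 6h",
    listener="github:pull_request.review_requested",
    inbox="github:pr_comment",
    handler=nudge_if_stale,
)

crm_deal_checker = ProactiveAgent(
    clock="weekly",
    listener="crm:deal.stage_changed",
    inbox="email:draft",
    handler=draft_checkin_if_quiet,
)
```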
The model can absolutely reason about a stale PR or draft a check-in email. What it can't do is wake itself up when a PR goes stale, notice that a deal went quiet, or deliver an action item to a task board. Those are all infrastructure problems, and I keep running into the same ones.
What "magical" actually means
The word "magical" does a ton of work in the AI discourse. Usually it just means "I was surprised it worked." For proactive agents, I think the better word is reliable. A reliable intern feels magical because they show up consistently, remember context, and deliver results to the right place at the right time.
For agents, that means scheduling, change detection, and delivery, wired together with durable state. I've found that once you get those right, the agent logic itself is honestly pretty small.
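For a sense of what that wiring looks like, here is a minimal sketch of one wake-up cycle, assuming an in-process scheduler and a SQLite table as the durable state. `fetch_changes`, `handler`, and `deliver` are placeholders for the listener, the judgment call, and the inbox; a real deployment sits behind provider webhooks and whatever job runner you already operate.

```python
import sqlite3

db = sqlite3.connect("agent_state.db")
db.execute("CREATE TABLE IF NOT EXISTS cursors (agent TEXT PRIMARY KEY, last_seen TEXT)")

def run_once(agent_name, fetch_changes, handler, deliver):
    """One wake-up: read the cursor, handle new changes, deliver, advance the cursor."""
    row = db.execute(
        "SELECT last_seen FROM cursors WHERE agent = ?", (agent_name,)
    ).fetchone()
    cursor = row[0] if row else None

    changes, new_cursor = fetch_changes(since=cursor)  # listener: what changed since last time
    for change in changes:
        result = handler(change)                       # the small judgment call
        if result is not None:
            deliver(result)                            # inbox: put it where people look

    db.execute(
        "INSERT INTO cursors (agent, last_seen) VALUES (?, ?) "
        "ON CONFLICT(agent) DO UPDATE SET last_seen = excluded.last_seen",
        (agent_name, new_cursor),
    )
    db.commit()

# The clock: a cron job, hosted scheduler, or even a sleep loop calling run_once
# on whatever cadence the table above specifies.
```

Nothing in the loop is clever. The work is in making each of those placeholders real, and reliable, for every provider you care about.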
Posted May 11, 2026 · AgentWorkforce
Issues, PRs, and arguments welcome on GitHub. Or email [email protected].