Know Your Surface: A Builder's Guide to Designing AI Agents Across Interfaces

This post was written in collaboration with Anthropic's Opus 4.8 High Reasoning model.

When teams start building AI agents, most of the energy goes into the model, the prompts, and the tools. But there's a deciding factor that quietly shapes everything else: the surface.

A surface is where your agent lives and operates — the interface or environment through which a person interacts with it. The same agent logic can be deployed to a chat window, a phone call, an inbox, a code editor, or a headless API. Each is a different surface, and each one rewrites the rules for how the agent should behave.

The mistake I see most often is treating the surface as a thin presentation layer on top of an otherwise universal agent. It isn't. The surface sets your latency budget, your output format, your tool set, your failure modes, and how humans stay in control. Get it wrong and a perfectly capable agent feels broken.

What "surface" actually means

The framing. Think of the surface as the delivery channel. An agent answering in a Slack thread and the same agent answering over a phone line share a brain but almost nothing else. One streams formatted text into a scrollable history; the other has to speak naturally and survive a 1.5-second silence without sounding like the call dropped.

Surface choice isn't a late-stage UI decision; it's an architectural one. It should inform how you design context handling, confirmation flows, and error recovery from day one.
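
One way to make the surface an architectural input rather than an afterthought is to give it an explicit profile that the rest of the agent reads from. The sketch below is illustrative only: SurfaceProfile, its field names, and the specific values are assumptions made for this example (the 1,500 ms voice budget and roughly two-second chat budget echo the figures used in this post).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SurfaceProfile:
    """Hypothetical per-surface settings; names and values are illustrative."""
    name: str
    first_response_budget_ms: Optional[int]  # None means no interactive deadline
    output_format: str                       # "markdown", "speech", "plain_text", "json"
    supports_rich_widgets: bool


PROFILES = {
    "chat": SurfaceProfile("chat", 2000, "markdown", True),
    "voice": SurfaceProfile("voice", 1500, "speech", False),
    "email": SurfaceProfile("email", None, "plain_text", False),
    "api": SurfaceProfile("api", None, "json", False),
}


def profile_for(surface: str) -> SurfaceProfile:
    """Look up a surface profile, failing loudly on unknown surfaces."""
    try:
        return PROFILES[surface]
    except KeyError:
        raise ValueError(f"unknown surface: {surface!r}") from None
```

Downstream code can then branch on profile_for("voice").output_format instead of scattering surface checks through prompts and tool handlers.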

Chat interfaces

The challenge. Chat is the most familiar surface — Claude.ai, a Slack bot, a website widget. It's turn-based, text-driven, and usually renders markdown. Users expect it to feel instant and to remember everything.

UX considerations:

Stream tokens. A blank screen for more than a couple of seconds reads as failure.
Keep replies tight. Short conversational answers feel natural; walls of text feel like work. Use progressive disclosure — a concise answer with expandable detail.
Preserve continuity. Users assume the agent remembers the full thread; losing context mid-conversation is jarring.
Offer correction controls. Regenerate, edit, thumbs up/down.

Technical constraints:

Robust streaming infrastructure.
Managing a growing context window across long threads.
State persistence for conversation history.
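
Managing a growing context window usually comes down to deciding what to keep when the thread no longer fits. Here is a minimal sketch that keeps system messages plus the most recent turns that fit a token budget. The message shape, the function names, and the four-characters-per-token estimate are all assumptions; a real deployment would use the model's tokenizer and might summarize older turns instead of dropping them, so the agent still appears to remember the thread.

```python
from typing import Dict, List

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def estimate_tokens(text: str) -> int:
    """Very rough heuristic (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def trim_history(messages: List[Message], max_tokens: int) -> List[Message]:
    """Keep system messages plus the newest turns that fit within max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)

    kept: List[Message] = []
    for message in reversed(turns):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if cost > budget:
            break  # stop at the first turn that does not fit
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))  # restore chronological order
```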

Voice and phone

The challenge. Voice flips many chat assumptions on their head. Latency isn't just important — it's the entire experience, and formatting disappears completely.

UX considerations:

Treat latency as the product. First-token latency dominates; a pause longer than ~1.5 seconds feels like dead air.
Speak lists, don't print them. Render them conversationally: "there are three options," then sequence naturally.
Handle barge-in. Users interrupt mid-sentence; detect it and yield gracefully instead of talking over them.
Keep clarifying questions narrow. You can't show a menu, so prefer binary choices.

Technical constraints:

Streaming speech-to-text and text-to-speech.
Voice activity detection for interruption handling.
Tight latency budgets across the whole pipeline.
Real fallback design — endless "I didn't catch that" loops kill voice agents, so build in human transfer or SMS follow-up.
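
Two of the points above are easy to sketch: turning a list into something speakable, and capping reprompts so the caller is handed off instead of looping. The function names, the phrasing templates, and the default of two reprompts are assumptions for illustration, not a recommended script.

```python
from typing import List

ORDINALS = ["first", "second", "third", "fourth", "fifth"]
COUNT_WORDS = {2: "two", 3: "three", 4: "four", 5: "five"}


def speak_list(options: List[str]) -> str:
    """Render options as a spoken sentence rather than a printed list."""
    if not options:
        return "There are no options right now."
    if len(options) == 1:
        return f"There is one option: {options[0]}."
    count = COUNT_WORDS.get(len(options), str(len(options)))
    parts = []
    for i, option in enumerate(options):
        label = f"the {ORDINALS[i]}" if i < len(ORDINALS) else f"option {i + 1}"
        parts.append(f"{label} is {option}")
    return f"There are {count} options: " + "; ".join(parts) + "."


def next_step(failed_attempts: int, max_reprompts: int = 2) -> str:
    """After too many consecutive misunderstandings, hand off instead of looping."""
    return "reprompt" if failed_attempts <= max_reprompts else "transfer_to_human"
```

With these defaults, the third consecutive failure triggers the handoff; the same hook could just as well schedule an SMS follow-up.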

Email

The challenge. Email is asynchronous, which changes the whole contract. Users don't expect an instant reply — they expect a correct one.

UX considerations:

Prioritize accuracy over speed. There's time to get it right; use it.
Be thread-aware. Parse the full conversation, not just the latest message.
Match the register. Email skews more formal than chat; mirror the incoming tone.
State actions explicitly. If the agent books or escalates something, say so and offer an opt-out.

Technical constraints:

Parsing quoted reply chains and signatures.
HTML vs. plain text rendering and subject-line handling.
Staying clear of spam filters so messages actually arrive.
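
To be thread-aware, the agent first has to tell the new message apart from quoted history and the signature. The sketch below is a deliberately simple heuristic: it recognizes ">"-prefixed quotes, an English "On ... wrote:" header, and the conventional "-- " signature separator. Those patterns are assumptions that will miss many real clients and locales, so a production parser needs to be considerably more thorough.

```python
import re
from typing import List, Tuple

# Common English reply header; real mail clients vary by locale and version.
REPLY_HEADER = re.compile(r"^On .+ wrote:$")


def split_reply(body: str) -> Tuple[str, str]:
    """Return (new_text, quoted_history) from a plain-text email body."""
    new_lines: List[str] = []
    quoted_lines: List[str] = []
    section = "new"
    for line in body.splitlines():
        stripped = line.strip()
        if REPLY_HEADER.match(stripped):
            section = "quoted"
            continue
        if stripped.startswith(">"):
            # Drop the quote markers but keep the quoted text for context.
            quoted_lines.append(stripped.lstrip("> "))
            continue
        if section == "new" and stripped == "--":
            section = "signature"  # skip signature lines
            continue
        if section == "new":
            new_lines.append(line)
        elif section == "quoted":
            quoted_lines.append(line)
    return "\n".join(new_lines).strip(), "\n".join(quoted_lines).strip()
```

Keeping the quoted history as a separate value lets the agent answer the new message while still reading the earlier turns for context.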

IDE and code editors

The challenge. Coding surfaces like Claude Code, Copilot, and Cursor split into inline completions (low-friction, low-context) and side-panel chat (richer back-and-forth). Both demand restraint and grounding.

UX considerations:

Preserve user intent. Don't rewrite more than you were asked; surgical edits beat full-file rewrites.
Show diffs. Per-change accept/reject controls are essential before anything lands.
Ground in context. Know the open file, cursor position, language, and ideally the wider project.

Technical constraints:

Inline completion must feel near-instant (under a few hundred milliseconds).
Integration with editor language servers, file system, and diff machinery.
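
Showing a diff before anything lands can be prototyped with Python's standard difflib module. This is a minimal sketch, not how any particular editor implements it; preview_edit and the a/ and b/ path prefixes are choices made for the example.

```python
import difflib


def preview_edit(original: str, proposed: str, path: str = "file") -> str:
    """Return a unified diff of the proposed change, or '' if nothing changed."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)
```

An empty result is a useful signal in its own right: if the agent proposes no actual change, there is nothing to ask the user to accept.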

Browser and web automation

The challenge. When an agent clicks, types, and navigates inside a browser, transparency becomes the central concern — and the environment is brittle.

UX considerations:

Make actions visible. Users need to see what's about to happen, especially before anything irreversible.
Gate high-stakes actions. Purchases, deletions, and form submissions should require explicit approval.
Show what the agent sees. Surfacing its "attention" on the page builds trust and eases debugging.

Technical constraints:

Resilience to changing pages, missing elements, and CAPTCHAs — with clear failure surfacing, not silent failure.
Managing session state (logins, cookies, auth) without exposing credentials.
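
Approval gating and loud failure can share one small wrapper around each browser step. In this sketch, StepResult, run_step, the HIGH_STAKES set, and the approve callback are all hypothetical names; the perform callable stands in for whatever automation library actually drives the page.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed categories, taken from the examples in this section.
HIGH_STAKES = {"purchase", "delete", "submit_form"}


@dataclass
class StepResult:
    ok: bool
    action: str
    detail: str


def run_step(
    action: str,
    perform: Callable[[], None],
    approve: Callable[[str], bool],
) -> StepResult:
    """Run one browser step, asking first if it is high-stakes."""
    if action in HIGH_STAKES and not approve(action):
        return StepResult(False, action, "declined by user; nothing was done")
    try:
        perform()
    except Exception as exc:  # surface the failure instead of swallowing it
        return StepResult(False, action, f"failed: {exc}")
    return StepResult(True, action, "completed")
```

Every step returns a result the interface can display, so a missing element or a declined purchase shows up as an explicit outcome rather than a silent stall.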

Desktop and OS-level agents

The challenge. A desktop agent with access to files, apps, and the clipboard wields real power, and often runs in the background.

UX considerations:

Make permissions explicit. Users should know upfront what the agent can and can't touch.
Support aggressive undo. Actions can be destructive, so make them easy to reverse or use a sandbox-before-commit model.
Decide the interruption model. When to notify versus when to block the user's flow.

Technical constraints:

Resource usage (CPU/memory) must be inspectable, not a black box.
Safe, scoped access to the file system and OS APIs.
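
Scoped file access can start with a check that every requested path resolves inside an allowed root. The sketch below uses only pathlib; resolve_in_scope is a hypothetical helper, and a path check like this complements OS-level sandboxing rather than replacing it.

```python
from pathlib import Path


def resolve_in_scope(root: Path, requested: str) -> Path:
    """Resolve a requested path and refuse anything outside the allowed root."""
    root_resolved = root.resolve()
    # resolve() normalizes ".." segments and follows symlinks where they exist.
    candidate = (root_resolved / requested).resolve()
    if candidate != root_resolved and root_resolved not in candidate.parents:
        raise PermissionError(f"{requested!r} is outside the permitted folder")
    return candidate
```

Because the check runs on the resolved path, both "../" traversal and absolute paths that point elsewhere are rejected.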

Mobile

The challenge. Mobile is its own discipline, shaped by physical ergonomics, rich sensors, and constant interruption.

UX considerations:

Respect thumb reach. Confirm buttons and key controls belong in the lower screen.
Treat voice, camera, and photos as first-class. Design for them, don't bolt them on.
Design the return notification. For async work, the push that delivers the result is its own UX moment.

Technical constraints:

Long model calls drain battery and data — offload to the server, show progress, allow cancellation.
Agent state must survive the app being backgrounded.
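
For state to survive backgrounding, it has to be serializable at any point in the run. As a language-neutral illustration (written in Python here, although a mobile client would use its platform's own persistence APIs), this sketch round-trips a small state object through JSON. AgentState and its fields are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class AgentState:
    """Hypothetical minimal state needed to resume a task."""
    task_id: str
    step: int = 0
    pending_actions: List[str] = field(default_factory=list)


def checkpoint(state: AgentState) -> str:
    """Serialize state so it can be written to disk before the app is suspended."""
    return json.dumps(asdict(state))


def restore(payload: str) -> AgentState:
    """Rebuild state from a checkpoint when the app returns to the foreground."""
    return AgentState(**json.loads(payload))
```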

API and headless

The challenge. With no UI at all, the developer is the user, and the API itself is your UX.

UX considerations (for developers):

Clarity of schema. Predictable, stable output structure.
Useful error messages. Tell the developer what went wrong and why.
Observability. Logging, tracing, and token-usage visibility.

Technical constraints:

Clean structured outputs (consistent JSON, no surprise markdown).
Clearly defined contracts for timeouts, rate limits, and partial failures.
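
A predictable schema is easiest to keep when success and failure share one envelope. The sketch below shows one possible shape; the field names (ok, data, error, usage, retryable) are assumptions for illustration, not a standard.

```python
import json
from typing import Any, Dict, Optional


def _envelope(
    ok: bool,
    data: Optional[Dict[str, Any]],
    error: Optional[Dict[str, Any]],
    usage: Dict[str, int],
) -> str:
    # The same four top-level keys appear in every response.
    payload = {"ok": ok, "data": data, "error": error, "usage": usage}
    return json.dumps(payload, sort_keys=True)


def success(data: Dict[str, Any], input_tokens: int, output_tokens: int) -> str:
    usage = {"input_tokens": input_tokens, "output_tokens": output_tokens}
    return _envelope(True, data, None, usage)


def failure(code: str, message: str, retryable: bool) -> str:
    error = {"code": code, "message": message, "retryable": retryable}
    return _envelope(False, None, error, {"input_tokens": 0, "output_tokens": 0})
```

Callers can parse every response the same way and branch on ok, and the retryable flag tells them what to do about a rate limit or timeout without parsing the message text.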

Document editors

The challenge. Agents embedded in tools like Excel or PowerPoint should feel native, not bolted on.

UX considerations:

Write into the document. Output goes into cells or slides, not a separate pane.
Keep changes scoped. "Fix this formula," "rewrite this slide" — bounded, not whole-document rewrites.
Preview before applying. Like IDE diffs, let users see the change first.

Technical constraints:

Integration with the application's native undo stack.
Working within the file format's structure (cells, slides, styles).

Tool-using agents (a cross-surface concern)

The challenge. The moment your agent calls external tools — a search API, a database query, a booking service — you introduce latency and a transparency obligation that every surface has to account for. This isn't a separate surface so much as a layer that lands on top of whichever one you're on.

UX considerations:

Report tool use. Users should know when the agent went and did something — "Searching the web…", "Querying your calendar…". Silent tool calls erode trust and make latency feel like a hang.
Show progress during the delay. A tool call can take seconds; surface an intermediate state so the wait feels intentional, not broken.
Render custom tool output appropriately. Tools often return structured results — a map, a table, a chart, a card — and the surface determines how (or whether) you can show it. A chat UI can render a rich widget; a voice line has to summarize it aloud; a headless API just passes the structured payload through.
Surface tool failures honestly. When a tool errors or times out, say so and degrade gracefully rather than pretending the result didn't matter.
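
The "same output, different rendering" idea can be sketched as a single function that branches on the surface. The result shape (a title plus a list of items), the function name, and the phrasing are assumptions; a real chat UI would likely render a richer widget than a bulleted list.

```python
import json
from typing import Any, Dict


def render_tool_result(result: Dict[str, Any], surface: str) -> str:
    """Render one structured tool result for a given surface."""
    title, items = result["title"], result["items"]
    if surface == "api":
        return json.dumps(result)  # pass the structured payload through
    if surface == "voice":
        if not items:
            return f"I didn't find anything for {title.lower()}."
        spoken = ", then ".join(items)
        return f"For {title.lower()}, I found {len(items)}: {spoken}."
    if surface == "chat":
        return f"**{title}**\n" + "\n".join(f"- {item}" for item in items)
    raise ValueError(f"unknown surface: {surface!r}")
```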

Technical constraints:

Latency budgeting. Tool calls add round-trips; parallelize independent calls and set timeouts so one slow tool doesn't stall the whole turn.
Surface-aware rendering of results. The same tool output may need a widget on web, a spoken summary on voice, and raw JSON on an API.
Streaming around tool calls. Stream the pre-tool reasoning, show a status indicator while the call runs, then resume streaming, so the interface never goes silent.
Error and retry handling. Define behavior for partial results, timeouts, and malformed tool responses.
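
Parallel calls with per-tool timeouts can be sketched with asyncio from the standard library. The tool callables and the result dictionary shape are assumptions; the point is that a slow or failing tool produces an explicit error entry instead of stalling the whole turn.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Tool = Callable[[], Awaitable[Any]]


async def call_with_timeout(name: str, tool: Tool, timeout_s: float) -> Dict[str, Any]:
    """Run one tool, converting timeouts and errors into structured results."""
    try:
        result = await asyncio.wait_for(tool(), timeout=timeout_s)
        return {"tool": name, "ok": True, "result": result}
    except asyncio.TimeoutError:
        return {"tool": name, "ok": False, "error": "timed out"}
    except Exception as exc:
        return {"tool": name, "ok": False, "error": str(exc)}


async def run_tools(tools: Dict[str, Tool], timeout_s: float) -> List[Dict[str, Any]]:
    """Run independent tools concurrently; results keep the input order."""
    calls = (call_with_timeout(name, tool, timeout_s) for name, tool in tools.items())
    return await asyncio.gather(*calls)
```

Because each call catches its own failures, the turn always receives one entry per tool, which the surface can then report honestly as a partial result.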

The principles that cut across everything

Step back and a few patterns hold everywhere, just expressed differently per surface.

Match confirmation friction to reversibility. Trivial, undoable actions shouldn't nag; irreversible ones should always ask first.
Design the failure state, not just the happy path. Dropped calls, stale DOM, rate limits, timed-out tools — graceful degradation defines quality.
Calibrate trust through transparency. Give users enough visibility to form an accurate mental model — including when a tool ran — without drowning the interface.
Use progressive disclosure. Reveal detail on demand, whether that's an expandable section in chat or a "show me what you're about to do" step before a web action.
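
The first principle can be written down as a small policy function. The three friction levels and the low_impact flag are assumptions added for this sketch; the post itself only commits to the two ends of the scale (no nagging for trivial undoable actions, confirmation for irreversible ones).

```python
from enum import Enum


class Friction(Enum):
    NONE = "none"        # just do it
    UNDO = "undo"        # do it, then offer an undo affordance
    CONFIRM = "confirm"  # ask before acting


def friction_for(reversible: bool, low_impact: bool) -> Friction:
    """Pick how much confirmation friction an action deserves."""
    if not reversible:
        return Friction.CONFIRM
    if low_impact:
        return Friction.NONE
    return Friction.UNDO
```

How the chosen level is expressed still depends on the surface: a dialog in a desktop app, a spoken yes-or-no question on a call, or a required confirmation field in an API request.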

Closing thought

The surface isn't where you finish designing an agent — it's where you should start. Before tuning prompts or wiring up tools, ask what surface the agent lives on, and let its constraints shape the architecture underneath. An agent that respects its surface feels native and trustworthy. One that ignores it feels off, even when the underlying model is excellent.