AI Agents

Book a 30-min discovery call

What we mean by “AI agent”

An AI agent is a system that uses a language model to decide what tool to call next, in what order, until a task is done. The interesting word is “decide.” A workflow that always calls the same three APIs in the same order isn’t an agent; it’s automation, and we build that too. An agent is what you reach for when the next step depends on what came back from the last step.
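To make "decide" concrete, here is a minimal sketch of an agent loop. It is illustrative only: `decide` is a hand-written stand-in for the language model, and the tool names (`classify`, `retrieve_kb`, `escalate`) and their return values are hypothetical, not part of any SDK. The point is that the next tool is chosen from the state produced by the previous step.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]

# Hypothetical tools. A real agent would call ticketing or knowledge-base APIs here.
def classify(state: State) -> State:
    text = str(state["ticket"]).lower()
    return {"category": "vpn" if "vpn" in text else "other"}

def retrieve_kb(state: State) -> State:
    return {"kb_articles": ["KB-1", "KB-2"]}  # placeholder article IDs

def escalate(state: State) -> State:
    return {"escalated": True}

TOOLS: Dict[str, Callable[[State], State]] = {
    "classify": classify,
    "retrieve_kb": retrieve_kb,
    "escalate": escalate,
}

def decide(state: State) -> Optional[str]:
    """Stand-in for the model: pick the next tool from what came back so far."""
    if "category" not in state:
        return "classify"
    if state["category"] == "vpn":
        return None if "kb_articles" in state else "retrieve_kb"
    return None if state.get("escalated") else "escalate"

def run_agent(ticket: str, max_steps: int = 5) -> Tuple[State, List[str]]:
    state: State = {"ticket": ticket}
    trace: List[str] = []
    for _ in range(max_steps):  # hard step cap so the loop always terminates
        tool = decide(state)
        if tool is None:
            break
        state.update(TOOLS[tool](state))
        trace.append(tool)
    return state, trace

if __name__ == "__main__":
    print(run_agent("VPN drops every morning")[1])
    print(run_agent("Printer is jammed")[1])
```

Two different tickets take two different paths through the same loop. If the path were identical for every input, a fixed pipeline would do the job and the `decide` step would be unnecessary.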

For most of the SMBs and municipal IT teams we work with, the right first agent is small. One workflow, one team, a clear input and a clear output. The agents that get into trouble are the ones scoped as “AI assistant for the whole company.” Start narrow. Earn the right to expand.

When an agent is the right answer

Three signals we look for when scoping a first agent:

The decision is data-shaped, not vibe-shaped. Routing a ticket based on its content is data-shaped. Writing a press release that matches a house tone of voice is vibe-shaped. Agents do the first well, the second poorly.
A human currently does it 20+ times a week. Below that volume, an agent’s setup cost dominates. Above it, the agent pays for itself every week it’s deployed.
The cost of being wrong is bounded. An agent routing a ticket to the wrong queue is a 30-second human correction. An agent autonomously sending a contract to a vendor is a different kind of incident.

If your candidate workflow doesn’t pass all three, we’ll tell you. The AI Readiness Assessment walks through a structured version of this same logic.
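The three signals can be written down as a simple checklist. The sketch below is an assumption about how one might encode them, not the assessment itself: the `Candidate` fields and the `failed_signals` helper are hypothetical names, and only the 20-per-week threshold comes from the text above.

```python
from dataclasses import dataclass
from typing import List

MIN_RUNS_PER_WEEK = 20  # the "20+ times a week" signal

@dataclass
class Candidate:
    name: str
    data_shaped: bool         # decision depends on content, not taste
    runs_per_week: int        # how often a human does it today
    bounded_error_cost: bool  # a wrong answer is cheap to correct

def failed_signals(c: Candidate) -> List[str]:
    """Return the signals a candidate workflow fails; empty means it passes all three."""
    failures: List[str] = []
    if not c.data_shaped:
        failures.append("decision is not data-shaped")
    if c.runs_per_week < MIN_RUNS_PER_WEEK:
        failures.append("volume is below 20 runs per week")
    if not c.bounded_error_cost:
        failures.append("cost of being wrong is not bounded")
    return failures

if __name__ == "__main__":
    # Hypothetical candidates for illustration.
    routing = Candidate("ticket routing", True, 150, True)
    contracts = Candidate("send vendor contracts", True, 3, False)
    for c in (routing, contracts):
        print(c.name, failed_signals(c) or "passes all three")
```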

What we build with

We deploy from a toolbox sized to the engagement, not from a single framework we’re married to. Most engagements use one of three primary stacks:

Anthropic Claude Agent SDK for production agents that need long-context tool use and tight eval loops. This is our default for engineering-adjacent workflows.
OpenAI Agents SDK when the org is already standardized on the OpenAI API and the workflow benefits from native function calling patterns.
Microsoft Copilot Studio when the agent needs to live inside Teams, SharePoint, or Outlook, and the org runs M365 already. We’ve shipped Copilot Studio agents for municipal IT desks where the M365 tenancy was a hard requirement.

Custom Python or Node services come in when the agent needs to do something none of the above frameworks support cleanly. We avoid that path until we’ve ruled the frameworks out; custom code carries a real long-term maintenance cost.

What “shipped” means to us

A shipped agent has four things: an eval harness, a cost dashboard, a fallback path when the model fails, and a named human owner inside your org. If it doesn’t have all four, we don’t call it shipped. We call it a demo.

The eval harness is the one most teams skip. Without it, every prompt change is a prayer. With it, you ship updates with the same confidence you ship a code change that passes tests. See Anthropic’s guidance on evaluating Claude applications for the pattern we follow.
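As a rough illustration of what an eval harness does, the sketch below runs a routing function against a small set of labeled cases and gates on accuracy. Everything in it is an assumption made for the example: the cases, the keyword-based `route_v1` (standing in for a prompt plus model call), and the 0.9 pass threshold. A production harness would call the real agent and use a much larger case set.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical labeled cases: (ticket text, expected queue).
CASES: List[Tuple[str, str]] = [
    ("VPN drops every morning", "vpn"),
    ("Cannot connect to VPN from home", "vpn"),
    ("Password reset for new hire", "account"),
    ("Locked out of my account", "account"),
]

def route_v1(ticket: str) -> str:
    """Stand-in for the agent under test; a real harness would call the model."""
    text = ticket.lower()
    if "vpn" in text:
        return "vpn"
    if "password" in text or "account" in text:
        return "account"
    return "other"

def run_evals(
    route: Callable[[str], str],
    cases: List[Tuple[str, str]],
    min_accuracy: float = 0.9,  # assumed gate; pick one that fits the workflow
) -> Dict[str, object]:
    failures = []
    for ticket, expected in cases:
        actual = route(ticket)
        if actual != expected:
            failures.append((ticket, expected, actual))
    accuracy = 1 - len(failures) / len(cases)
    return {"accuracy": accuracy, "passed": accuracy >= min_accuracy, "failures": failures}

if __name__ == "__main__":
    report = run_evals(route_v1, CASES)
    print(report["accuracy"], report["passed"])
    for ticket, expected, actual in report["failures"]:
        print(f"FAIL {ticket!r}: expected {expected}, got {actual}")
```

Run on every prompt or model change, a gate like this turns a regression into a failed check, the same way a failing unit test blocks a code change.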

Engagement shapes

Pilot (4-6 weeks). One agent, one workflow, one metric. Fixed scope and fixed price, quoted in the proposal. End-of-pilot is a real “keep going / kill it” decision.
Embedded (3-6 months). Fractional AI engineer model. We’re in your Slack and your repos a few days a week; your team does the rest. This shape builds the second through fifth agents and the eval infrastructure that scales them.
Audit (1-2 weeks). Already running agents in production? We read the prompts, traces, and cost dashboards and deliver a prioritized fix list. Often this is enough; you take it from there.

Where to start

Take the AI Readiness Assessment if you’re not sure whether your org is ready to pilot. Read our AVA case study if you want to see what an agent shipped to production actually looks like. Or book a 30-minute discovery call and bring the workflow you’ve been arguing about internally; we’ll tell you whether it’s the right first one.

14:02:11 intake   ticket #4092: "VPN drops every morning"
14:02:11 classify category=vpn_outage  conf=0.94
14:02:12 retrieve KB:[A19, A22], tickets:[#3811, #3942]
14:02:12 draft    suggested fix + KB links
14:02:13 policy   after-hours=false  escalate=true
14:02:13 notify   on-call=mhassan paged
─────────────────────────────────────────────────────
end-to-end: 1.8s   tokens: 4,302   cost: $0.011

Run output from an AVA-deployed agent.

What's the difference between an AI agent and a workflow?

An agent decides what to do next based on what came back from the last step. A workflow always runs the same steps in the same order. The agent is what you reach for when the path depends on the data.

How long does a first agent take to ship?

Four to six weeks for the pilot. The discovery call sets scope; the first two weeks build the eval harness and the integration; the next two to four weeks are the agent itself, including a final week of bake-in.

What does it cost to run?

Per-call cost varies by model and context size. We ship a cost dashboard with every agent and target a per-task cost an order of magnitude below the labor cost of doing the same task by hand. We tell you the actual numbers in the discovery call.
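The "order of magnitude" target is simple arithmetic. The sketch below checks it using the $0.011 per-run cost from the sample run output on this page; the labor figures (5 minutes per task at $30 per hour) are made-up assumptions for illustration, not numbers from any engagement.

```python
def labor_cost_per_task(minutes_per_task: float, hourly_rate: float) -> float:
    """Cost of a human doing the task once, in the same currency as hourly_rate."""
    return minutes_per_task / 60 * hourly_rate

def meets_cost_target(agent_cost: float, labor_cost: float, factor: float = 10.0) -> bool:
    """True when the agent's per-task cost is at least `factor` times below labor cost."""
    return agent_cost * factor <= labor_cost

if __name__ == "__main__":
    agent_cost = 0.011                      # per-run cost from the sample run output
    labor = labor_cost_per_task(5, 30.0)    # assumed: 5 minutes at $30/hour
    print(f"labor ${labor:.2f} vs agent ${agent_cost:.3f}")
    print("meets target:", meets_cost_target(agent_cost, labor))
```

Under these assumed labor figures the human cost is $2.50 per task, so the sample run clears the target with room to spare. The comparison only covers per-call model cost; setup and maintenance are separate line items.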

What happens when the model is wrong?

Every agent we ship has a fallback path. For low-stakes tasks: log the error and escalate to a human queue. For high-stakes tasks: refuse, escalate, and never act autonomously without both a confidence threshold and a human review step.
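One way to express that policy in code is a small decision function. This is a sketch under stated assumptions, not the shipped implementation: the `Action` names, the `choose_action` signature, the 0.9 threshold, and the convention that `confidence=None` means the model call failed are all hypothetical.

```python
from enum import Enum
from typing import Optional

class Action(Enum):
    ACT = "act"
    HUMAN_QUEUE = "human_queue"            # low stakes: log and hand to a human queue
    HUMAN_REVIEW = "human_review"          # high stakes: wait for explicit approval
    REFUSE_AND_ESCALATE = "refuse_and_escalate"

def choose_action(
    high_stakes: bool,
    confidence: Optional[float],  # None is assumed to mean the model call failed
    human_approved: bool = False,
    threshold: float = 0.9,       # assumed value; tune per workflow
) -> Action:
    confident = confidence is not None and confidence >= threshold
    if not high_stakes:
        return Action.ACT if confident else Action.HUMAN_QUEUE
    if not confident:
        return Action.REFUSE_AND_ESCALATE
    # High stakes and confident: still requires a human review step before acting.
    return Action.ACT if human_approved else Action.HUMAN_REVIEW

if __name__ == "__main__":
    print(choose_action(high_stakes=False, confidence=0.94))
    print(choose_action(high_stakes=True, confidence=0.94))
    print(choose_action(high_stakes=True, confidence=None))
```

The high-stakes branch needs both conditions to act: confidence above the threshold and explicit human approval. Neither alone is enough.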