We built AVA for our own MSP · AIX Automation Lab

Why we built it

We run an MSP. The MSP runs on Autotask PSA: every ticket, every time entry, every client touch lives there. The engineering team’s daily question wasn’t “what’s the AI strategy”; it was “have we seen this exact problem before, and what did we end up doing about it?”

The honest answer was: yes, we’ve seen it, three times, and the resolution is buried in a ticket from a year ago that nobody can find. Every engineer was paying a tax to re-discover what the team already knew. Clients were paying that tax too, in the form of a second engineer asking the same intake questions a different engineer asked last month.

AVA (Aix Virtual Assistant) is the product that pays that tax for us, on both sides of the chat surface.

What it does

AVA sits in two places at once. On the client side it’s a web chat the client opens from their portal; on our side it’s the engineer-facing console wrapped around the same conversation. Three things, in this order:

Trend analysis on the user’s history. When a client opens a session, AVA surfaces that end-user’s recent tickets, the categories they’ve trended toward, and the resolutions that worked. Repeat-offender patterns become visible inside the first 30 seconds of triage, before the engineer types a word.
Similar-ticket retrieval. AVA searches the full ticket corpus, including closed tickets, resolution notes, and related KB articles, and ranks the top matches by semantic similarity, weighted by recency and resolution success. Every retrieval cites its source ticket: no source, no answer. Because the citation is always there, the engineer’s first instinct is to trust the suggestion enough to click through to it.
Autotask actions from the chat surface. When an engineer wraps a call, AVA drafts a ticket note in our standard format and posts it to Autotask with one click. Time entries, status updates, and assignment changes happen from the chat itself. The “I’ll do my time entries at end of day” anti-pattern stops here.
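To make the ranking rule concrete, here is a minimal Python sketch of “semantic similarity, weighted by recency and resolution success.” It is not AVA’s scoring code. The exponential recency decay, the 180-day half-life, the 0.5 penalty for a failed resolution, and the `Candidate`/`score`/`rank` names are all illustrative assumptions, and the similarity values are assumed to come back from a vector search.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Candidate:
    ticket_id: str        # source ticket; every suggestion cites it
    similarity: float     # cosine similarity from the vector search
    closed_at: datetime   # when the ticket was resolved (timezone-aware)
    resolved_ok: bool     # did the recorded resolution actually fix the issue?


def score(c: Candidate, now: datetime, half_life_days: float = 180.0,
          failed_penalty: float = 0.5) -> float:
    """Similarity, decayed by ticket age and down-weighted for failed fixes."""
    age_days = max((now - c.closed_at).total_seconds() / 86400.0, 0.0)
    recency = 0.5 ** (age_days / half_life_days)  # 1.0 today, 0.5 after one half-life
    outcome = 1.0 if c.resolved_ok else failed_penalty
    return c.similarity * recency * outcome


def rank(candidates: list[Candidate], now: datetime,
         top_k: int = 5) -> list[tuple[str, float]]:
    """Top-k (ticket_id, score) pairs, so the citation travels with the answer."""
    scored = [(c.ticket_id, score(c, now)) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical example data.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
example = [
    Candidate("T-1001", 0.90, datetime(2023, 6, 2, tzinfo=timezone.utc), True),    # old, worked
    Candidate("T-2002", 0.80, datetime(2024, 5, 25, tzinfo=timezone.utc), True),   # recent, worked
    Candidate("T-3003", 0.95, datetime(2024, 5, 30, tzinfo=timezone.utc), False),  # recent, fix failed
]
print(rank(example, now))  # T-2002 first, then T-3003, then T-1001
```

Multiplying the factors rather than adding them means a stale or failed resolution can only pull a strong semantic match down; it can never push a weak match up.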

The architecture

We made four architectural choices that other MSPs we’ve talked to keep getting wrong. Here’s the shape, sanitized.

Retrieval before generation. AVA is a RAG system at heart, not a freestanding agent. Every time the model speaks, it’s speaking from retrieved context: the user’s ticket history, similar tickets, KB articles, and live Autotask state for the active ticket. We don’t ask the model to remember facts about our environment. We ask it to reason over facts we hand it.
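A minimal sketch of what “speaking from retrieved context” can look like at the prompt layer, assuming the snippets arrive already ranked. The `Snippet` type, the `kind` labels, the rule text, and the character budget are hypothetical, not AVA’s actual prompt.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    source_id: str  # e.g. a ticket number or KB article ID (hypothetical format)
    kind: str       # "history", "similar_ticket", "kb", or "live_ticket"
    text: str


SYSTEM_RULES = (
    "Answer only from the context blocks below. "
    "Cite the [source_id] of every block you rely on. "
    "If the context does not contain the answer, say so instead of guessing."
)


def build_prompt(question: str, snippets: list[Snippet], max_chars: int = 12000) -> str:
    """Assemble retrieved context into a prompt; the model reasons over it, not memory."""
    blocks: list[str] = []
    used = 0
    for s in snippets:  # best-ranked first
        block = f"[{s.source_id}] ({s.kind})\n{s.text.strip()}"
        if used + len(block) > max_chars:
            break  # drop whole blocks rather than truncating one mid-sentence
        blocks.append(block)
        used += len(block)
    context = "\n\n".join(blocks) if blocks else "(no context retrieved)"
    return f"{SYSTEM_RULES}\n\n### Context\n{context}\n\n### Question\n{question}"
```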

Embeddings on ticket bodies, not titles. Ticket titles are too short and too inconsistent to embed usefully. We chunk the ticket body (description plus resolution notes plus relevant comments) at semantic boundaries (paragraph breaks, list breaks, code block edges) and embed each chunk independently. Retrieval ranks chunks, then aggregates back to the parent ticket. This was the single biggest precision lift in the build.
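The chunking rule above (split at paragraph breaks, list breaks, and code block edges, then roll chunk scores back up to the parent ticket) can be sketched in plain Python. This assumes ticket bodies use triple-backtick code fences and `-`, `*`, or `1.` list markers, and it aggregates with max-over-chunks, which is one reasonable choice; the post doesn’t say which aggregation AVA uses. Embedding calls are left out: `rank_tickets` takes chunk-level similarity scores as input, and both function names are hypothetical.

```python
import re

FENCE = "`" * 3  # a Markdown code-fence marker
LIST_ITEM = re.compile(r"\s*([-*]|\d+[.)])\s")


def semantic_chunks(body: str) -> list[str]:
    """Split a ticket body at paragraph breaks, list edges, and code-block edges."""
    chunks: list[str] = []
    current: list[str] = []
    in_code = in_list = False

    def flush() -> None:
        text = "\n".join(current).strip()
        if text:
            chunks.append(text)
        current.clear()

    for line in body.splitlines():
        if line.strip().startswith(FENCE):
            if in_code:              # closing fence: the whole code block is one chunk
                current.append(line)
                flush()
            else:                    # opening fence: close the preceding prose first
                flush()
                current.append(line)
            in_code, in_list = not in_code, False
            continue
        if in_code:                  # never split inside a code block
            current.append(line)
            continue
        if not line.strip():         # blank line = paragraph boundary
            flush()
            in_list = False
            continue
        # Indented lines continue the current list item.
        is_item = bool(LIST_ITEM.match(line)) or (in_list and line[:1] in (" ", "\t"))
        if is_item != in_list:       # entering or leaving a list is a boundary
            flush()
            in_list = is_item
        current.append(line)
    flush()
    return chunks


def rank_tickets(chunk_hits: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Roll chunk-level similarity up to parent tickets; the best chunk wins."""
    best: dict[str, float] = {}
    for ticket_id, sim in chunk_hits:
        best[ticket_id] = max(best.get(ticket_id, sim), sim)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

Code blocks and lists stay whole instead of being cut wherever a fixed token window happens to end. Very short chunks, such as a lone “Steps tried:” line, are a side effect of this simple version; merging them into a neighbor is a common refinement.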

A primary model with a tested fallback. We use a frontier reasoning model from Anthropic as our default, with an OpenAI fallback path that kicks in on rate-limit or provider-side failures. The fallback is exercised in CI on every change. See Anthropic’s evaluation guidance and OpenAI’s evals cookbook for the patterns we follow.
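One way to structure the fallback so it is easy to exercise in CI: keep the routing logic provider-agnostic and pass the model calls in as plain functions. `RateLimited` and `ProviderUnavailable` are hypothetical exceptions; in a real build you would translate each SDK’s own rate-limit and server-error exceptions into them. The test at the bottom forces the primary path to fail so the fallback runs on every change.

```python
from typing import Callable


class RateLimited(Exception):
    """Hypothetical: raised when a provider returns a rate-limit response."""


class ProviderUnavailable(Exception):
    """Hypothetical: raised on provider-side failures (5xx, overload, timeout)."""


# Only these trigger the fallback; anything else (bad request, auth) propagates,
# because falling back would hide a real bug.
FALLBACK_ON = (RateLimited, ProviderUnavailable)


def complete_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> tuple[str, str]:
    """Return (path, text) so logs show which provider actually answered."""
    try:
        return "primary", primary(prompt)
    except FALLBACK_ON:
        return "fallback", fallback(prompt)


def test_fallback_path_is_exercised() -> None:
    """CI test: force the primary to rate-limit and check the fallback answers."""
    def rate_limited_primary(_: str) -> str:
        raise RateLimited("429 from primary")

    def stub_fallback(p: str) -> str:
        return f"fallback saw: {p}"

    assert complete_with_fallback("hi", rate_limited_primary, stub_fallback) == (
        "fallback", "fallback saw: hi")
```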

Tool use is narrow on purpose. AVA has a small, audited set of tools: search the ticket corpus, fetch a specific ticket, fetch a user’s history, draft a ticket note, post a ticket note, log a time entry. That’s it. Every tool we considered adding got the same gating question: does this tool make a decision the engineer would otherwise make, and is the cost of being wrong bounded? Most candidates didn’t pass.
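A narrow tool surface can be enforced in code rather than by convention. In this sketch (tool names hypothetical, handlers stubbed), the registry refuses any tool outside the allowlist, and the two tools that write to Autotask refuse to run without an explicit engineer confirmation, mirroring the one-click posting described earlier.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    name: str
    handler: Callable[..., Any]
    writes_to_psa: bool  # True if the tool changes Autotask state


class ToolRegistry:
    def __init__(self, tools: list[Tool]) -> None:
        self._tools = {t.name: t for t in tools}

    def names(self) -> list[str]:
        return sorted(self._tools)

    def call(self, name: str, *, engineer_confirmed: bool = False, **kwargs: Any) -> Any:
        tool = self._tools.get(name)
        if tool is None:
            # Anything outside the audited allowlist is refused, not improvised.
            raise PermissionError(f"tool not allowlisted: {name}")
        if tool.writes_to_psa and not engineer_confirmed:
            # Writes (notes, time entries) need the engineer's click.
            raise PermissionError(f"{name} requires engineer confirmation")
        return tool.handler(**kwargs)


def _stub(**kwargs: Any) -> dict[str, Any]:
    """Placeholder handler: echoes its arguments instead of calling Autotask."""
    return dict(kwargs)


AVA_TOOLS = ToolRegistry([
    Tool("search_tickets", _stub, writes_to_psa=False),
    Tool("get_ticket", _stub, writes_to_psa=False),
    Tool("get_user_history", _stub, writes_to_psa=False),
    Tool("draft_ticket_note", _stub, writes_to_psa=False),
    Tool("post_ticket_note", _stub, writes_to_psa=True),
    Tool("log_time_entry", _stub, writes_to_psa=True),
])
```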

Want the full architecture walkthrough, the model selection rationale, and the eval setup? Book a 30-minute discovery call and we’ll walk you through it live, or send a note from the AVA product page.

What didn’t work (and what we did about it)

We shipped two versions of the retrieval pipeline before we got the chunking right. The first version embedded entire ticket bodies, and retrieval rewarded long tickets rather than relevant ones. The second version chunked at fixed 500-token boundaries, and retrieval rewarded whichever chunk happened to contain a rare keyword. The third version chunked at semantic boundaries, and that is the version we shipped.

The eval harness is what made the difference. We had a test set of real engineer questions with known good answers, and every change ran against it before merging. Without that harness, we’d have shipped version one and convinced ourselves it was working.
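The harness doesn’t need to be elaborate to do its job. A minimal sketch, assuming each test case pairs a real question with the ticket IDs that contain the known good answer, and that the retriever returns ranked ticket IDs. Hit rate at k and the merge-gate rule are illustrative choices, not necessarily the metrics AVA uses.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class EvalCase:
    question: str               # a real engineer question
    expected_tickets: set[str]  # tickets that contain the known good answer


Retriever = Callable[[str], Sequence[str]]  # question -> ranked ticket IDs


def hit_rate_at_k(retriever: Retriever, cases: list[EvalCase], k: int = 5) -> float:
    """Fraction of cases where at least one known-good ticket is in the top k."""
    if not cases:
        raise ValueError("eval set is empty")
    hits = sum(1 for c in cases if c.expected_tickets & set(retriever(c.question)[:k]))
    return hits / len(cases)


def gate(candidate: Retriever, cases: list[EvalCase], baseline: float,
         tolerance: float = 0.0, k: int = 5) -> bool:
    """Merge gate: a change may not score below the current baseline."""
    return hit_rate_at_k(candidate, cases, k) >= baseline - tolerance
```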

The other failure worth flagging: we underestimated how much of AVA’s value lives in the client-facing surface, not the engineer-facing one. Our first design treated the client chat as a thin form that fed the engineer console. Clients ignored it. We rebuilt it as a real conversation surface, with AVA drafting context-aware responses the client could accept, edit, or escalate, and adoption went from grudging to default. We now warn clients in scoping calls: the customer surface deserves the same craft as the operator surface.

Three lessons we’d carry to your build

1. The eval harness is the deliverable. Before AVA was a product, it was a folder of test cases on disk. Every prompt edit, every model swap, every chunking change ran against that test set before it shipped. Most MSPs we’ve talked to about agent projects skip this step because it doesn’t demo. Skipping it is also why their agents work in week three and break in week eight.

2. Tool minimalism beats tool sprawl. We rejected more tools than we shipped. Every tool you add to an agent is a surface area for the model to misuse, a permission boundary to police, and a failure mode in production. A small set of well-chosen tools beats a sprawl of mediocre ones, every time. The Anthropic engineering notes on building effective agents make this point better than we can.

3. Ship to yourselves first. We ran AVA on our own MSP for months before we considered it shippable to other MSPs. We caught the broken retrieval, the cost spikes, the awkward tool calls, and the rough client-facing flows on our own time, not a client’s. That’s the operator DNA pitch, and it’s why we run the Pilot Program the way we do. We want the same dogfooding window with our first three external clients, with their workflows, on their data.

What this means for you

If you run an MSP on Autotask, AVA is licensable. The integration plumbing took months; you can skip it. Talk to us about MSP licensing.

If you run an SMB or a public-sector IT team, the more interesting question is what your version of AVA looks like. The architecture decisions above generalize. The use case rarely does; we’d start the conversation by understanding which 30 questions your team and your customers ask most often, and whether your data is clean enough to retrieve over.

The fastest read on whether your org is ready: take the AI Readiness Assessment. The use-case-clarity dimension is the one this kind of project lives or dies on.

Or book a 30-minute discovery call and bring the workflow you’ve been arguing about internally. We’ll tell you whether it’s the right shape for an AI agent, workflow automation, or a RAG system, or whether it’s the kind of project that needs runbook work first.

— The AIX Automation Lab team
