Last week, I was on a call with a sales leader at a fast-growing data company who had given his AEs access to Claude. He wanted his team to move faster, work smarter and use AI to research accounts before their calls. Pretty standard stuff.
But what he didn’t anticipate was what they would end up doing with it. Unfortunately, his reps began to use expensive AI models, and the web lookup tools those models call, to pull job listings, news mentions and company events.
Essentially, they were driving a Ferrari to fetch groceries.
Yes, the results were fine, but the bill was not.
And get this: It wasn’t just one rep running one query. It was ten reps running the same exact queries independently, and none of them knew what the others already had. So the cost wasn’t linear, it was exponential.
In just a few minutes chatting with him, I realized this sales leader was not describing an AI problem, but rather a retrieval problem wearing an AI costume.
The thing nobody tells you about scaling AI agents
Everyone’s talking about what AI agents can do, but nobody is talking about what they cost to run at scale.
Moving from demo to production surfaces two problems most teams aren’t ready for: the cost of running agents on the wrong infrastructure, and the governance required to let them act safely.
Because when you give an AI agent access to a powerful model and a set of tools like web search, CRM connectors and enrichment APIs, it will use those tools for everything. Including things that don’t need a model at all. Including fetching information that already exists somewhere in your stack, structured and ready to go.
This is how organizations end up burning expensive frontier model tokens to retrieve data they already own, and, naturally, this gets more expensive the more you scale.
MCP changed the game, but it also created a new problem
The rise of MCP is genuinely important. MCP, or Model Context Protocol, is a standard that lets AI agents connect to external tools and systems. In practice, that means an agent can query your CRM, pull enrichment data, search the web, and update records all from a single prompt.
But here's the distinction nobody is making: MCP carries overhead. Every retrieval routes through the model, which has to figure out what to ask and how to ask it.
A well-designed CLI, paired with a good prompt, gives the model enough context to make the right queries without that overhead. And when you package complex CLI calls into pre-written scripts, the logic moves out of the model entirely. The reasoning happens in the script. The model just acts on the result. That's where the cost savings actually live.
Most organizations end up using both: MCP for conversational AI assistants, CLI for deterministic automation and production pipelines. They’re not competing approaches, so understanding the difference is what separates teams that scale AI cost-efficiently from teams that don’t.
And here's what teams are also discovering in production: connecting your AI to twenty systems doesn't make it smarter. It makes it more expensive to be wrong. When an agent can query Salesforce, your data warehouse, your enrichment provider, and six other sources simultaneously, you haven't solved your data problem. You've distributed it across every inference call.
Which record wins when sources disagree? Which contact is current? Which signal is real? The model answers those questions live, every time, at full cost with no memory of what it concluded last time.
More connections. More tokens. More reconciliation work being done at exactly the wrong layer of the stack.
The CLI insight nobody is acting on
Most of what AI agents spend tokens on doesn't need a model, but rather a query.
Here’s the part that gets missed: the cost isn’t really in fetching the data, but rather in the processing of that data. Because when an agent uses a web tool to find recent company news, it doesn’t retrieve four clean bullet points, it pulls hundreds of pages, processes all of them, and then extracts those four specific signals that were actually helpful. As such, the model is doing very expensive work that should have been done further upstream.
And that’s what a purpose-built tool does: it processes the firehose in the background and hands the model only what it needs—data that is pre-enriched, deduplicated, context-aware, and ready to act on, so that the model never sees the noise.
If the data is already clean, reconciled, and accessible, you retrieve it directly via CLI without spending a single token. The model never sees raw exports or conflicting records. It receives clean, resolved context and uses its intelligence for the part that actually requires intelligence.
The cheapest token is the one you never had to spend.
Most AI architectures aren’t designed this way, and we think they should be. That’s why it’s shaped how we’ve approached the solution to this problem at Common Room.
And then there's the question of what agents are allowed to do

Solving the cost problem gets you to production and the governance problem is what keeps you there. Because even after you solve the retrieval problem, inevitably, production AI will end up handing you another one.
For example, let’s say an agent identifies a perfect prospect. Should it launch an outbound sequence? Maybe. But that contact might be on a do-not-contact list. The account might belong to a rep in a different region.
Your company might require VP approval before anyone touches C-suite contacts at enterprise accounts. And an agent might have permission to read a CRM record. That doesn't mean it has permission to update one.
The agent sees the data. But data isn't context. It doesn't know the history, the relationships, the internal rules of engagement that every experienced rep carries in their head. And even when it gets the action right, it has no way of knowing whether it's authorized to take it.
This is where most AI expansion programs quietly stall. Not because the models failed. Because nobody built the layer that translates how your business actually operates the roles, the rules, the permissions, the exceptions into something an agent can respect.
A rep knows not to cold-call a C-suite contact at an account that's mid-renewal. An agent, given access to that record, has no idea. So giving agents unlimited access doesn't create capability, it creates exposure.
What your stack is actually missing
The industry talks about models, agents, and integrations.
Production AI needs two things underneath all of that.
A data layer that resolves identity, reconciles signals, and assembles clean context before it ever reaches a model. More connections was never the answer. A better foundation is.
And a governance layer that encodes how your business actually operates: roles, permissions, read vs. write access, approval workflows, do-not-contact rules, rules of engagement. The difference between what an agent can do and what it's allowed to do.
Without both, AI scales into cost and complexity. With both, it scales into something you can actually trust. And that’s why we think about the stack the way that we do.
The race that's already shifting
Every company will have AI agents. Every company will have access to the same powerful models. That's not the race worth competing in.
Instead, the organizations that pull ahead will have better infrastructure underneath those agents. Systems where AI operates on trusted context rather than expensive guesswork.
Your reps probably shouldn't be using frontier AI to pull job listings. Or news mentions. Or company events. Or any signal that a purpose-built tool already has, clean and ready to serve.
The companies that win won't have better models. They'll have better governance, lower costs, more scale, and data their AI can actually trust. The model is a commodity. What you feed it and what you let it do is the moat.
