Key takeaways
- AI-on-your-data demos look impressive but the math is right while the meaning is invented. The LLM has to guess what 'revenue,' 'conversion,' or 'best customer' means in your specific business, and the guess defaults to industry convention.
- An AI agent on day one is a smart new hire on day one. Same context problem, none of the absorption mechanisms. It cannot sit in meetings, ask the right person in the kitchen, or read Slack. It either has the context or it assumes, in clean output, with no flag.
- The failure mode is worse than 'the model cannot do this.' The model absolutely can. It produces confident, plausible output with no surface indicator that a semantic assumption was made. You cannot tell from looking that the formula is wrong.
- The fix is a semantic layer. Every metric and dimension carries two things: the formula a data engine can execute, and the plain-language description of what the metric means in this business. The agent stops guessing and starts answering.
- Per business, not per industry. Two apparel brands can define 'conversion,' 'active customer,' or 'contribution margin' completely differently. The semantic layer encodes each business's definitions in that business's language.
- The order matters: define meaning first, then plug in AI. The brands getting real value from AI in analytics are not the ones with the best LLM. They are the ones whose data had meaning before the agents ever touched it.
Ecommerce brands are racing to wire AI into their data. LinkedIn is full of clips of someone pointing Claude at a BigQuery warehouse, plugging GPT into GA4, or uploading a dashboard screenshot and asking an LLM to “analyze this.” The clips look impressive. The workflows don’t produce real insight.
The piece that’s missing isn’t the LLM. It’s meaning.
The pattern everyone is celebrating
The setups all rhyme:
- Native BigQuery MCP, where the agent gets read access to raw tables and you ask it questions in plain English.
- GA4 MCP, where the agent pulls traffic and conversion data straight from the source.
- The screenshot route, where someone hands a dashboard image to a model and asks “what’s going on here.”
- The “I connected my warehouse” Slack post, where the proof of concept is the connection itself.
All of these produce output. Polished output. Confident output. Output that gets posted as a thirty-second proof that AI changes everything for marketing analytics.
Almost none of it is right in the way that matters.
An AI agent on day one
Imagine a smart new hire who joins your company on Monday. You hand them admin access to BigQuery and a Looker license and say: figure out where we’re losing money.
By Friday they produce a report. It looks polished. It cites real numbers. It is also confidently wrong in places that actually matter, because they don’t know:
- Whether “revenue” in your tables means gross, net of returns, or net of discounts
- What counts as a “conversion” in your business. Is a subscription start the same as a one-time purchase? What about free trials? Reactivations?
- How contribution margin is calculated for your category mix. Are shipping subsidies included? Platform fees? Marketing attribution?
- Which channels are operating at break-even by design (top of funnel) versus which are supposed to be profitable
- What seasonality looks like, so they don’t flag the November spike as an anomaly
A good analyst takes weeks to absorb this context. They ask questions. They read internal docs. They sit in meetings and overhear things.
An AI agent has the same problem on day one. It also has none of the mechanisms a human uses to absorb context. It can’t sit in a meeting. It can’t ask the right person who happens to be in the kitchen. It either has the context, or it doesn’t, and if it doesn’t, it assumes. Confidently. In clean-looking output. With no flag that the assumption was made.
Why a smart LLM doesn’t infer its way out
The honest pushback: “But Claude is smart. Can’t it just figure out what these fields mean from context?”
Partially. An LLM can guess, and the guess will be plausible. The problem is that the guess defaults to industry convention, which may or may not match your business.
Three quick examples of what this looks like in practice:
- You ask the agent to calculate “profit.” It defaults to revenue minus cost of goods. Your business defines profit as contribution margin net of attributed marketing spend. The number the agent produces is off by a meaningful percentage, and nothing in the output tells you so.
- You ask the agent to identify “your best customers.” It uses lifetime revenue. Your team defines best customers by repeat purchase rate inside a 90-day window. The list it returns is plausible, defensible, and not what you would have produced.
- You ask the agent for “conversion rate.” It picks a denominator (sessions? unique visitors? add-to-carts?) and runs with it. The number is plausible. It is not the number your team uses internally.
In all three cases, the agent is not failing at math. It is guessing semantic context. The result is a class of error that is worse than “the model can’t do this.” The model can absolutely do this. It does it confidently, with clean output, and you cannot tell from looking that an assumption was made.
This is the failure mode of every AI-on-your-data demo currently going viral. The model works. The math works. The meaning is invented.
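To make the conversion-rate case concrete, here is a minimal sketch in Python. The weekly totals are invented for illustration and the field names are assumptions, not taken from any real dataset. The arithmetic is correct on every line; only the choice of denominator changes.

```python
# Hypothetical weekly totals, invented purely for illustration.
totals = {
    "sessions": 40_000,
    "unique_visitors": 25_000,
    "add_to_carts": 3_200,
    "orders": 800,
}


def conversion_rate(orders: int, denominator: int) -> float:
    """Orders divided by whichever denominator the caller picked."""
    return orders / denominator


# Three plausible readings of "conversion rate" over the same data.
rates = {
    name: conversion_rate(totals["orders"], totals[name])
    for name in ("sessions", "unique_visitors", "add_to_carts")
}

for name, rate in rates.items():
    print(f"orders / {name}: {rate:.1%}")
```

With these made-up totals the three readings come out at 2.0%, 3.2%, and 25.0%. Each one is a defensible "conversion rate." Only one of them is the number your team reports.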
What a semantic layer actually is
The piece that closes the gap is a semantic layer. The concept exists in business intelligence tools today, but those layers were built for humans clicking through dashboards. LLMs need the same thing, arguably more, because they can’t ask Slack what something means when they’re not sure.
A semantic layer attaches two things to every metric and every dimension in your data:
- The formula. The exact calculation, in terms a data engine can execute. Not a description. Not a guideline. The literal logic that defines what this metric is.
- The semantic description. A plain-language explanation of what the metric represents in this business, which edge cases are included or excluded, how it relates to other metrics, and how to interpret it.
When an AI agent queries data through a semantic layer, it does not see raw tables and have to guess. It sees defined building blocks. Each block has its formula attached, and each block has its meaning attached.
A worked example. For one ecommerce brand, contribution margin might be defined like this:
Metric: Contribution Margin
Formula:
    net_revenue
    - cogs
    - shipping_cost
    - return_cost
    - attributed_marketing_spend
Description:
    Contribution margin per order, after all direct variable
    costs are subtracted from net revenue. Marketing spend is
    attributed to the order using a 7-day click + 1-day view
    window on a data-driven model. Returns within 30 days are
    netted against the original order. Excludes fixed overhead
    and platform fees.
When the agent is asked “which product line has the strongest contribution margin”, it does not invent a formula. It uses this one. The answer matches what your team would calculate by hand. The agent and the spreadsheet agree.
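One way to picture a definition like this in code is a small record that carries both halves together. The sketch below is an assumption about shape, not how any particular semantic layer stores its metrics: the `Metric` class, the function name, and the sample order values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class Metric:
    """Hypothetical record pairing an executable formula with its meaning."""
    name: str
    formula: Callable[[Mapping[str, float]], float]
    description: str


def contribution_margin_formula(order: Mapping[str, float]) -> float:
    # Mirrors the formula in the definition above, term by term.
    return (
        order["net_revenue"]
        - order["cogs"]
        - order["shipping_cost"]
        - order["return_cost"]
        - order["attributed_marketing_spend"]
    )


CONTRIBUTION_MARGIN = Metric(
    name="Contribution Margin",
    formula=contribution_margin_formula,
    description=(
        "Contribution margin per order, after all direct variable costs "
        "are subtracted from net revenue. Excludes fixed overhead and "
        "platform fees."
    ),
)

# Invented order values, for illustration only.
order = {
    "net_revenue": 120.0,
    "cogs": 45.0,
    "shipping_cost": 8.0,
    "return_cost": 3.0,
    "attributed_marketing_spend": 22.0,
}

margin = CONTRIBUTION_MARGIN.formula(order)
# What an agent might compute if it fell back to "revenue minus cost of goods".
naive_profit = order["net_revenue"] - order["cogs"]

print(f"defined contribution margin: {margin:.2f}")
print(f"guessed 'profit':            {naive_profit:.2f}")
```

For this invented order the defined metric gives 42.00 and the guessed one gives 75.00. Both calculations are arithmetically sound, which is exactly why the gap is invisible without the definition attached.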
This is the layer the LinkedIn demos are missing.
Per business, not just per industry
Standardization is half of the value. Customization is the other half, and it’s the part that actually moves the needle.
Two ecommerce brands selling apparel can have completely different working definitions of:
- What counts as a conversion (first purchase only? gift cards? subscription starts? reactivations?)
- How contribution margin is calculated (with or without marketplace fees? with or without paid acquisition?)
- What “active customer” means (purchased in the last 90 days? 180? subscriber status?)
- Which channels are “performance” and which are “brand,” and where the dividing line is
A real semantic layer encodes all of this per business. Default metrics ship as defaults, but anything that’s unique to a company gets defined in that company’s language. The same agent, querying two different brands’ data, gets two different sets of definitions and produces two different answers. Both are correct, because both reflect the actual business.
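A rough sketch of that idea, using “active customer” as the example. The brand names, the 90-day default, and the dictionary layout are all assumptions made for illustration; a real layer would hold far richer definitions than a single number.

```python
from datetime import date, timedelta

# Hypothetical default that ships with the layer.
DEFAULT_DEFINITIONS = {"active_customer_window_days": 90}

# Hypothetical per-business overrides; brand names are placeholders.
BUSINESS_OVERRIDES = {
    "brand_a": {},                                    # keeps the default
    "brand_b": {"active_customer_window_days": 180},  # redefines it
}


def definitions_for(business: str) -> dict:
    """Start from the defaults, then apply whatever the business redefined."""
    return {**DEFAULT_DEFINITIONS, **BUSINESS_OVERRIDES.get(business, {})}


def is_active_customer(business: str, last_purchase: date, today: date) -> bool:
    """Apply the business's own window for 'active customer'."""
    window_days = definitions_for(business)["active_customer_window_days"]
    return today - last_purchase <= timedelta(days=window_days)


today = date(2026, 1, 31)
last_purchase = date(2025, 9, 15)  # 138 days before `today`

for business in ("brand_a", "brand_b"):
    print(business, is_active_customer(business, last_purchase, today))
```

The same customer, with the same purchase history, is inactive under the first brand's definition and active under the second's. Neither answer is wrong. Each is the answer that business would give.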
Without the layer, you get the homogenized industry-standard answer regardless of how your business actually works. Often close enough to look right. Almost never close enough to be useful.
What changes when the layer is there
The agent stops being a new hire on day one. It becomes something closer to an analyst who has been at the company for two years. It knows the definitions. It knows the edge cases. It knows what’s normal for this business. When you ask a question, it answers in your business’s language, not in the generic conventions of the LLM’s training data.
The practical changes are immediate:
- You can ask deep, layered questions and get accurate answers. “Show me contribution margin by acquisition channel by cohort, excluding subscription customers who started on a free trial.” Without a semantic layer, that’s a half-hour manual SQL session and a follow-up Slack message to verify the formula. With one, it’s a sentence.
- Hallucinated metric definitions collapse. Not because the model got smarter, but because there is nothing left to guess about what a metric means. The metric is defined. The definition is loaded. The answer is grounded.
- The agent answers consistently across sessions, across team members, across queries. Same definition, same answer. No more “Claude said X yesterday, GPT said Y today, the dashboard says Z.”
- A junior team member gets senior-level analytical leverage on day one. Their agent has the context they don’t have yet.
This is what real leverage from AI in marketing analytics actually looks like. Not a screenshot demo. Not a viral clip. The agent produces analysis at the speed of thought, with the accuracy your most senior analyst would produce by hand.
What we built at Adtribute
This is the layer Adtribute runs on. Every metric and every dimension in the platform has a formula and a semantic description attached. The MCP server we recently released exposes those definitions to AI agents. When Claude or any other LLM queries data through Adtribute, it isn’t staring at raw warehouse columns trying to guess what they mean. It sees defined building blocks, each with its calculation logic and its meaning.
Default metrics ship pre-defined. Business-specific metrics and dimensions get described per customer, in the language the business actually uses. A brand that defines “active customer” differently than the industry default gets that definition encoded once, and from that point on every agent query respects it.
The agent stops guessing. It starts answering.
The order matters
The default pattern in 2026 is: plug in AI first, figure out why the answers feel off later. The pattern that produces real leverage is the reverse: define meaning first, then plug in AI.
It is not a glamorous insight. There is no thirty-second demo for “we spent three weeks defining our metrics before we connected an agent.” But it is the difference between AI as theater and AI as leverage.
The companies getting real value from AI in their analytics are not the ones with the best LLM. They are the ones whose data has meaning before the agents ever touch it.
Build the semantic layer first. The agent is the easy part.