AI Share of Voice Is a Vanity Metric. Track These Three Signals Instead.
Every brand that has gotten serious about AI visibility in the last six months now has a share of voice dashboard, and almost none of them can connect that number to a single dollar.
If the agent mentions you but never picks you, what did the mention actually buy?
TL;DR
AI share of voice tells you how often you appear in answers. It says nothing about whether anyone chose you.
The metrics that move revenue are query-level citations, what the citation does next, and whether an agent picks you when it can transact.
The tooling to measure real outcomes arrived this month, with Google shipping AI performance reports in Search Console and OpenAI moving toward in-answer commerce.
Build a measurement stack around being chosen instead of being counted.
Your review and loyalty data are among the few signals that can prove the agent picked you and that the customer came back.
That critique landed the same week the tooling caught up, which is what makes this worth your attention right now instead of next quarter.
Why Share of Voice Stopped Being Signal
Share of voice was borrowed from the old media-buying playbook, where impressions were the closest thing you had to influence. In the answer engine (the ChatGPT, Perplexity, and Gemini surfaces where customers now ask for recommendations), a mention is cheap and a recommendation is everything. You can be named in a decent portion of category answers and still lose every time the model has to commit to one brand.
The deeper problem is that share of voice averages across queries that have nothing in common. Being mentioned in "what is a good moisturizer" is not the same as being chosen in "best moisturizer for sensitive skin under $40 that ships to Canada." One is TOFU (top of funnel) noise. The other is a BOFU (bottom of funnel) purchase decision the model is making on the shopper's behalf. A single blended percentage flattens that difference into a number that feels like progress and means almost nothing.
The real story is this: the answer engine is not a billboard where exposure compounds. It is a buyer that picks one option per question, and your job is to be the one it picks on the questions that convert.
What This Looks Like in the Wild
We've seen the gap between "mentioned" and "chosen" play out across the brands we work with, and it almost always takes one of three forms.
1. High Share of Voice, No Purchase Intent
A skincare brand was thrilled to see itself appearing in roughly a third of category answers on ChatGPT and Perplexity. When we mapped where, almost every mention was on broad educational prompts: ingredient explainers, routine breakdowns, "what does niacinamide do." On the prompts that ended in a recommendation with a price and a use case, a competitor with a fraction of their content owned the answer. They were loud everywhere it didn't matter and silent everywhere it did.
2. Cited, But the Citation Goes Nowhere
A home goods brand was getting cited as a source, which their dashboard counted as a win. The problem was what the citation did. The model pulled a spec from their product page, answered the shopper's question completely inside the chat, and never created a reason to click through or buy. The citation fed the answer and ended the journey. That is exposure that actively replaces a session instead of starting one.
3. Recommended, But the Agent Can't Transact
A supplements brand was consistently the model's top recommendation in its category, which looked perfect until you watched what happened next. When the agent could complete a purchase in-answer, it favored two competitors with cleaner structured product feeds and richer review data. Being the named favorite means little if the agent can't actually check out with you, or if a rival is easier to transact with at the moment of decision.
The Chosen-Over-Counted Audit
Here is the framework I'd run instead of staring at a share of voice line. I call it the Chosen-Over-Counted Audit, and it replaces one blended number with three signals that each map to a real moment in the purchase. The shape matters: each signal answers a different question, and you need all three to know where you actually stand.
Signal 1: Query-Level Citation, not blended mentions
Stop asking "how often do we appear." Ask "where do we appear, and does that query have buying intent." Pull your real category questions and sort them into TOFU, MOFU, and BOFU. Then check citation by query, not in aggregate.
Run the buying-intent versions yourself across ChatGPT, Perplexity, and Gemini:
"Best [category] for [specific use case] under [price]"
"Is [your brand] or [competitor] better for [specific need]?"
"What [category] should I buy for [specific situation]?"
A mention on "what is the best protein powder" is worth a fraction of a mention on "best protein powder for lactose intolerance that's third-party tested." Weight your scoring accordingly.
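To make the weighting concrete, here is a minimal Python sketch that compares a blended mention rate with an intent-weighted one. The stage weights (BOFU 3x, MOFU 2x, TOFU 1x) are illustrative, and the `PromptResult` record, the function names, and the sample prompts are assumptions made up for this example, not output from any tool.

```python
from dataclasses import dataclass

# Illustrative stage weights: BOFU 3x, MOFU 2x, TOFU 1x. Tune to your funnel.
STAGE_WEIGHTS = {"TOFU": 1, "MOFU": 2, "BOFU": 3}


@dataclass
class PromptResult:
    prompt: str
    stage: str   # "TOFU", "MOFU", or "BOFU"
    cited: bool  # did the brand appear in the answer?


def blended_rate(results):
    """Unweighted share of prompts where the brand appeared."""
    return sum(r.cited for r in results) / len(results)


def weighted_rate(results):
    """Share of intent-weighted prompts where the brand appeared."""
    total = sum(STAGE_WEIGHTS[r.stage] for r in results)
    hit = sum(STAGE_WEIGHTS[r.stage] for r in results if r.cited)
    return hit / total


# Hypothetical audit rows: present on the broad prompts, absent on the buying one.
results = [
    PromptResult("what is the best protein powder", "TOFU", True),
    PromptResult("what does whey isolate do", "TOFU", True),
    PromptResult("whey vs plant protein for beginners", "MOFU", True),
    PromptResult(
        "best protein powder for lactose intolerance that's third-party tested",
        "BOFU",
        False,
    ),
]

print(f"blended:  {blended_rate(results):.2f}")   # 0.75
print(f"weighted: {weighted_rate(results):.2f}")  # 0.57
```

In this made-up sample the brand looks present in three out of four answers, but the weighted score drops to 0.57 because the one prompt it misses is the one that carries purchase intent.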
Signal 2: Citation Outcome, what the mention does next
A citation is only valuable if it does one of two things: drives a click to you, or contributes to an in-answer purchase from you. Track which of your citations produce referral traffic and which simply feed the answer and end the session. This is where the new tooling helps. Google now ships AI performance reports inside Search Console, so you can finally see how your content performs inside AI responses rather than guessing. Pair that with referral data segmented by source and you can separate citations that start a journey from citations that quietly end one.
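One way to separate the two kinds of citation is to join your citation log against referral sessions. The sketch below assumes you have already exported both into simple Python structures. The row shapes, URLs, source labels, and the `split_by_outcome` helper are all hypothetical, and it only covers the click-through half of the definition. In-answer purchases would need their own column once that data is available to you.

```python
from collections import defaultdict

# Hypothetical citation log: (cited_url, ai_source). Field layout is an assumption.
citations = [
    ("/products/linen-sheets", "chatgpt"),
    ("/products/linen-sheets", "perplexity"),
    ("/guides/thread-count", "chatgpt"),
    ("/guides/thread-count", "gemini"),
]

# Hypothetical referral sessions from your analytics, keyed by (landing_url, ai_source).
referral_sessions = {
    ("/products/linen-sheets", "chatgpt"): 42,
    ("/products/linen-sheets", "perplexity"): 7,
}


def split_by_outcome(citations, referral_sessions, min_sessions=1):
    """Label each citation as starting a journey or ending the session."""
    outcome = defaultdict(list)
    for url, source in citations:
        sessions = referral_sessions.get((url, source), 0)
        label = "starts_journey" if sessions >= min_sessions else "ends_session"
        outcome[label].append((url, source, sessions))
    return dict(outcome)


result = split_by_outcome(citations, referral_sessions)
for label, rows in result.items():
    print(label, rows)
```

With this sample data the product page citations land in `starts_journey` and the guide citations land in `ends_session`, which is the split a blended citation count hides.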
Signal 3: Agent Pick Rate, whether you get chosen to transact
This is the metric that will matter most as agentic commerce matures, and it is the one almost nobody is tracking yet. When an agent can complete a purchase directly in the answer, how often does it choose you over the alternatives it's weighing? Run the transactional prompts, watch which brand the agent moves to checkout with, and log it. With OpenAI moving toward in-answer commerce, this stops being theoretical fast. The brand the agent picks to transact with is the brand that wins the sale, full stop.
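Logging this by hand is enough to start. Below is a rough sketch of how pick rate could be computed from such a log, assuming one row per transactional prompt run. The brand names, prompts, and the `pick_rates` function are placeholders invented for illustration, and runs where the agent never reached checkout are excluded from the denominator.

```python
from collections import Counter

# Hypothetical manual log: one row per transactional prompt run.
# "picked" is the brand the agent moved to checkout with, or None if it did not transact.
runs = [
    {"engine": "chatgpt", "prompt": "buy a magnesium supplement for sleep", "picked": "BrandA"},
    {"engine": "chatgpt", "prompt": "order a third-party tested multivitamin", "picked": "BrandB"},
    {"engine": "gemini", "prompt": "buy a magnesium supplement for sleep", "picked": "BrandA"},
    {"engine": "perplexity", "prompt": "buy a magnesium supplement for sleep", "picked": None},
    {"engine": "gemini", "prompt": "order a third-party tested multivitamin", "picked": "OurBrand"},
]


def pick_rates(runs):
    """Share of transacting runs that each brand won."""
    transacted = [r["picked"] for r in runs if r["picked"] is not None]
    if not transacted:
        return {}
    counts = Counter(transacted)
    return {brand: n / len(transacted) for brand, n in counts.items()}


rates = pick_rates(runs)
for brand, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{brand}: {rate:.0%}")
```

Here four of the five runs ended in a checkout, so the sample rates come out to 50% for BrandA and 25% each for BrandB and OurBrand.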
Run It: Practical Checklist
Here is what to actually do in the next two weeks.
Build your weighted prompt set. Write 20 to 30 real category questions and tag each one TOFU, MOFU, or BOFU. Give BOFU prompts a 3x weight, MOFU 2x, TOFU 1x. This is the difference between measuring noise and measuring intent. Refresh the list quarterly as your category's language shifts.
Audit citation by query across three engines. Use Yotpo Discover to run every prompt in ChatGPT, Perplexity, and Gemini, and log not just whether you appear but in what role: passing mention, cited source, or named recommendation. The role tells you far more than the rate.
Track agent pick rate on transactional prompts. For every BOFU prompt where the engine can transact, record which brand the agent chooses to check out with. Watch what the winners have that you don't: cleaner structured product feeds, richer review coverage, clearer pricing and shipping data. Those are the inputs the agent reads when it commits.
Prove the agent picked you, and that the customer stayed. This is where your owned data earns its keep. Review depth and recency feed the model's confidence in recommending you, and loyalty data tells you whether an agent-driven first purchase turned into a second. Review and loyalty signals are one input here, alongside your own post-purchase surveys asking new customers how they found you. If "ChatGPT recommended you" starts showing up in those answers, you are measuring the right thing.
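The last step can be sketched as a simple join between survey answers and order counts. This is a minimal example under stated assumptions: the customer rows, the keyword list used to detect an agent-driven answer, and the function names are all hypothetical, and real survey text will need messier matching than a keyword check.

```python
from collections import defaultdict

# Hypothetical rows joining a post-purchase survey answer to lifetime order count.
customers = [
    {"id": 1, "how_found": "ChatGPT recommended you", "orders": 2},
    {"id": 2, "how_found": "ChatGPT recommended you", "orders": 1},
    {"id": 3, "how_found": "Instagram ad", "orders": 1},
    {"id": 4, "how_found": "Perplexity", "orders": 3},
    {"id": 5, "how_found": "Friend", "orders": 2},
]

# Assumed keywords that mark a survey answer as agent-driven.
AI_KEYWORDS = ("chatgpt", "perplexity", "gemini")


def is_agent_driven(answer):
    """True if the survey answer names one of the answer engines."""
    text = answer.lower()
    return any(keyword in text for keyword in AI_KEYWORDS)


def repeat_rate_by_channel(customers):
    """Share of customers with two or more orders, split by agent vs other."""
    totals = defaultdict(int)
    repeats = defaultdict(int)
    for c in customers:
        channel = "agent" if is_agent_driven(c["how_found"]) else "other"
        totals[channel] += 1
        if c["orders"] >= 2:
            repeats[channel] += 1
    return {channel: repeats[channel] / totals[channel] for channel in totals}


rates = repeat_rate_by_channel(customers)
print(rates)
```

If the agent channel's repeat rate holds up against your other channels over time, that is evidence the agent is handing you customers who stay, not just first orders.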
The brands that win the next two years won't be the ones with the highest mention count. They'll be the ones who figured out, query by query, where the answer engine actually hands over a customer, and then made themselves the obvious choice on exactly those questions.
Share of voice tells you that you're in the room. It doesn't tell you that anyone listened, clicked, or bought. Those are three different things, and only the last one pays.
Stop counting how often you're mentioned. Start measuring how often you're discovered, trusted, and chosen.