How AI Actually Decides Which Products to Recommend
To optimise for AI discovery, you need to understand how AI recommendations are actually generated. Not at the level of marketing metaphor – 'the AI learns from the web' – but at the level of the mechanisms that determine whether your brand appears in a given answer. This is more complex than it first appears, and getting it wrong leads to wasted effort. The mental model most marketers carry into GEO (generative engine optimisation) – that AI search works like a more conversational version of Google – systematically underestimates the problem.
The Three Layers of an AI Recommendation
Modern AI discovery systems operate through three overlapping mechanisms: pre-trained knowledge, live web retrieval, and context-conditioned reasoning. Understanding each layer is foundational to effective AI visibility strategy.
It is also worth noting the role of 'thinking' or extended reasoning models – AI systems like OpenAI's o1 and o3, or Google's Gemini with deep research mode – that engage in explicit multi-step reasoning before producing an output. For complex product comparisons, these models may reason through trade-offs more explicitly, which can affect which brands are selected.
Layer 1: Pre-trained Knowledge and the Corpus Effect
The foundation of every AI recommendation is its training data. Models are trained on vast corpora of web content – Wikipedia, news archives, academic papers, product reviews, forum discussions, and billions of other documents. This training process encodes statistical patterns into the model's weights: associations between entities, attributes, and concepts that persist long after the training run completes.
The result is what might be called the Corpus Effect. Brands that appeared frequently, in high-quality contexts, across diverse authoritative sources during the training period carry a statistical advantage. When tokens like 'sustainable,' 'ethical,' and 'outdoor clothing' appear in close proximity to 'Patagonia' thousands of times across the training corpus, those associations become embedded in the model's weights. Asking the model about sustainable outdoor clothing will predictably surface Patagonia – not as a programmed preference, but as a result of statistical patterns encoded through training.
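The co-occurrence mechanism behind this can be illustrated with a toy sketch. The corpus, brands, and attributes below are invented, and real training encodes these patterns through gradient updates over billions of documents rather than explicit counting – but the statistical intuition is the same:

```python
from collections import Counter

# Toy corpus: each string stands in for a document snippet.
# These examples are invented for illustration.
corpus = [
    "patagonia makes sustainable outdoor clothing",
    "for ethical outdoor clothing many reviewers pick patagonia",
    "patagonia is known for sustainable materials",
    "acme sells outdoor clothing",
]

brands = {"patagonia", "acme"}
attributes = {"sustainable", "ethical", "outdoor"}

# Count how often each brand co-occurs with each attribute token.
cooccurrence = Counter()
for doc in corpus:
    tokens = set(doc.split())
    for brand in brands & tokens:
        for attr in attributes & tokens:
            cooccurrence[(brand, attr)] += 1

# A brand mentioned alongside an attribute more often ends up more
# strongly associated with it - the statistical core of the Corpus Effect.
print(cooccurrence[("patagonia", "sustainable")])  # 2
print(cooccurrence[("acme", "sustainable")])       # 0
```

The disadvantage described below falls directly out of these counts: a brand absent from the corpus simply has no associations to surface.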
The implication for newer or less-established brands is stark: if the model does not have robust associations for your brand in its training data, it is starting from a disadvantage that content optimisation alone cannot fully overcome.
Layer 2: Live Web Search and Retrieval
Static training data has an obvious limitation: it goes out of date. Modern LLMs – including those powering Perplexity, Google AI Overviews, and Microsoft Copilot – address this through real-time web search. When a user submits a query, the system searches the web, retrieves relevant pages or passages, and uses that fresh content to ground the response in current information. This is what the research community calls Retrieval-Augmented Generation, or RAG – though in practice, most users simply experience it as the AI 'looking things up.'
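A minimal sketch of the retrieval step, assuming a keyword-overlap relevance score over an in-memory corpus in place of live web search and dense embeddings – the documents and function names here are illustrative, not any production system's API:

```python
# Minimal sketch of the retrieval step in a RAG pipeline.
# Real systems use live web search and learned relevance models;
# a keyword-overlap score over an in-memory corpus stands in for both.

def score(query: str, doc: str) -> int:
    """Count shared tokens between query and document (toy relevance)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring documents for the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Ground the model's answer in the retrieved passages."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using these sources:\n{context}\n\nQuestion: {query}"

docs = [
    "Brand A review: best sustainable outdoor jacket of 2024",
    "Recipe for sourdough bread",
    "Brand B guide to sustainable hiking gear",
]
print(build_prompt("best sustainable outdoor jacket", docs))
```

Only the retrieved passages reach the model's context window – which is why being retrievable for relevant queries is a distinct optimisation target from being in the training data.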
The citations that appear in AI answers reflect these retrieval choices. But citations do not show the full reasoning chain; they show which external sources the system retrieved and used to support its output. The model's pre-trained associations are doing significant work before retrieval even begins – retrieval adds recency and specificity, but does not override the foundational layer.
For brands, this means two distinct optimisation targets: being well-represented in training data (a long-run, ecosystem-level effort) and being consistently retrievable from high-authority sources when relevant queries are processed (a more tractable, content-and-PR-level effort).
Layer 3: Entity Association and Consistency
LLMs treat brands as conceptual entities with associated attributes – categories they belong to, qualities attributed to them, use cases they are associated with, and comparisons they appear in. A brand's visibility depends on how consistently high-authority sources reinforce the same associations.
The concept of a corroboration threshold is important here. A model will produce consistent recommendations about a brand when enough independent, trusted sources make the same claims in compatible terms. A brand whose positioning is inconsistent across sources – described as premium on its own site, as budget in reviews – will generate conflicting signals that reduce the model's confidence in any single characterisation.
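A toy sketch of how a corroboration threshold might behave, with invented sources and an assumed threshold value – real models do not tally votes explicitly, but conflicting sources dilute the dominant signal in much the same way:

```python
from collections import Counter

# Toy sketch of a corroboration threshold: a claim about a brand is
# treated as settled only when enough independent sources agree.
# The sources and threshold value are invented for illustration.
sources = {
    "own_site": {"positioning": "premium"},
    "review_a": {"positioning": "premium"},
    "review_b": {"positioning": "budget"},
    "press":    {"positioning": "premium"},
}

THRESHOLD = 3  # minimum independent agreeing sources (assumed value)

votes = Counter(s["positioning"] for s in sources.values())
claim, count = votes.most_common(1)[0]

if count >= THRESHOLD:
    print(f"consistent signal: '{claim}' ({count}/{len(sources)} sources)")
else:
    print("conflicting signals - low confidence in any characterisation")
```

Swap one more source to "budget" and no claim clears the threshold – the model-level analogue of the mixed positioning described above.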
This has a direct implication for brand strategy: the breadth and specificity of a brand's 'surface area' in external sources matters enormously. A brand that is clearly and consistently associated with a narrow, specific use case will be recommended reliably for that use case. A brand that tries to be all things to all personas may achieve weaker associations across all of them.
Why AI Recommendations Vary
AI text generation involves probability and sampling. Given an input, the model does not produce a single deterministic output – it computes a probability distribution over possible next tokens and samples from it, token by token. This introduces variability: the same question can produce different brand recommendations across different runs.
Think of it this way: some brands sit at high probability for a given query context and appear in almost every response. Others sit at lower probability and appear in some responses but not others. The goal of AI visibility strategy is to shift your brand's probability of appearing for relevant contexts – which requires understanding what signals drive that probability, not just monitoring whether you appeared in any given run.
For measurement, this means any single prompt result is a sample from a distribution, not a reliable fact. For optimisation, it means the goal is to shift the underlying probability over time by improving the signals the model draws on.
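The measurement point can be made concrete with a simulation, where a fixed probability stands in for the model's sampling behaviour – the probability, seed, and run count here are assumed for illustration, and in practice each "run" would be a real prompt to an AI system:

```python
import random

# Simulated measurement of a brand's appearance rate across many runs.
# A fixed probability stands in for the model's sampling behaviour.
random.seed(42)
TRUE_P = 0.6   # assumed underlying probability the brand appears
RUNS = 500

appearances = sum(random.random() < TRUE_P for _ in range(RUNS))
rate = appearances / RUNS

# Any single run is effectively a coin flip; the rate over many runs
# estimates the underlying probability that strategy aims to shift.
print(f"appeared in {appearances}/{RUNS} runs (rate ~ {rate:.2f})")
```

A single run tells you almost nothing; the aggregate rate is the metric worth tracking, and shifting it is the actual optimisation target.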
What This Means for Strategy
The architecture of AI recommendations has clear strategic implications:
- Visibility in AI answers is earned through the totality of your digital presence across authoritative external sources – not through any single piece of content or technical optimisation.
- Third-party, authoritative coverage carries more weight than owned content, because it is more likely to form part of the training data and retrieval pool the AI draws on.
- Consistency of positioning across sources builds the corroboration that allows models to recommend with confidence.
- Brand surface area matters: being clearly and specifically associated with defined use cases and personas is likely to produce stronger visibility than a broad, undifferentiated presence.
- Optimisation must target probability distributions, not fixed positions – which requires measuring across many runs and contexts, not spot-checking individual results.
Understanding these mechanisms is the foundation. The next question is even more important: the AI does not give the same answer to every user. Different personas receive different recommendations – and that changes everything about how you approach AI visibility.
Written by
ZIO Team
Research Team
The ZIO research and product team, dedicated to advancing persona intelligence.