How AI Actually Decides Which Products to Recommend
To optimise for AI discovery, you need to understand how AI recommendations are actually generated. Not at the level of marketing metaphor – 'the AI learns from the web' – but at the level of the mechanisms that determine whether your brand appears in a given answer. This is more complex than it first appears, and getting it wrong leads to wasted effort. The mental model most marketers carry into GEO (generative engine optimisation) – that AI search works like a more conversational version of Google – systematically underestimates the problem.
The Three Layers of an AI Recommendation
Modern AI discovery systems operate through three overlapping mechanisms: pre-trained knowledge, live web retrieval, and context-conditioned reasoning. Understanding each layer is foundational to effective AI visibility strategy.
It is also worth noting the role of 'thinking' or extended reasoning models – AI systems like OpenAI's o1 and o3, or Google's Gemini with deep research mode – that engage in explicit multi-step reasoning before producing an output. For complex product comparisons, these models may reason through trade-offs more explicitly, which can affect which brands are selected.
Layer 1: Pre-trained Knowledge and the Corpus Effect
The foundation of every AI recommendation is its training data. Models are trained on vast corpora of web content – Wikipedia, news archives, academic papers, product reviews, forum discussions, and billions of other documents. This training process encodes statistical patterns into the model's weights: associations between entities, attributes, and concepts that persist long after the training run completes.
The result is what might be called the Corpus Effect. Brands that appeared frequently, in high-quality contexts, across diverse authoritative sources during the training period carry a statistical advantage. When tokens like 'sustainable,' 'ethical,' and 'outdoor clothing' appear in close proximity to 'Patagonia' thousands of times across the training corpus, those associations become embedded in the model's weights. Asking the model about sustainable outdoor clothing will predictably surface Patagonia – not as a programmed preference, but as a result of statistical patterns encoded through training.
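The co-occurrence mechanism behind this can be illustrated with a toy sketch. The corpus, brands, and attributes below are invented, and real training encodes these patterns through gradient updates over billions of documents rather than explicit counting – but the statistical intuition is the same:

```python
from collections import Counter

# Toy corpus: each string stands in for a document snippet.
# These examples are invented for illustration.
corpus = [
    "patagonia makes sustainable outdoor clothing",
    "for ethical outdoor clothing many reviewers pick patagonia",
    "patagonia is known for sustainable materials",
    "acme sells outdoor clothing",
]

brands = {"patagonia", "acme"}
attributes = {"sustainable", "ethical", "outdoor"}

# Count how often each brand co-occurs with each attribute token.
cooccurrence = Counter()
for doc in corpus:
    tokens = set(doc.split())
    for brand in brands & tokens:
        for attr in attributes & tokens:
            cooccurrence[(brand, attr)] += 1

# A brand mentioned alongside an attribute more often ends up more
# strongly associated with it - the statistical core of the Corpus Effect.
print(cooccurrence[("patagonia", "sustainable")])  # 2
print(cooccurrence[("acme", "sustainable")])       # 0
```

The disadvantage described below falls directly out of these counts: a brand absent from the corpus simply has no associations to surface.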
The implication for newer or less-established brands is stark: if the model does not have robust associations for your brand in its training data, it is starting from a disadvantage that content optimisation alone cannot fully overcome.
Layer 2: Live Web Search and Retrieval
Static training data has an obvious limitation: it goes out of date. Modern LLMs – including those powering Perplexity, Google AI Overviews, and Microsoft Copilot – address this through real-time web search. When a user submits a query, the system searches the web, retrieves relevant pages or passages, and uses that fresh content to ground the response in current information. This is what the research community calls Retrieval-Augmented Generation, or RAG – though in practice, most users simply experience it as the AI 'looking things up.'
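A minimal sketch of the retrieval step, assuming a keyword-overlap relevance score over an in-memory corpus in place of live web search and dense embeddings – the documents and function names here are illustrative, not any production system's API:

```python
# Minimal sketch of the retrieval step in a RAG pipeline.
# Real systems use live web search and learned relevance models;
# a keyword-overlap score over an in-memory corpus stands in for both.

def score(query: str, doc: str) -> int:
    """Count shared tokens between query and document (toy relevance)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring documents for the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Ground the model's answer in the retrieved passages."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using these sources:\n{context}\n\nQuestion: {query}"

docs = [
    "Brand A review: best sustainable outdoor jacket of 2024",
    "Recipe for sourdough bread",
    "Brand B guide to sustainable hiking gear",
]
print(build_prompt("best sustainable outdoor jacket", docs))
```

Only the retrieved passages reach the model's context window – which is why being retrievable for relevant queries is a distinct optimisation target from being in the training data.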
The citations that appear in AI answers reflect these retrieval choices. But citations do not show the full reasoning chain; they show which external sources the system retrieved and used to support its output. The model's pre-trained associations are doing significant work before retrieval even begins – retrieval adds recency and specificity, but does not override the foundational layer.
For brands, this means two distinct optimisation targets: being well-represented in training data (a long-run, ecosystem-level effort) and being consistently retrievable from high-authority sources when relevant queries are processed (a more tractable, content-and-PR-level effort).
Layer 3: Entity Association and Consistency
LLMs treat brands as conceptual entities with associated attributes – categories they belong to, qualities attributed to them, use cases they are associated with, and comparisons they appear in. A brand's visibility depends on how consistently high-authority sources reinforce the same associations.
The concept of a corroboration threshold is important here. A model will produce consistent recommendations about a brand when enough independent, trusted sources make the same claims in compatible terms. A brand whose positioning is inconsistent across sources – described as premium on its own site, as budget in reviews – will generate conflicting signals that reduce the model's confidence in any single characterisation.
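A toy sketch of how a corroboration threshold might behave, with invented sources and an assumed threshold value – real models do not tally votes explicitly, but conflicting sources dilute the dominant signal in much the same way:

```python
from collections import Counter

# Toy sketch of a corroboration threshold: a claim about a brand is
# treated as settled only when enough independent sources agree.
# The sources and threshold value are invented for illustration.
sources = {
    "own_site": {"positioning": "premium"},
    "review_a": {"positioning": "premium"},
    "review_b": {"positioning": "budget"},
    "press":    {"positioning": "premium"},
}

THRESHOLD = 3  # minimum independent agreeing sources (assumed value)

votes = Counter(s["positioning"] for s in sources.values())
claim, count = votes.most_common(1)[0]

if count >= THRESHOLD:
    print(f"consistent signal: '{claim}' ({count}/{len(sources)} sources)")
else:
    print("conflicting signals - low confidence in any characterisation")
```

Swap one more source to "budget" and no claim clears the threshold – the model-level analogue of the mixed positioning described above.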
This has a direct implication for brand strategy: the breadth and specificity of a brand's 'surface area' in external sources matters enormously. A brand that is clearly and consistently associated with a narrow, specific use case will be recommended reliably for that use case. A brand that tries to be all things to all personas may achieve weaker associations across all of them.
Why AI Recommendations Vary
AI text generation involves probability and sampling. Given an input, the model does not produce a single deterministic output – it computes a probability distribution over possible next tokens and samples from it, token by token. This introduces variability: the same question can produce different brand recommendations across different runs.
Think of it this way: some brands sit at high probability for a given query context and appear in almost every response. Others sit at lower probability and appear in some responses but not others. The goal of AI visibility strategy is to shift your brand's probability of appearing for relevant contexts – which requires understanding what signals drive that probability, not just monitoring whether you appeared in any given run.
For measurement, this means any single prompt result is a sample from a distribution, not a reliable fact. For optimisation, it means the goal is to shift the underlying probability over time by improving the signals the model draws on.
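The measurement point can be made concrete with a simulation, where a fixed probability stands in for the model's sampling behaviour – the probability, seed, and run count here are assumed for illustration, and in practice each "run" would be a real prompt to an AI system:

```python
import random

# Simulated measurement of a brand's appearance rate across many runs.
# A fixed probability stands in for the model's sampling behaviour.
random.seed(42)
TRUE_P = 0.6   # assumed underlying probability the brand appears
RUNS = 500

appearances = sum(random.random() < TRUE_P for _ in range(RUNS))
rate = appearances / RUNS

# Any single run is effectively a coin flip; the rate over many runs
# estimates the underlying probability that strategy aims to shift.
print(f"appeared in {appearances}/{RUNS} runs (rate ~ {rate:.2f})")
```

A single run tells you almost nothing; the aggregate rate is the metric worth tracking, and shifting it is the actual optimisation target.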
What This Means for Strategy
The architecture of AI recommendations has clear strategic implications:
- Visibility in AI answers is earned through the totality of your digital presence across authoritative external sources – not through any single piece of content or technical optimisation.
- Third-party, authoritative coverage carries more weight than owned content, because it is more likely to form part of the training data and retrieval pool the AI draws on.
- Consistency of positioning across sources builds the corroboration that allows models to recommend with confidence.
- Brand surface area matters: being clearly and specifically associated with defined use cases and personas is likely to produce stronger visibility than a broad, undifferentiated presence.
- Optimisation must target probability distributions, not fixed positions – which requires measuring across many runs and contexts, not spot-checking individual results.
Understanding these mechanisms is the foundation. The next question is even more important: the AI does not give the same answer to every user. Different personas receive different recommendations – and that changes everything about how you approach AI visibility.
Written by
ZIO Team
Research Team
The ZIO research and product team, dedicated to advancing persona intelligence.