
The website as the primary source, but not the only arbiter

In late 2025, a B2B platform undertook a serious website overhaul: it updated product descriptions, rewrote case studies, and restructured its pages around client tasks. A month after the relaunch, the company's marketer asked ChatGPT: "Which platforms are suitable for automating procurement in a mid-size manufacturing business?" The answer was unexpected. The model did name the brand — but described it in the words of a two-year-old review on an industry portal. It took the category from a G2 profile. The price range came from a Reddit discussion. The new website — the one that had taken months of work — might as well not have existed in the machine's answer. This is not a bug in any particular system. It is how the environment works: AI assembles its opinion of a company from several layers at once, and the website is an important voice among them, but far from the only one.

This article examines the mechanism itself: which layers, exactly, make up machine opinion, and how they relate to one another. (The question of which external sources give a brand the right to be recommended — as opposed to merely mentioned — is explored in more detail in a separate article on external authority.) The architecture looks like this. Google Search Central states explicitly that AI Overviews and AI Mode use a fan-out decomposition of the query across subtopics and data sources, then surface a broader and more diverse set of supporting links than classic web search [1]. In its AI Mode help documentation, Google adds that the system breaks the question into subtopics and simultaneously looks for relevant material for each of them [2]. OpenAI describes ChatGPT Search as a mechanism for producing fast, up-to-date answers grounded in web sources and informed by the context of the entire conversation [3][4]. Perplexity puts the same idea most plainly: the system searches the internet in real time, gathers information from trustworthy sources, and condenses it into a short explanation [5].
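As a rough illustration of the pattern these vendors describe, the fan-out step can be sketched in a few lines. The subtopics, source names, and combination logic below are illustrative assumptions, not any platform's actual pipeline.

```python
# Illustrative sketch of query fan-out, not any vendor's real implementation.
# One user question is decomposed into subtopics, each subtopic is searched
# across several source types, and the results feed a synthesis step.

def fan_out(query: str, subtopics: list[str], sources: list[str]) -> list[dict]:
    """Expand one query into (subtopic, source) retrieval tasks."""
    tasks = []
    for topic in subtopics:
        for source in sources:
            tasks.append({
                "query": f"{query} :: {topic}",
                "source": source,
            })
    return tasks

tasks = fan_out(
    "procurement automation platforms for mid-size manufacturing",
    subtopics=["pricing", "integrations", "reviews"],
    sources=["web_index", "knowledge_graph", "community_forums"],
)
print(len(tasks))  # 3 subtopics x 3 sources = 9 retrieval tasks
```

The point of the sketch is the multiplication: a single question becomes many retrieval tasks against many source types, which is why no single document, including the brand's own website, can dominate the answer on its own.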

If we translate that technical picture into the language of the brand, the conclusion is simple but important. AI’s opinion about a company is built from at least five layers.

Five layers of the source contour

The first layer is the brand’s owned channels. These include the website, documentation, FAQ sections, product descriptions, pricing pages, case studies, public research, the press center, expert blogs, and, in some cases, video transcripts and technical knowledge bases. This layer defines the base thesaurus: what the brand calls itself, which category it places itself in, and which properties it puts in the foreground. If there is already confusion across the brand’s own channels, no amount of external reputation will save it. The machine needs a starting scaffold.

The second layer is search and link context. Even when the answer shown to the user looks like a conversation, the logic of search infrastructure is often still operating underneath it. Google reminds us that, to participate in AI features, pages must be indexed and broadly suitable for ordinary search [1]. Put simply, the AI intermediary rarely starts from zero: it relies on the preexisting layer of discovery, indexing, and selection of web documents. That is why the site’s technical accessibility, the quality of its text, and basic search discipline still matter. But they no longer guarantee dominance. They merely get the brand into the game.

The third layer is external editorial and industry sources. These include reviews, comparisons, rankings, interviews, analytical materials, trade-media publications, directories, and business profiles. This is usually where the brand gets what it cannot give itself: external validation. If the official website claims that the company is strong in complex enterprise analytics, while independent sources describe it as a niche tool for small business, the answer system has to reconcile those versions. And very often it chooses the version that is better validated and more clearly embedded in the network of links. In answer systems, self-presentation without external verification carries less weight than brands would like.

The fourth layer is the user trace. This includes reviews, forum discussions, questions and answers in communities, mentions on social platforms, opinion catalogs, support pages, and, more broadly, the whole living and not always tidy fabric of the internet in which people explain to one another what a product is and how it works. This layer is noisy and unreliable, but it cannot be ignored. It often shapes the language of actual demand. A company may describe itself as a “modular environment for intelligent data management,” while users discuss it as “a convenient service for complex reporting without a heavy implementation.” For AI, that language matters a great deal, because it is the language in which everyday questions are actually phrased.

The fifth layer is structured knowledge. This includes entity databases, open knowledge graphs, catalogs, business directories, organization profiles, standardized descriptions, and, sometimes, schema markup on the site itself. Survey work on integrating external knowledge into language models shows that linking AI to knowledge bases and graphs improves the factual accuracy, traceability, and explainability of the answer [6][8]. For the brand, this means that the role of “boring” and formal-looking sources increases. They rarely create a vivid reputation, but they often provide stable identification of the entity.
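Since schema markup is named as part of this layer, here is a minimal sketch of a schema.org Organization record of the kind that supports stable entity identification. The organization name, URLs, and the knowledge-graph identifier are placeholders, not real values.

```python
import json

# Minimal schema.org Organization record, of the kind answer systems can use
# to identify an entity consistently. All values below are placeholders.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "ExampleCo",
    "url": "https://example.com",
    "description": "Procurement automation platform for mid-size manufacturers",
    "sameAs": [
        "https://www.wikidata.org/wiki/Q0000000",      # knowledge-graph entity
        "https://www.linkedin.com/company/exampleco",  # business profile
    ],
}

jsonld = json.dumps(organization, indent=2)
print(jsonld)
```

The `sameAs` links are what tie the layers together: they tell any consumer of the markup that the entity on the website is the same entity described in catalogs and knowledge graphs.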

From editing the website to managing the entire knowledge contour

It is precisely the combination of these layers that explains why the website does not become the main character. It may be the main primary source, but not the main arbiter. Answer systems assess not only what the brand claims about itself, but also how that claim is validated, repeated, challenged, or reformulated by other participants in the network. Put more sharply: the website explains what the brand would like to be seen as; the external environment shows what it is actually seen as; and AI tries to assemble a workable compromise between those versions.

Several practically important consequences follow from this.

First, it is impossible to work seriously on visibility in AI if you limit yourself to the homepage. Even a brilliantly written website does not guarantee that the brand will be named in the answer if external sources either fail to validate its key properties or validate them differently.

Second, the official website remains critically important — precisely because it defines the canonical structure of the entity. But its function changes. It must be not only an attractive storefront, but also a reliable point of alignment: a place where AI and humans can see the same name, category, properties, and evidence with equal clarity.

Third, the brand has to manage not only its own text, but also the ecosystem of validation: who writes about it and how, which comparisons it appears in, which catalogs and knowledge bases it is present in, where its methodology is represented, and who can independently validate its role in the market.

Why answer platforms read this environment differently

What matters especially is that different AI platforms read this environment differently. Google relies on its own search infrastructure and AI modes, where indexability and page eligibility for display matter [1][2]. ChatGPT Search brings in web sources either on request or automatically, while taking the dialogue context into account [3][4]. Perplexity emphasizes almost continuous real-time web retrieval and explicit links [5]. Microsoft Copilot likewise describes its answers as grounded in web search and external links [9][10]. For a brand, this means there is no single “source of truth” from which every machine will read the company in the same way. There is a network of sources that each system assembles according to its own logic.

That is why a mature strategy begins with a more mature question. Not “how should we describe ourselves better on the website?” but “what set of sources forms machine opinion about us — and where in that set are we strong, and where are we being undermined by noise, absence, or someone else’s interpretation?” Only after that question does content work stop being cosmetic and become knowledge management.

This is exactly where the brand’s new role on the internet comes into view. A brand used to be able to treat its website as the main stage and everything else as noisy background. Now the picture flips. The site remains a stage, but the performance has long since stopped unfolding only there: it now plays out across the whole internet. And the answer system acts not as a spectator but as an editor, assembling the final version for the user out of a multitude of voices. In that logic, the winner is not the one that speaks most loudly about itself, but the one whose entity is validated most clearly and consistently across the network.

What seems well established

It is well established that answer systems use not one document and not one type of signal. For a stable presence, a brand needs a set of aligned sources, not merely a strong homepage.

What still remains uncertain

The exact relative importance of each layer — the website, external media, reviews, catalogs, knowledge graphs — varies from platform to platform and is rarely disclosed in full.

What this changes in practice

The practical conclusion is straightforward: what must be managed is the entire source contour. An audit of visibility in AI begins with a source map, not with editing a single paragraph on the website.
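To make the idea of a source map concrete, here is a minimal sketch in code. The layer names follow this article, while every source and status value is a hypothetical example rather than audit data.

```python
# Illustrative source map for an AI-visibility audit: the five layers from
# this article, each mapped to concrete sources with a rough status.
# All source names and statuses are hypothetical examples.

source_map = {
    "owned":      [("example.com/pricing", "current"),
                   ("example.com/docs", "stale")],
    "search":     [("Google index coverage", "current")],
    "editorial":  [("industry portal review", "stale")],
    "user_trace": [("Reddit thread", "current")],
    "structured": [("Wikidata entry", "missing")],
}

# A layer is weak when not a single source in it is current.
weak_layers = [layer for layer, sources in source_map.items()
               if not any(status == "current" for _, status in sources)]
print(weak_layers)  # → ['editorial', 'structured']
```

Even a toy map like this changes the conversation: instead of "rewrite the homepage," the audit produces a ranked list of layers where the brand is absent, stale, or described in someone else's words.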

Sources

[1] Google Search Central. AI Features and Your Website. 2026
[2] Google Search Help. Get AI-Powered Responses with AI Mode in Google Search. 2026
[3] OpenAI. Introducing ChatGPT Search. 2024
[4] OpenAI Help Center. ChatGPT Search. 2026
[5] Perplexity Help Center. How does Perplexity work? 2026
[6] Yu H. et al. Evaluation of Retrieval-Augmented Generation: A Survey. 2024
[7] Zhao P. et al. Retrieval-Augmented Generation for AI-Generated Content: A Survey. Data Science and Engineering, 2026
[8] Ibrahim N. et al. A Survey on Augmenting Knowledge Graphs with Large Language Models. Discover Artificial Intelligence, 2024
[9] Microsoft. Copilot Search in Bing. 2026
[10] Microsoft Support. Understanding Web Search in Microsoft 365 Copilot Chat. 2026

Related materials

Research article 7 min

External authority versus the brand’s own site: which sources really create the right to be recommended

Which external signals and independent sources help a brand earn the right to be recommended in AI answers — and why the brand's own site without them is not enough.

Open the material →
Research article 8 min

Mention, citation, and influence: three levels of brand presence in AI answers

Three levels of brand presence in AI answers — mention, citation, and influence — and why a single metric is not enough for diagnostics.

Open the material →
Next step

How the report measures web-source strength

In web-augmented mode the model relies on external documents. AI100 separately calculates web boost — how much the answer changes when the system gets access to the live internet, and how often it cites the brand's domain.

See how web boost is calculated →