Why the card metaphor is misleading

Try asking ChatGPT: "Which analytics platform should I choose for a mid-size e-commerce business?" — and pay attention to how the answer is constructed. The model will not open Gartner, navigate to each vendor's website, or compare pricing plans. It will assemble the answer from what it already has: fragments of documentation, traces of review articles, remnants of someone's Reddit comparisons, pieces of press releases from two years ago. One brand will be described accurately — right category, current features, correct price range. Another will be named but confused with its parent company. A third will be omitted entirely, even though by revenue it is larger than the first two. Where does this unevenness come from? The answer is that there is no brand card inside the model with tidy fields for "name — category — price — advantages." There is something quite different: a distributed network of probabilistic connections — traces in parameters, activatable patterns, hidden computational states, and, in search modes, fresh documents blended in at the moment of response.

As long as a company imagines a neat slot in the machine's memory, it looks for simple fixes: "add more mentions," "update the headlines," "publish another page about us." But once you understand that the brand inside the model is more like a landscape of probabilities than an entry in a database, the task looks different. The question is not how many signals you produce, but how well they cohere: whether the name is stably linked to the category, whether the products are distinguishable from one another, whether key properties are validated from multiple sources, and whether the model can easily separate your entity from its neighbors. Why a strong brand becomes machine-invisible in the first place — four structural causes — is covered in the preceding article; here we go one level deeper, into how knowledge about a company is organized inside the model.

What interpretability research shows

Interpretability research over the past several years has gradually made this internal picture less mysterious. Work by Mor Geva and coauthors showed that the feed-forward blocks of transformer architectures often behave like a kind of “key-value” memory: the keys respond to textual patterns in the input, and the associated values push the model toward specific lexical continuations [1]. Work by Kevin Meng and colleagues on locating and editing factual associations showed that some facts in autoregressive models can indeed be linked to relatively localizable computational nodes, especially in the middle layers [2]. A later paper by Masaki Sakata and coauthors found that mentions of the same entity tend to form distinguishable clusters in the internal representation space, while information associated with that entity is often concentrated in a compact linear subspace in the model’s early layers [3]. Finally, survey work on knowledge mechanisms in large language models underscores a general conclusion: knowledge in such systems is real, but distributed, fragile, and dependent on the mode of retrieval [4][5].
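To make the key-value reading of [1] concrete, here is a toy numerical sketch. The dimensions and random values are illustrative assumptions, not anything extracted from a real model.

import numpy as np

# Toy illustration of the "key-value" view of a transformer feed-forward block [1].
# Dimensions and values are invented; real models use thousands of units per layer.
d_model, d_ff = 4, 6

x = np.random.randn(d_model)            # hidden state for the current token
K = np.random.randn(d_ff, d_model)      # "keys": each row responds to some input pattern
V = np.random.randn(d_ff, d_model)      # "values": each row nudges the model toward certain continuations

activations = np.maximum(x @ K.T, 0.0)  # how strongly each key fires (simple ReLU stand-in)
ffn_output = activations @ V            # read-out: a weighted sum of the stored values

The point is not the arithmetic itself, but the shape of the mechanism: patterns in the input select stored directions, and those directions bias what the model says next.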

The simplest way to picture it is this. Inside the model, the brand exists as a probabilistic landscape. On that landscape there are regions where the company name lies close to words such as “analytics,” “security,” “platform,” “forecasting,” “enterprise market,” or, say, “customer experience management.” There are links to known products. There are traces of old press releases. There is proximity to competitors. There are traces of user questions that, in the training data, were often followed by particular kinds of answers. When the model receives a new prompt, it does not “pull out a card.” It traverses that landscape and assembles the most probable interpretation.

That is why the question “what does AI know about the company?” is better replaced with another one: “what configuration of connections can AI reconstruct about the company, stably, across different contexts?” That formulation is both more precise and more useful. What matters for business is not the model’s abstract awareness, but its stability. If you ask the system the same thing in ten closely related ways, will it keep assigning the brand to the same category? Will it keep linking it to the same core properties? Will it correctly distinguish the product from the company, the company from the parent structure, and the legal name from the consumer-facing one? Or will each new prompt summon a slightly different entity?
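One way to make that stability question operational is a small consistency probe: ask near-identical questions and check whether the brand keeps appearing and keeps landing in the same category. The sketch below is a minimal illustration; ask(), the paraphrase list, and the name ExampleBrand are placeholders to be replaced with your own answer system and brand.

import re
from collections import Counter

def ask(prompt: str) -> str:
    # Placeholder: returns a canned answer so the sketch runs end to end.
    # Replace with a call to whatever chat or answer API you actually use.
    return "For a mid-size e-commerce business, ExampleBrand and RivalCo are common choices."

PARAPHRASES = [
    "Which analytics platform should I choose for a mid-size e-commerce business?",
    "Recommend an analytics tool for a medium-sized online store.",
    "What analytics solution fits a mid-market e-commerce company?",
]

BRAND = "ExampleBrand"  # hypothetical name, replace with the brand under study

def probe(paraphrases, brand):
    # Count how often the brand is named at all across closely related prompts.
    outcomes = Counter()
    for p in paraphrases:
        answer = ask(p)
        named = bool(re.search(re.escape(brand), answer, flags=re.IGNORECASE))
        outcomes["named" if named else "absent"] += 1
    return outcomes

print(probe(PARAPHRASES, BRAND))  # e.g. Counter({'named': 3}) with the canned answer

The same structure extends to category assignment: instead of a yes/no mention check, extract the category each answer assigns and see whether it stays constant across the paraphrases.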

Probabilistic landscape, vectors, and stable links

That stability has a concrete geometric counterpart in vector representations (embeddings), the numerical forms into which words, phrases, and fragments of context are translated. The proximity of two such representations is often measured with cosine similarity:

cos(theta) = (x · y) / (||x|| ||y||)

Here x and y are two vectors. One may correspond to a set of brand mentions, the other to a feature such as “enterprise analytics” or “low-cost consumer service.” If the cosine is close to one, the vector directions are similar, and the system tends to treat those objects as tightly connected. If the value is low, or if it changes from one context to another, the connection is weak or unstable. A company does not have direct access to such vectors inside closed commercial models. But the logic is still useful: a brand benefits when the important links in its machine representation are not accidental, but repeatable.
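As a minimal numerical sketch, assuming you already have embedding vectors from whatever model you use (the three-dimensional toy values below are invented purely for illustration):

import numpy as np

def cosine_similarity(x, y):
    # cos(theta) = (x . y) / (||x|| ||y||)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

# Toy stand-ins for real embeddings; values are illustrative only.
brand_mentions       = np.array([0.8, 0.1, 0.3])   # pooled embedding of brand mentions
enterprise_analytics = np.array([0.7, 0.2, 0.4])   # phrase "enterprise analytics"
consumer_service     = np.array([-0.2, 0.9, 0.1])  # phrase "low-cost consumer service"

print(cosine_similarity(brand_mentions, enterprise_analytics))  # close to 1: tight link
print(cosine_similarity(brand_mentions, consumer_service))      # near 0: weak or unstable link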

This also clarifies the nature of typical distortions. If the brand name is ambiguous, the model may pull it too tightly toward a general category and erase its distinctiveness. If the company has several product lines described in different languages, they may fail to cohere into a single family inside the model. If the external environment knows the old version of the brand better than the new one, the model will “remember the past” more stubbornly than marketing would like. If competitors have a sharper and better-validated semantic contour, a prompt about a class of solutions will lead not to your company, but to them. And the reverse is also true: if the brand is systematically present in the language of the market, in independent sources, and in its own clear descriptions, the model is more likely to assemble your company specifically, even if it is not the largest player.

Three layers of internal representation and a new diagnostic lens

The brand’s internal representation can usefully be divided into three layers. The first layer is parametric memory. This is what the model absorbed during training and subsequent tuning: general facts, typical associations, and habitual links between the name and its properties. The second layer is contextual assembly. This is how the brand is reconstructed at answer time from the hidden states of the current dialogue: which words in the user’s prompt activated which parts of the system’s knowledge. The third layer is external reinforcement. In answer and search modes, fresh web pages, documents, and knowledge bases are added here, and they influence the final output [4][6][7]. In practice, it is the interaction among these three layers that determines what the brand will look like in the answer.
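For diagnostic purposes, the three layers can be treated as a simple labeling scheme for the claims an answer makes. The sketch below is a hypothetical bookkeeping structure, not an API into any model's internals; the example claims and the name ExampleBrand are invented.

from dataclasses import dataclass
from enum import Enum

class Layer(Enum):
    PARAMETRIC = "parametric memory"      # absorbed during training and tuning
    CONTEXTUAL = "contextual assembly"    # reconstructed from the current prompt and dialogue
    EXTERNAL = "external reinforcement"   # retrieved pages, documents, knowledge bases

@dataclass
class ClaimTrace:
    claim: str            # a single statement the answer makes about the brand
    likely_layer: Layer   # analyst's judgment of where it most plausibly came from
    evidence: str         # cited source, retrieved snippet, or "none"

traces = [
    ClaimTrace("ExampleBrand is an enterprise analytics platform", Layer.PARAMETRIC, "none"),
    ClaimTrace("Pricing starts at $499 per month", Layer.EXTERNAL, "vendor pricing page (retrieved)"),
]

Sorting an answer's claims this way quickly shows whether distortions come from stale parametric memory, from a misread prompt, or from the particular documents the system happened to retrieve.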

This architecture explains why many companies misdiagnose the problem. When a brand is not named in the answer, the usual assumption is that “the model does not know us.” Sometimes that is true, but not always. The model may know the company by name and still fail to consider it the best answer to the question. It may remember the product, but fail to connect it to the right use case. It may cite the site correctly, yet rank the importance of properties incorrectly. It may rely on current web sources and, in doing so, override older internal knowledge. In other words, the problem may lie not in the presence of knowledge, but in its configuration.

This is especially important for brands that are used to relying on the force of their own communication. Inside answer systems, the winner is not only the one that speaks loudly about itself, but the one about whom a coherent representation can be built. And a coherent representation requires discipline. The name has to be stable. The category has to be clear. The product structure has to be distinguishable. The properties have to be stated directly, not merely implied. External validation has to be diverse and reliable. Only then does the model have a chance not merely to recognize the brand, but to retain it as a stable entity.

This leads to another important conclusion. Working on the brand’s internal representation does not reduce to “text optimization.” At bottom, it is work on the company’s epistemic form — that is, the form in which the company exists as knowledge. When the brand is poorly assembled as knowledge, the answer system is forced to fill in the gaps probabilistically. When the brand is well assembled, the probability of distortion falls. In that sense, the modern struggle for visibility is not only a struggle for traffic, but also a struggle for the quality of machine understanding.

This perspective is useful for another reason as well: it moves the conversation onto more mature ground. The question is not whether “AI remembers us.” The question is which properties of our brand are extracted stably, which links are lost, which attributes are overweighted, and which do not make it into the answer at all. Those are the questions from which strategy, diagnostics, and substantive work begin. They are what distinguish serious management of machine visibility from a superficial race for random mentions.

What seems well established

We can say with confidence that knowledge in modern language models is distributed and retrieved contextually. It follows that a brand’s stability in answers cannot be reduced to the mere presence of its name in the training material.

What still remains uncertain

What is less firmly established is the exact geometry of that knowledge in closed commercial systems. Academic work reveals the general mechanisms, but we do not have direct access to the internal vectors and assembly rules of each platform.

What this changes in practice

For a company, this means shifting from the language of “text optimization” to the language of epistemic form: it needs to monitor which brand properties are extracted stably and which ones fragment or become distorted.

Sources

[1] Geva M., Schuster R., Berant J., Levy O. Transformer Feed-Forward Layers Are Key-Value Memories. EMNLP, 2021
[2] Meng K., Bau D., Andonian A., Belinkov Y. Locating and Editing Factual Associations in GPT. NeurIPS, 2022
[3] Sakata M., Yokoi S., Heinzerling B., Ito T., Inui K. On Entity Identification in Language Models. Findings of ACL, 2025
[4] Wang M. et al. Knowledge Mechanisms in Large Language Models: A Survey and Perspective. EMNLP Findings, 2024
[5] Wang Y. et al. Factuality of Large Language Models: A Survey. EMNLP, 2024
[6] Yadav I. et al. External Knowledge Integration in Large Language Models: Survey, Methods, Challenges, and Future Directions. Semantic Web Journal, 2025
[7] Google Search Central. AI Features and Your Website. 2026

Related materials


Why a strong brand can still be invisible to AI systems

Explains the central paradox: a brand can be well known to people and yet poorly distinguishable for AI at the moment of real choice.


Which sources AI uses to form an opinion about a brand — and why the site is not the only hero

The layers from which AI assembles its opinion of a brand: the brand's own site, search context, independent reviews, user platforms — and why the site is no longer the sole arbiter.

Next step

What the diagnostic layer of the report will show

If a brand is structured inside the model as a probabilistic network, diagnostics must also be layered. In the AI100 report, a separate diagnostic corpus of questions shows how the model describes the brand once its name has already been given.

See the diagnostic scenarios →