The same brand, two runs, different neighbors in the table

In April 2026, we ran an audit for a moderately well-known SaaS brand twice, half an hour apart. The same models, the same prompt corpus, the same query language. No new players entered the category in those thirty minutes, the brand did not change its site or positioning, and even the AI providers’ caches remained almost fully warm between runs. The brand received almost the same overall score — a gap of about two points, well within the ordinary noise of repeated measurement.

Only when we opened the second table — mention share within answers — did the picture change. In one model, that figure nearly tripled in half an hour. In another, it roughly doubled. In a third, the shift was more moderate, but still beyond the bounds of reasonable noise.

The explanation turned out to be disappointingly simple: between runs, automatic competitor discovery slightly rewrote the list of neighbors. One player dropped out of the set, another took its place. The brand did not become more visible in answers. Its share was simply being calculated against a different group of neighbors — and arithmetic did the rest.

This is not an anecdote about a broken system. It is a standard effect that appears whenever measurement pretends to measure one object while in fact measuring it within an environment of others. And almost any AI visibility tool measures in exactly that way.

Where the term comes from

“Competitive set” is a much older term than AI visibility. In Aaker’s and Keller’s work on brand equity, it is already a working concept by the late 1990s: a fixed circle of brands against which the focal brand is measured. In Kapferer, the closely related notion is the frame of reference: the circle of comparison within which a brand acquires meaning.

The idea is simple: no brand exists by itself. It always has neighbors — the brands against which the buyer compares it at the moment of choice. And any researcher measuring brand strength has to name those neighbors explicitly. Otherwise, the numbers they obtain no longer have a clear referent.

In a classic brand tracker, the competitive set is selected manually — usually two to four direct competitors plus one or two “indicator” players from adjacent categories. The list is fixed in the research design, and the same names are used again six months or a year later. If the researcher decides to update the set, that becomes a separate methodological decision and is noted in the report.

In AI visibility, things usually work differently. Most often, the competitors are identified by the model itself: we give it a dozen or two prompts of the form “who else is worth considering in this category,” collect the answers, aggregate them, and obtain a set. This is convenient: the researcher does not need deep knowledge of an unfamiliar market and does not need to guess whom to include. But convenience has a cost. Automatic discovery produces a slightly different result from one run to the next. And that turns the set from a stable part of the design into a floating variable.

Where the set hides inside the metrics

AI visibility metrics fall into two classes, depending on how sensitive they are to the composition of the set of neighboring brands. It is useful to distinguish them, because they are built on different principles, even though they sit side by side in a report and look similar.

The first class consists of metrics that describe what happens to the brand itself. Did it appear in the model’s answer? In what position did it appear? How often did it make the top three? Did it receive an explicit recommendation? These indicators are calculated from the model’s behavior toward one brand. If the model answers in roughly the same way, the number stays roughly the same regardless of who else is in the set. The appearance of a new company in neighboring rows of the table barely changes them.

The second class consists of share metrics. The brand’s mention share out of all mentions of all brands. The share of scenarios in which the brand surfaced, out of the total number of scenarios in which the model named anyone at all. The citation share of the brand’s domain out of all citations of competitors’ domains. These metrics are relative by nature. They have a numerator — what belongs to the focal brand. And they have a denominator — what belongs to the full set. If the set changes and the newcomer is mentioned less often than the player it displaces, the total denominator shrinks. The numerator does not move. The share rises.
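A tiny numeric sketch, with invented brand names and mention counts, makes the arithmetic concrete: the focal brand's mentions stay fixed in both runs, and only the denominator moves with the composition of the set.

```python
# Illustrative only: invented mention counts, same focal brand in both runs.
def mention_share(counts: dict[str, int], brand: str) -> float:
    """Share of the focal brand's mentions out of all mentions across the set."""
    return counts[brand] / sum(counts.values())

run_1 = {"FocalBrand": 20, "BigRival": 60, "MidRival": 40}    # set version 1
run_2 = {"FocalBrand": 20, "BigRival": 60, "SmallRival": 10}  # MidRival replaced by a quieter neighbor

print(round(mention_share(run_1, "FocalBrand"), 3))  # 0.167
print(round(mention_share(run_2, "FocalBrand"), 3))  # 0.222 -- same 20 mentions, higher share
```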

It is the same arithmetic by which you instantly look richer if the banker’s son leaves your class at school. Your own wealth has not changed, but your position in the distribution has — and any statistic that asks “what is your rank by income?” will now produce a different number. Nothing unfair has happened. The measurement simply depended on the composition of the group, and the group changed.

That is why, when we say “the brand’s mention share increased,” we need to keep two different propositions in mind. The first is that the model really started mentioning the brand more often — behavior changed. The second is that the neighbors changed, the denominator was recalculated, and the share drifted — behavior may have stayed the same. Without an explicitly fixed set, the two cases are easy to confuse. And if a decision after the audit — whether to launch a campaign, shift positioning, or spend budget on content strategy — is made under the first interpretation when the second is the one that actually applies, the cost of that mistake can be high.

Why automatic competitor discovery is always slightly different

An AI model does not answer the same way to the same prompt every time. Even when the researcher drives randomness down as far as possible through generation settings, the model still has to choose among several plausible continuations, and that choice can differ slightly from run to run. That is not a defect; it is a property of how modern generative models work.

When we ask, “who else is worth considering in category X?”, the answer almost always begins the same way — the same two or three undisputed leaders the model would return to anyone asking. The differences begin at the edge of the list. When it comes to the eighth, ninth, or tenth name, the model has several roughly equiprobable candidates in mind, and the ranking among them shifts slightly each time.

If you aggregate a dozen or two such answers from one run and another dozen or two from a second, the aggregated lists will match at the top and diverge at the bottom. A brand that, the first time, received exactly enough votes to make the final eight will be ninth the second time and drop out. Another brand — one that was tenth last time — will take its place.
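A minimal sketch of that aggregation, with invented vote tallies and a cut at eight names, shows how a single-vote swing at the boundary changes the final set while the leaders stay identical.

```python
from collections import Counter

def final_set(votes: Counter, size: int = 8) -> set[str]:
    """Keep the `size` most frequently named competitors from aggregated discovery answers."""
    return {name for name, _ in votes.most_common(size)}

# Invented tallies from two discovery runs: the leaders match exactly,
# brands H and I differ by a single vote at the edge of the list.
run_1 = Counter({"A": 20, "B": 19, "C": 18, "D": 15, "E": 12, "F": 10, "G": 9, "H": 6, "I": 5})
run_2 = Counter({"A": 20, "B": 19, "C": 18, "D": 15, "E": 12, "F": 10, "G": 9, "H": 5, "I": 6})

print(final_set(run_1) ^ final_set(run_2))  # {'H', 'I'} -- only the boundary names swap
```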

From the standpoint of research design, this is bad news: the periphery of the set is mobile by construction, and no amount of system-side effort can fully stabilize it. You can increase the number of prompts through which competitors are identified — that helps, but not radically. You can warm provider caches — that reduces cost, but barely changes content. The jitter at the edge of the list remains.

From the standpoint of practice, this means something simple: as long as the competitive set is redefined on every run, comparing runs to one another with any metric that includes a denominator is technically incorrect. The brand’s overall score will hold up, because it is built mainly from metrics of the first class, those that describe the brand itself. Shares will not.

What we do in AI100

AI100 solves this problem as follows. At the first audit of a brand, the system builds the competitive set in the usual way: automatic discovery plus the client’s option to add names they consider important or remove those they consider irrelevant. The final list — what the client approves before launch — is saved as the first version of that brand’s competitive set.

At a repeat audit of the same brand, the system uses the same set by default. In the launch form, the client sees a prompt: “this brand already has a competitive set from [date], use it?” If the answer is yes, the repeat measurement is conducted against the same circle of neighbors as the previous one, and the share metrics become honestly comparable.

If the client wants to revise the set — add a new player, remove one that has dropped out of the market, or refresh the list entirely — that becomes an explicit action that creates a new version of the competitive set. In the report, the line at the bottom of the methodology section shows which version was used, when it was created, and how many brands it contains. This is needed so that, when comparing reports, the client can see whether the comparison is between two runs with the same neighbor set or two runs with different ones.
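As a rough illustration only, not AI100's actual code or schema, the idea of versioned competitive sets can be sketched like this; every field name below is invented.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical sketch of a versioned competitive set; not AI100's real data model.
@dataclass(frozen=True)
class CompetitiveSetVersion:
    brand: str
    version: int
    created_on: date
    competitors: tuple[str, ...]

def set_for_repeat_audit(history: list[CompetitiveSetVersion],
                         revised: tuple[str, ...] | None = None) -> CompetitiveSetVersion:
    """Reuse the latest saved set by default; an explicit revision creates a new version."""
    latest = max(history, key=lambda v: v.version)
    if revised is None:
        return latest  # same neighbors as last time: share metrics stay comparable
    return CompetitiveSetVersion(latest.brand, latest.version + 1, date.today(), revised)
```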

A fresh audit for a brand that has never been studied before discovers competitors from scratch — but the client can still intervene and adjust the set before launch. Here the system “remembers” nothing, because there is nothing yet to remember; but the formation of the first set remains under the client’s control rather than staying a hidden step.

When it makes sense to update the competitive set

In reality, there are not many situations in which an update makes sense.

Six to twelve months have passed, and the category has changed visibly — someone has exited, someone has grown significantly, the market itself has shifted. In that case, the old set begins to misrepresent present reality, and it is worth refreshing it even if that breaks comparability with last year’s figures. Here, the value of an honest picture is higher than the value of continuous trend lines.

A significant player has appeared who simply did not exist in the first set. If this means one or two names, it is easier to add them manually without launching a full revision — the overall structure of the set will remain, and most share metrics will stay comparable. But if many new names appear at once, that is probably a signal that the set should be refreshed in full.

The brand has changed its positioning or shifted into an adjacent category. The competitive set should follow the brand — otherwise the measurement starts showing not real visibility, but visibility inside a category to which the brand no longer belongs.

In all other cases, it is better to keep the old set. The natural temptation to “refresh it so it stays current” works against the usefulness of the research: every update resets the ability to compare with previous runs. Choosing to keep the set by default is methodological discipline in the purest sense, not conservatism.

What seems well established

AI visibility metrics whose formula contains a denominator over the entire brand set (mention share, scenario share, citation share) are recalculated when the composition of the set changes — even if the model’s behavior toward the focal brand does not change. The effect is built into the arithmetic of the metric, not into the system’s code; it will appear in any AI visibility tool that does not fix the set explicitly.

What still remains uncertain

A shift in shares caused by movement in the set and a shift caused by a real change in visibility cannot be disentangled from a single repeat run. To separate them, you need either a fixed set, or two runs in which the set shifted in exactly the same way, or a dedicated stability analysis, which is a separate methodological task that applied AI visibility researchers still solve in different ways.

What this changes in practice

A repeat audit without a fixed competitive set shows not the brand’s dynamics, but the brand’s dynamics overlaid with shifts in the composition of its neighbors. For decisions based on comparing runs to one another — fix the set. For a fresh assessment of the market — update it. Do not confuse those two modes, and state explicitly in the report which mode the work is using.

Sources

[1] Aaker, D. *Building Strong Brands.* The Free Press, 1996.
[2] Keller, K. L. *Strategic Brand Management.* 4th edition. Pearson, 2012.
[3] Kapferer, J.-N. *The New Strategic Brand Management.* 5th edition. Kogan Page, 2012.
[4] AI100. Comparison of two consecutive runs of one brand (internal observation, April 2026).

AI100 Research · Methodology v2026.04 · Published: 2026-04-15