Google once attributed two of Barry Schwartz’s Search Engine Land articles to me — a misclassification at the annotation layer that briefly rewrote authorship in Google’s systems.
For a few days, when you searched for certain Search Engine Land articles Schwartz had written, Google listed me as the author. The articles appeared in my entity’s publication list and were connected to my Knowledge Panel.
What happened illustrates something the SEO industry has almost entirely overlooked: that annotation — not the content itself — is the key to what users see and thus your success.
How Google annotated the page and got the author wrong
Googlebot crawled those pages, found my name prominently displayed below the article (my author bio appeared as the first recognized entity name beneath the content), and the algorithm at the annotation gate added the “Post-It” that classified me as the author with high confidence.
This is the most important point to bear in mind: the bot can misclassify when it annotates, and that annotation defines everything the algorithms do downstream (in recruitment, grounding, display, and won). In this case, the issue was authorship, which isn't going to kill my business or Schwartz's.
But if it were a product, a price, an attribute, or anything else that matters to the intent of a query where your brand should be one of the obvious candidates, an inaccurate annotation means you've lost the "ranking game" before you even started competing.
Annotation is the single most important gate in taking your brand from discover to won, whatever query, intent, or engine you’re optimizing for.
What annotation is and why it isn’t indexing
Indexing (Gate 4) breaks your content into semantic chunks, converts it, and stores it in a proprietary format. Annotation (Gate 5) then labels those chunks with a confidence-driven “Post-It” classification system.
It's a pragmatic labeler: it attaches classifications to each chunk (a minimal sketch follows this list), describing:
- What that chunk contains factually.
- In what circumstances it might be useful.
- The trustworthiness of the information.
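To make the "Post-It" idea concrete, here is a minimal sketch of what a chunk-level annotation record could look like. The field names and values are my illustration of the three descriptions above, not a documented Google or Bing schema.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class ChunkAnnotation:
    """Illustrative "Post-It" attached to one semantic chunk at Gate 5.

    The fields mirror the three things the labeler describes (facts, useful
    circumstances, trustworthiness), each carrying its own confidence score.
    All names and numbers here are hypothetical.
    """
    chunk_id: str
    facts: dict[str, float]         # claim -> confidence the claim was understood correctly
    use_contexts: dict[str, float]  # circumstance -> confidence the chunk is useful for it
    trust: float                    # 0..1 estimated trustworthiness of the information
    overall_confidence: float       # 0..1 confidence in the classification as a whole

# Example: a chunk containing an author-bio block, annotated at crawl time.
bio_chunk = ChunkAnnotation(
    chunk_id="example.com/article-123#chunk-07",
    facts={"author = Jane Doe": 0.55},            # low confidence -> misattribution risk
    use_contexts={"who wrote this article": 0.80},
    trust=0.90,
    overall_confidence=0.55,
)
```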
Importantly, it’s mostly unopinionated when labeling facts, context, and trustworthiness. Microsoft’s Fabrice Canel confirmed the principle that the bot tags without judging, and that filtering happens at query time.
What does that mean? The bot annotates neutrally at crawl time, classifying your content without knowing what query will eventually trigger retrieval.
Annotation carries no intent at all. It’s the insight that has completely changed my approach to “crawl and index.”
That clearly shows that indexing isn't the ultimate goal. Getting your page indexed is table stakes. Full, correct, and confident annotation is where the action happens: an indexed page that is poorly annotated is invisible to all three members of the algorithmic trinity.
The annotation system analyzes each chunk using one or more language models, cross-referenced against the web index, the knowledge graph, and the models’ own parametric knowledge. But it analyzes each chunk in the context of the page wrapper.
The page-level topic, entity associations, and intent provide the frame for classifying each chunk. If the page-level understanding is confused (unclear topic, ambiguous entity, mixed intent), every chunk annotation inherits that confusion. Even more importantly, it assigns confidence to every piece of information it adds to the “Post-Its.”
The choices happen downstream: each of the algorithmic trinity (LLMs, search engines, and knowledge graphs) uses the annotation to decide whether to absorb your content at recruitment (Gate 6). Each has different criteria, so you need to assess your own content for its “annotatability” in the context of all three.
And a small but telling detail: Back in 2020, Martin Splitt suggested that Google compares your meta description to its own LLM-generated summary of the page. When they match, the system’s confidence in its page-level understanding increases, and that confidence cascades into better annotation scores for every chunk — one of thousands of tiny signals that accumulate.
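Splitt's remark is easy to picture as a simple consistency check: compare the publisher-supplied description with a machine-generated summary and nudge page-level confidence when they agree. The sketch below assumes nothing about Google's internals; the token-overlap similarity, the 0.6 threshold, and the 0.1 boost are all placeholders.

```python
def description_consistency_boost(meta_description: str, generated_summary: str) -> float:
    """Toy version of the check: crude token overlap between the meta description
    and a model-generated summary. A real system would compare embeddings; the
    threshold and boost values here are illustrative only."""
    meta = set(meta_description.lower().split())
    summary = set(generated_summary.lower().split())
    if not meta or not summary:
        return 0.0
    overlap = len(meta & summary) / len(meta | summary)  # Jaccard similarity
    return 0.1 if overlap >= 0.6 else 0.0

page_confidence = 0.7 + description_consistency_boost(
    "Annotation is the gate where the system decides what your content means.",
    "The article argues annotation is the gate where the system decides what content means.",
)
```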
Annotation is the key midpoint of the 10-gate pipeline, where the scoreboard turns on. Everything before it is infrastructure: "Can the system access and store your content?" Everything after it is competition.


When you consider the depth of what happens at the annotation gate, links and keywords become the wrong lens entirely. They describe how you tried to influence a ranking system, whereas annotation is the mechanism behind how the algorithmic trinity chooses the content that builds its understanding of what you are.
The frame has to shift. You’re educating algorithms. They behave like children, learning from what you consistently, clearly, and coherently put in front of them. With consistent, corroborated information, they build an accurate understanding.
Given inconsistent or ambiguous signals, they learn incorrectly and then confidently repeat those errors over time. Building confidence in the machine’s understanding of you is the most important variable in this work, whether you call it SEO or AAO.


In 2026, every AI assistive engine and agent is that same child, operating at a greater scale and with higher stakes than Google ever had. Educating the algorithms isn’t a metaphor. It’s the operational model for everything that follows.
For a more academic perspective, see: “Annotation Cascading: Hierarchical Model Routing, Topical Authority, and Inter-Page Context Propagation in Large-Scale Web Content Classification.”
5 levels of annotation: 24+ dimensions classifying your content at Gate 5
When mapping the annotation dimensions, I identified 24, organized across five functional categories. After presenting this to Canel, his response was: “Oh, there is definitely more.”
Of course there are. This taxonomy is built through observation first, then naming what consistently appears. The know/guess distinctions follow the same logic: test hypotheses, eliminate what doesn't hold up, and keep what remains.
The five functional categories form the foundation of the model. They are simple by design — once you understand the categories, the dimensions follow naturally. There are likely additional dimensions beyond those mapped here.
What follows is the taxonomy: the categories are directionally sound (as confirmed by Canel), while the specific dimension assignments reflect observed behavior and remain incomplete.
Level 1: Gatekeepers (eliminate)
- Temporal scope, geographic scope, language, and entity resolution. Binary: pass or fail.
- If your content fails a gatekeeper (wrong language, wrong geography, or ambiguous entity), it is eliminated from that query’s candidate pool instantly. The other dimensions don’t come into play.
Level 2: Core identity (define)
- Entities, attributes, relationships, sentiment.
- This is where the system decides what your content means:
- Who is being discussed.
- What facts are stated.
- How entities relate.
- What the tone is.
- Without clear core identity annotations, a chunk carries no semantic weight in any downstream gate.
Level 3: Selection filters (route)
- Intent category, expertise level, claim structure, and actionability.
- These determine which competition pool your content enters.
- Is this informational or transactional?
- Beginner or expert?
- Wrong pool placement means competing against content that is a better match for the query, and you’ve lost before recruitment or ranking begins.
Level 4: Confidence multipliers (rank)
- Verifiability, provenance, corroboration count, specificity, evidence type, controversy level, and consensus alignment. These scale your ranking within the pool.
- This is where validated, corroborated, and specific content outranks accurate but unvalidated content.
- The multipliers explain why a well-sourced third-party article about you often outperforms your own claims: provenance and corroboration scores are higher.
- Confidence has a multiplier effect on everything else and is the most powerful of all signals. Full stop.
Level 5: Extraction quality (deploy)
- Sufficiency, dependency, standalone score, entity salience, and entity role. These determine how your content appears in the final output.
- Is this chunk a complete answer, or does it need context? Is your entity the subject, the authority cited, or a passing mention?
- Extraction quality determines whether AI quotes you, summarizes you, or ignores you.


Across all five levels, a confidence score is attached to every individual annotation. Not just what the system thinks your content means, but how certain it is.
Clarity drives confidence. Ambiguity kills it.
Canel also confirmed additional dimensions I had not initially mapped: audience suitability, ingestion fidelity, and freshness delta. These sit across the existing categories rather than forming a sixth level.
In 2022, Splitt named three annotation behaviors in a Duda webinar that map directly onto the five-level model. The centerpiece annotation is Level 2 in direct operation:
- “We have a thing called the centerpiece annotation,” Splitt confirmed, a classification that identifies which content on the page is the primary subject and routes everything else — supplementary, peripheral, and boilerplate — relative to it.
- “There’s a few other annotations” of this type, he noted.
Annotation runs before recruitment, which means a chunk classified as non-centerpiece carries that verdict into every gate that follows. Boilerplate detection is Level 3: content that appears consistently across pages — headers, footers, navigation, and repeated blocks — enters a different competition pool based on its structural role alone.
- "We figure out what looks like boilerplate and then that gets weighted differently," Splitt said.
Off-topic routing closes the picture. A page classified around a primary topic annotates every chunk relative to that centerpiece, and content peripheral to the primary topic starts its own competition pool at a disadvantage before recruitment begins.
Splitt’s example: a page with 10,000 words on dog food and a thousand on bikes is “probably not good content for bikes.” The system isn’t ignoring the bike content. It’s annotating it as peripheral, and that annotation is the routing decision.
The multiplicative destruction effect: When one near-zero kills everything
In Sydney in 2019, I was at a conference with Gary Illyes and Brent Payne. Illyes explained that Google’s quality assessment across annotation dimensions was multiplicative, not additive.
Illyes asked us not to film, so I grabbed a beer mat and noted a simple calculation: if you score 0.9 across each of 10 dimensions, 0.9 to the power of 10 is 0.35. You survive at 35% of your original signal. If you score 0.8 across 10 dimensions, you survive at 11%. If one dimension scores close to zero, the multiplication produces a result close to zero, regardless of how well you score on every other dimension.
Payne’s phrasing of the practical implication was better than mine: “Better to be a straight C student than three As and an F.”
The beer mat went into my bag. The principle became central to everything I’ve built since.
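The beer-mat arithmetic reduces to one line: multiply the per-dimension scores and see what survives. A minimal sketch (the ten-dimension count and the scores are the illustrative numbers from above, not real values):

```python
from math import prod

def surviving_signal(scores: list[float]) -> float:
    """Multiplicative quality assessment: the product of all per-dimension scores."""
    return prod(scores)

print(surviving_signal([0.9] * 10))            # ~0.35 -> 35% of the signal survives
print(surviving_signal([0.8] * 10))            # ~0.11 -> 11%
print(surviving_signal([0.7] * 10))            # ~0.028 -> the "straight C student"
print(surviving_signal([0.95] * 9 + [0.02]))   # ~0.013 -> three As and an F loses
```

The last two lines are Payne's point in numbers: consistent 0.7s outscore nine 0.95s dragged down by one near-zero.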


The multiplicative destruction effect has a direct consequence for annotation strategy: the C-student principle is your guide.
- A brand with consistently adequate signals across all 24+ dimensions outperforms a brand with brilliant signals on most dimensions and a near-zero on one. The near-zero cascades.
- A gatekeeper failure (Level 1) eliminates the content entirely.
- A core identity failure (Level 2) misclassifies it so badly that high confidence multipliers at Level 4 are applied to the wrong entity.
- An extraction quality failure (Level 5) produces a chunk that the system can retrieve but can’t deploy usefully. The failure doesn’t have to be dramatic to be fatal.
At the annotation stage, misclassification, low confidence, or near-zero on one dimension will kill your content and take it out of the race.
Nathan Chalmers, who works at Bing on quality, told me something that puts this in a different light entirely. Bing’s internal quality algorithm, the one making these multiplicative assessments across annotation dimensions, is literally called Darwin.
Natural selection is the explicit model: content with near-zero on any fitness dimension is selected against. The annotations are the fitness test. The multiplicative destruction effect is the selection mechanism.
How annotation routes content to specialist language models
The system doesn’t use one giant language model to classify all content. It routes content to specialized small language models (SLMs): domain-specific models that are cheaper, faster, and paradoxically more accurate than general LLMs for niche content.
A medical SLM classifies medical content better than GPT-4 would, because it has been trained specifically on medical literature and knows the entities, the relationships, the standard claims, and the red flags in that domain.
What follows is my model of how the routing works, reconstructed from observable behavior and confirmed principles. The existence of specialist models is confirmed. The specific cascade mechanism is my reconstruction.
The routing follows what I call the annotation cascade. The choice of SLM cascades like this:
- Site level (What kind of site is this?)
- Refined by category level (What section?)
- Refined by page level (What specific topic?)
- Applied at chunk level (What does this paragraph claim?)
Each level narrows the SLM selection, and each level either confirms or overrides the routing from above. This maps directly to the wrapper hierarchy from the fourth piece: the site wrapper, category wrapper, and page wrapper each provide context that influences which specialist model the system selects.
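A minimal sketch of that cascade as I model it. The routing table, model names, and backoff logic are invented for illustration; only the shape (narrow from site to chunk, fall back when the levels stop agreeing) is the point.

```python
GENERALIST = "general-llm"

# Invented routing table: (site type, category, page topic) -> specialist model.
SPECIALISTS = {
    ("marketing", "seo", "annotation"): "marketing-slm",
    ("health", "cardiology", "statins"): "medical-slm",
}

def route_slm(site_type: str, category: str, page_topic: str) -> str:
    """Hypothetical cascade: try the most specific match first (site + category
    + page topic), then back off level by level; use the generalist only when
    no specialist can be matched at any level."""
    context = (site_type, category, page_topic)
    for depth in (3, 2, 1):
        for key, model in SPECIALISTS.items():
            if key[:depth] == context[:depth]:
                return model
    return GENERALIST

print(route_slm("marketing", "seo", "annotation"))    # marketing-slm (all levels agree)
print(route_slm("marketing", "seo", "bike-repair"))   # marketing-slm, via a weaker site/category match
print(route_slm("recipes", "desserts", "brownies"))   # general-llm (no specialist available)
```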


The system deploys three types of SLM simultaneously for each topic. This is my model, derived from the behavior I have observed: annotation errors cluster into patterns that suggest three distinct classification axes.
- The subject SLM classifies by subject matter — what is this about? — routing content into the right topical domain.
- The entity SLM resolves entities and assesses centrality and authority: who are the key players, is this entity the subject, an authority cited, or a passing mention?
- The concept SLM maps claims to established concepts and evaluates novelty, checking whether what the content asserts aligns with consensus or contradicts it.
When all three return high confidence on the same entity for the same content, annotation cost is minimal, and the confidence score is very high. When they disagree (e.g., the subject SLM says "marketing," but the entity SLM can't resolve the entity, and the concept SLM flags the claims as novel), confidence drops, and the system falls back to a more general, less accurate model.
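In that model, the confidence logic behaves roughly like this sketch: three independent verdicts that either agree (cheap, high confidence) or disagree (fallback to the generalist at a confidence penalty). The threshold and penalty are invented numbers.

```python
def combine_slm_verdicts(subject: float, entity: float, concept: float) -> tuple[str, float]:
    """Illustrative combination of the three hypothesized SLM confidences.
    Agreement keeps the specialist annotation; any weak axis triggers the
    generalist fallback. The 0.75 threshold and 0.5 penalty are invented."""
    verdicts = (subject, entity, concept)
    if min(verdicts) >= 0.75:
        return "specialist", min(verdicts)
    return "generalist", min(verdicts) * 0.5  # lower confidence propagates downstream

print(combine_slm_verdicts(0.9, 0.85, 0.8))  # ('specialist', 0.8)
print(combine_slm_verdicts(0.9, 0.3, 0.6))   # ('generalist', 0.15) -- entity unresolved
```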
The key insight? Generalist LLM annotation is the failure mode. The system wants to use a specialist. It defaults to a generalist only when it can't route to a specialist. Generalist annotation produces lower confidence across all dimensions.
The practical implication
Content that’s category-clear within its first 100 words, uses standard industry terminology, follows structural conventions for its content type, and references well-known entities in its domain triggers SLM routing.
Content that’s topically ambiguous or terminologically creative gets the generalist. Lower confidence propagates through every downstream gate.
Now, this may not be the exact way the SLMs are applied as a triad (and it might not even be a trio). However, two things strike me:
- Observed outputs act that way.
- If it isn't built exactly this way, it behaves as if it were.
First-impression persistence: Why the initial annotation is the hardest to correct
Here is something I’ve observed over years of tracking annotation behavior. It aligns with a principle Canel confirmed explicitly for URL status changes (404s and 301 redirects): the system’s initial classification tends to stick.
When the bot first crawls a page, it selects an SLM, runs the annotation, assigns confidence scores, and saves the classification. The next time it crawls the same page, it logically starts with the previously assigned model and annotations. I call this first-impression persistence.
The initial annotation is the baseline against which all subsequent signals are measured. The system doesn’t re-evaluate from scratch. It checks whether the new crawl is consistent with the existing classification, and if it is, the classification is reinforced.
Canel confirmed a related mechanism: when a URL returns a 404 or is redirected with a 301, the system allows a grace period (very roughly a week for a page, and between one and three months for content, in my observation) during which it assumes the change might revert. After the grace period, the new state becomes persistent. I believe the same principle applies to content classification: a window of fluidity after first publication, then crystallization.
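The grace-period behavior is easy to express as a small state check. The time windows below are the rough observational figures from the paragraph above, not documented values, and the whole function is a sketch of the principle rather than Bing's or Google's implementation.

```python
from datetime import datetime, timedelta

# Rough observational grace periods (my estimates, not documented values).
GRACE = {
    "url_status": timedelta(weeks=1),
    "content_classification": timedelta(days=60),  # somewhere in the one-to-three-month window
}

def effective_state(old_state: str, new_state: str, changed_at: datetime,
                    now: datetime, kind: str) -> str:
    """Within the grace period the system keeps acting on the old state,
    assuming the change might revert; afterwards the new state crystallizes."""
    if new_state != old_state and now - changed_at < GRACE[kind]:
        return old_state
    return new_state

# A 301 first seen three days ago is still treated as if it might revert.
print(effective_state("200 OK", "301 redirect",
                      datetime(2026, 1, 1), datetime(2026, 1, 4), "url_status"))
```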
I have direct evidence for the correction side from the evolution of my own terminologies. When I first described the algorithmic trinity, I used the phrase “knowledge graphs, large language models, and web index.” Google, ChatGPT, and Perplexity all picked up on the new term and defined it correctly.
A month later, I changed the last one to “search engine” because it occurred to me that the web index is what all three systems feed off, not just the search system itself. At the point of correction, I had published roughly 10 articles using the original terminology.
I went back and invested the time to change every single one, updating every reference, leaving zero traces. A month later, AI assistive engines were consistently using “search engine” in place of “web index.”
The lesson is that change is possible, but you need to be thorough: any residual contradictory signal (one old article, one unchanged social post, one cached version) maintains inertia proportionally. Thoroughness, rather than time, is the unlock.


A rebrand, career pivot, or repositioning is the practical example. You can change the AI model’s understanding and representation of your corporate or personal brand, but it requires thoroughly and consistently pivoting your digital footprint to the new reality.
In my experience, the pivot can turn "on a sixpence," within a week. I've done this with my podcast several times. Facebook achieved the ultimate rebrand from an algorithmic perspective when it changed its name to Meta.
The practical implication
Get your annotation right before you publish. The first crawl sets the baseline. A page published prematurely (with an unclear topic or ambiguous entity signals) crystallizes into a low-confidence annotation, and changing it later requires significantly more effort than getting it right the first time.
Annotation-time grounding: The bot cross-references three sources while classifying your content
The system doesn't annotate in a vacuum. When the bot classifies your content at Gate 5, it cross-references against at least three sources simultaneously. This is my model of the mechanism. The observable effect (annotation confidence correlates with entity presence across multiple systems) is confirmed by our tracking data.
The bot has prioritized access to the web index during crawling and checks your content against what it already knows:
- Who links to you.
- What context those links provide.
- How your claims relate to claims on other pages.
Against the knowledge graph, it checks annotated entities during classification — an entity already in the graph with high confidence means annotation inherits that confidence, while absence starts from a much lower baseline.
The SLM’s own parametric knowledge provides the third cross-reference: each SLM compares encountered claims against its training data, granting higher confidence to claims that align, flagging contradictions, and giving lower confidence to novel claims until corroboration accumulates.
This means annotation quality isn’t just about how well your content is written. It’s about how well your entity is already represented across all three of the algorithmic trinity. An entity with strong knowledge graph presence, authoritative web index links, and consistent SLM-domain representation gets higher annotation confidence on new content automatically.
The flywheel: better presence leads to better annotation, which leads to better recruitment, which strengthens presence, and which improves future annotation.
Once again, better to have an average presence in all three than to have a dominant presence in two and no presence in one.


And this is why knowledge graph optimization (what I’ve been advocating for over a decade) isn’t separate from content optimization. They are the same pipeline. Your knowledge graph presence directly improves how accurately, verbosely, and confidently the system annotates every new piece of content you publish.
If you’re thinking “Knowledge graph? That’s just Google,” think again.
In November 2025, Andrea Volpini intercepted ChatGPT’s internal data streams and found an operational entity layer running beneath every conversation: structured entity resolution connected to what amounts to a product graph mirroring Google Shopping feeds.
OpenAI is building its own knowledge graph inside the LLM. My bet is that they will externalize it, for several reasons: a knowledge graph inside an LLM doesn't scale; an LLM will self-confirm, so the value is limited; a standalone knowledge graph can be updated in real time without retraining the model; and it's only useful at scale when it stays current.
The algorithmic trinity isn’t a Google phenomenon. It’s the architectural pattern every AI assistive engine and agent converges on, because you can’t generate reliable recommendations without a concept graph, structured entity data, and up-to-date search results to ground them.
Why Google and Bing annotate differently from engines that rent their index
Google and Bing own their crawling infrastructure, indexes, and knowledge graphs. They can afford grace periods, schedule rechecks, and maintain temporal state for URLs and entities over months.
OpenAI, Perplexity, and every engine that rents index access from Google or Bing operate on a fundamentally different model. They have two speeds:
- A slow Boolean gate (Does this content exist in the index I have access to?)
- A fast display layer (What does the content say right now when I fetch it for grounding?)
The Boolean gate inherits Google’s and Bing’s annotations. Whether your content appears at all depends on whether it was recruited from the index those engines draw from, and that recruitment depends on annotation and selection decisions made by the algorithmic trinity. But what these engines show when they cite you is fetched in real time.
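Those two speeds can be sketched as a slow membership check plus a live fetch at answer time. The function names and the plain HTTP fetch are stand-ins for whatever these engines actually run; only the two-clock shape is the point.

```python
from urllib.request import urlopen

def can_cite(url: str, rented_index: set[str]) -> bool:
    """Slow Boolean gate: is the URL present in the rented index at all?
    This inherits Google's/Bing's annotation and recruitment decisions
    and changes slowly."""
    return url in rented_index

def grounding_text(url: str) -> str:
    """Fast display layer: what the page says right now, fetched at answer time.
    A stand-in fetch; real engines use their own retrieval stack."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

# The citation decision and the displayed text run on two different clocks:
# index membership may be months old, while the quoted text is seconds old.
```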
The practical implication
For Google and Bing, you’re optimizing for annotation quality with the benefit of grace periods and gradual reclassification. For engines that don’t own their index, the Boolean presence is inherited from the rented index and is slow to change, but the surface-level display changes every time they re-fetch.
That means what you are seeing in the results is not a direct measure of your annotation quality. It’s a snapshot of your page at the moment of fetch, and those two things may have nothing to do with each other.
How to optimize for annotation quality: The six practical principles
The SEO industry has spent two decades optimizing for search and assistive results — what happens after the system has already decided what your content means. We should be optimizing for annotation.
If the annotation is wrong, everything downstream suffers. When the annotation is accurate, verbose, and confident, your content has a significant advantage in recruitment, grounding, display, and, ultimately, won.
1. Trigger SLM routing
Make your topic category obvious within the first 100 words. Use standard industry terminology. Follow structural conventions. Reference well-known entities. The goal: specialist model, not generalist.
2. Write for all three SLMs
Clear signals for subject (what is this about?), entity (who is the authority?), and concept (what established ideas does this connect to?). Ambiguity on any axis reduces confidence.
3. Get it right before publishing
First-impression persistence means the initial annotation is the hardest to change. Publish only when topic, entity signals, and claims are unambiguous.
4. Build the flywheel
Knowledge graph presence, web index centrality, LLM parameter strengthening, and correct SLM-domain representation all feed annotation confidence for new content. Invest in entity foundation, and every future piece benefits from inherited credibility.
5. Eliminate noise when correcting
Change every reference. Leave zero contradictory signals. Noise maintains inertia proportionally.
6. Audit for annotation, not just indexing
A page can be indexed and still misannotated. If the AI response is wrong about you, the problem is almost certainly at Gate 5, not Gate 8.


Annotation is the gate where most brands silently lose. The SEO industry doesn’t yet have a vocabulary for it. That needs to change, because the gap between brands that get annotation right and brands that don’t is the gap between consistent AI visibility and permanent algorithmic obscurity.
Why annotation matters so much and why it should be your main focus
You've done everything within your power to create the best possible content that maps to the intent of your ideal customer profile. You have methodically optimized your digital footprint. Your data feeds every entry mode simultaneously (pull, push discovery, push data, MCP, and ambient), so they are all drawing from the same clean, consistent source.
So, content about your brand has passed through the DSCRI infrastructure phase, survived the rendering and conversion fidelity boundaries, and arrived in the index (Gate 4) intact. Phew!
Now it gets classified. Annotation is the last moment in the pipeline where you have the field to yourself. Every decision in DSCRI was absolute: you vs. the machine, with no competitor in the frame.
Annotation is still absolute. The system classifies your content based on your signals alone, independently of what any competitor has done. Nobody else’s data changes how your entity is annotated.
But this is the last time you aren’t competing. From recruitment onward, everything is relative. The field opens, every brand that passed annotation enters the same competitive pool, and the advantage you carried through the absolute phase becomes your starting position in the competitive race you have to win.
That means:
- Get annotation right, and you start ahead, with confidence that compounds through every downstream gate in RGDW.
- Get it wrong, and the multiplicative destruction effect does its work — a near-zero on one annotation dimension cascades through recruitment, grounding, display, and won. No amount of excellent content, structural signals, or entry-mode advantage recovers it.
Warning: First-impression persistence (remember, the first time you are annotated is the baseline) means you don’t get a clean retry. Changing the baseline requires thoroughness, time, and more effort than getting it right on the first crawl.
Annotation isn’t the gate that most brands focus on. It’s the gate where most brands silently lose.
This is the eighth piece in my AI authority series.
- The first, “Rand Fishkin proved AI recommendations are inconsistent – here’s why and how to fix it,” introduced cascading confidence.
- The second, “AAO: Why assistive agent optimization is the next evolution of SEO,” named the discipline.
- The third, “The AI engine pipeline: 10 gates that decide whether you win the recommendation,” mapped the full pipeline.
- The fourth, “The five infrastructure gates behind crawl, render, and index,” walked through the infrastructure phase.
- The fifth, “5 competitive gates hidden inside ‘rank and display’,” covered the competitive phase.
- The sixth, “The entity home: The page that shapes how search, AI, and users see your brand,” mapped the raw material.
- The seventh, “The push layer returns: Why ‘publish and wait’ is half a strategy,” extended the entry model.
- Up next: “The engine’s recruitment decision: What topical ownership actually means.”