
The Integrity Graph: The Missing Layer In Your AI Visibility Audit via @sejournal, @billhunt


June 11, 2026


A recent announcement from Common Crawl introduced an AI Visibility Audit designed to help organizations determine whether AI systems can discover and access their content. The premise is straightforward and difficult to dispute. Before an AI system can retrieve, summarize, cite, recommend, or act upon information, it must first be able to find it.

For years, visibility has been the foundation of search. If Google could not crawl a page, it could not rank it. If an AI system cannot access information, it cannot incorporate that information into responses, recommendations, or decisions.

Yet as I read through the announcement, I found myself thinking about a different problem entirely.

Common Crawl is not a search engine, nor is it an AI platform. It is one of the largest open repositories of web crawl data and has become an important source of training and research data for the broader AI ecosystem. Whether or not a particular AI model uses Common Crawl directly, the project has become a useful proxy for a larger question: Can machines discover and access the information organizations publish online?

That is precisely why the AI Visibility Audit caught my attention.

What happens after the content is discovered?

That question came into focus while reviewing schema implementations across several banking websites. On the surface, most appeared reasonably mature. The sites contained Organization markup, BankOrCreditUnion entities, branch information, product schema, service schema, and many of the components one would expect to see at large financial institutions.

However, when I stopped looking at individual pages and started looking at the relationships between entities, a very different picture emerged. Most banks had the fundamental schema in place, but very few had built out a knowledge graph.

The Difference Between Describing A Page And Describing A Business

One recurring theme in the SEO industry is the importance of schema completeness. We audit whether required properties are present. We validate markup against Google’s tools. We look for missing fields and opportunities to expand coverage.

The problem is that most of these exercises evaluate pages in isolation. A branch page is reviewed as a branch page, a product page as a product page, and a service page as a service page. What often gets overlooked is whether those entities are meaningfully connected.

In the banking examples I reviewed, it was common to find a branch location, a checking account, a mortgage offering, and a corporate organization all marked up separately. What was frequently missing was the connective tissue that explained how those entities related to one another.

Which legal entity owned the consumer-facing brand?
Which products were offered through which services?
Which services were available at which branches?
Which offerings were available only in specific markets or jurisdictions?
Which products belonged to a larger family of financial solutions?

The markup described the individual pieces, but it rarely described the business itself.
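
To make the missing connective tissue concrete, here is a minimal sketch of a connected JSON-LD graph, built in Python. The bank, its domain, the legal entity name, and every `@id` are hypothetical and chosen only for illustration. The types and properties (`BankOrCreditUnion`, `BankAccount`, `parentOrganization`, `provider`, `makesOffer`, `areaServed`) are schema.org vocabulary, but this is one possible modeling, not a recommended template. The `edges` helper is also an assumption of this sketch: it lists every relationship expressed through an `@id` reference.

```python
import json

BASE = "https://www.example-bank.com"  # hypothetical domain

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {   # Legal entity that owns the consumer-facing brand
            "@type": "Corporation",
            "@id": f"{BASE}/#holding",
            "legalName": "Example Financial Holdings, Inc.",
        },
        {   # Consumer-facing brand, tied to its legal owner
            "@type": "BankOrCreditUnion",
            "@id": f"{BASE}/#bank",
            "name": "Example Bank",
            "parentOrganization": {"@id": f"{BASE}/#holding"},
        },
        {   # Product, tied to the brand that provides it and to a market
            "@type": "BankAccount",
            "@id": f"{BASE}/checking/#product",
            "name": "Everyday Checking",
            "provider": {"@id": f"{BASE}/#bank"},
            "areaServed": {"@type": "Country", "name": "US"},
        },
        {   # Branch, tied to the brand and to the product it offers
            "@type": "BankOrCreditUnion",
            "@id": f"{BASE}/branches/springfield/#branch",
            "name": "Example Bank Springfield",
            "parentOrganization": {"@id": f"{BASE}/#bank"},
            "makesOffer": {
                "@type": "Offer",
                "itemOffered": {"@id": f"{BASE}/checking/#product"},
            },
        },
    ],
}


def edges(doc):
    """Yield (subject_id, property_path, object_id) for every @id reference."""
    def walk(subject, path, value):
        if isinstance(value, dict):
            if "@id" in value and path:
                yield (subject, path, value["@id"])
            for key, child in value.items():
                if not key.startswith("@"):
                    yield from walk(subject, f"{path}.{key}" if path else key, child)
        elif isinstance(value, list):
            for item in value:
                yield from walk(subject, path, item)

    for node in doc["@graph"]:
        yield from walk(node["@id"], "", node)


if __name__ == "__main__":
    print(json.dumps(graph, indent=2))
    for subject, prop, obj in edges(graph):
        print(f"{subject} --{prop}--> {obj}")
```

Remove the four `@id` references and the same four entities remain valid individually, yet nothing says who owns the brand or which branch offers which product.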

That distinction may seem subtle, but it becomes increasingly important as search engines and AI systems move beyond page-level understanding toward entity-level understanding.

The Validator Problem

Part of the issue may stem from how we evaluate structured data. Most validation tools perform a single-page review. They determine whether a page contains the expected properties for a given schema type and whether those properties conform to accepted standards.

This approach works reasonably well when the objective is to generate a rich result or to validate a standalone entity. It becomes less effective when the objective is building a connected knowledge graph.

One of the more frustrating aspects of implementing sophisticated schema architectures is that the very mechanisms designed to create entity relationships often appear incomplete when viewed through page-level validation tools.

The contradiction becomes particularly apparent when organizations attempt to implement graph-based architectures as Google recommends. A branch page may reference its parent organization through an @id relationship that points to the organization’s primary entity definition on the homepage. The organization’s address, legal information, social profiles, and other core attributes are stored in the graph, but not necessarily on the page being tested.

Ironically, some of the same implementations Google recommends for entity alignment can generate warnings in page-level testing tools because the information is intentionally referenced elsewhere rather than duplicated. In effect, organizations are encouraged to build graphs while still being evaluated as though every page were an island.
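
The island effect can be illustrated with a small sketch. It assumes the structured data from each page has already been extracted into Python dictionaries; the URLs, identifiers, and helper names are hypothetical, and this is not how any particular validator works internally. The branch page points to the organization by `@id` only. Checked against its own page, that reference looks unresolved. Checked against the whole site, it resolves.

```python
ORG_ID = "https://www.example-bank.com/#bank"  # hypothetical identifier
HOME_URL = "https://www.example-bank.com/"
BRANCH_URL = "https://www.example-bank.com/branches/springfield/"

# Extracted JSON-LD nodes per page (hypothetical data).
pages = {
    HOME_URL: [
        {"@id": ORG_ID, "@type": "BankOrCreditUnion", "name": "Example Bank",
         "address": {"@type": "PostalAddress", "addressCountry": "US"}},
    ],
    BRANCH_URL: [
        {"@id": BRANCH_URL + "#branch", "@type": "BankOrCreditUnion",
         "name": "Example Bank Springfield",
         "parentOrganization": {"@id": ORG_ID}},  # pointer, not a copy
    ],
}


def is_reference(value):
    """A dict carrying only @id is a pointer to a node defined elsewhere."""
    return isinstance(value, dict) and set(value) == {"@id"}


def references(node):
    """Collect every @id pointer found anywhere inside a node."""
    found = []
    stack = [v for k, v in node.items() if k != "@id"]
    while stack:
        value = stack.pop()
        if is_reference(value):
            found.append(value["@id"])
        elif isinstance(value, dict):
            stack.extend(value.values())
        elif isinstance(value, list):
            stack.extend(value)
    return found


def dangling(nodes, scope):
    """References in `nodes` that no node in `scope` actually defines."""
    known = {n["@id"] for n in scope if len(n) > 1}
    return sorted({ref for n in nodes for ref in references(n) if ref not in known})


# Page-level view: only the branch page is in scope.
page_view = dangling(pages[BRANCH_URL], scope=pages[BRANCH_URL])

# Site-level view: every page contributes to the same graph.
site_nodes = [node for nodes in pages.values() for node in nodes]
site_view = dangling(pages[BRANCH_URL], scope=site_nodes)

if __name__ == "__main__":
    print("unresolved on the page alone:", page_view)
    print("unresolved across the site:", site_view)
```

The markup is identical in both checks. Only the scope of the evaluation changes.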

That distinction may have mattered little during the rich snippet era, when the primary objective was determining whether a single page contained enough information to qualify for a search feature. It becomes increasingly important as search engines, knowledge systems, and AI platforms seek to understand how entities relate to one another across an entire organization.

Google’s Evolution Reveals The Real Direction

Today, many of Google’s most significant investments appear focused on relationships and context. Product Graph, Merchant Center feeds, compatibility data, variant relationships, entity reconciliation, and Conversational Attributes all point in a similar direction. Collectively, these initiatives suggest that understanding relationships between entities has become increasingly important, particularly when those relationships are difficult to infer consistently from content alone.

Google’s actions suggest that relationship inference remains challenging even for one of the world’s most sophisticated information retrieval systems. Otherwise, there would be little reason to continue expanding the mechanisms through which organizations can explicitly provide contextual information about products, services, brands, and audiences.

Common Crawl Measures Visibility. Relationships Determine Understanding

This brings us back to Common Crawl.

The AI Visibility Audit addresses an important challenge. Organizations should absolutely understand whether AI systems can access their content. Content that cannot be discovered cannot influence search results, AI-generated answers, or recommendation systems.

Visibility matters. However, visibility and understanding are not the same thing. In many ways, Common Crawl is asking the same question SEO teams have asked for decades: Can machines reach the content?

The emerging AI challenge is what happens after machines gain access to the content. A crawler can successfully discover every page on a website and still struggle to understand how the underlying entities connect. Historically, search engines attempted to infer those relationships from content, links, user behavior, and countless other signals. In many cases, they became remarkably good at it. Yet Google’s recent investments suggest that inference has limits.

Consider the recent introduction of Conversational Attributes in Merchant Center. Rather than relying solely on AI systems to determine which products solve similar problems, which products are alternatives, or which attributes matter in specific situations, Google is increasingly asking merchants to provide that context directly.

Google clearly possesses the resources, data, and AI capabilities to make educated guesses about product relationships. Nevertheless, it continues to seek information directly from the organizations that manufacture, sell, and support those products.

The reason is simple. Inference can be powerful, but first-party knowledge is often more accurate.

A manufacturer knows which products are compatible. A retailer knows which products are commonly purchased together. A bank knows which services are available at which branches. A global company knows which product variations apply in specific markets.

While AI systems can attempt to reconstruct those relationships from content, organizations already possess the answers. The question, therefore, is not whether AI can infer relationships. The more important question is whether the organizations that own those relationships can and will provide a reliable way for machines to understand them.

That distinction becomes increasingly important as AI systems move beyond retrieving information and begin synthesizing, recommending, and acting upon it. The information may already exist somewhere on the website, but the contextual relationships that give it meaning are often left for machines to discover on their own.

Are We Ready For The Agentic Hype Machine?

Over the past year, the industry has become increasingly focused on concepts such as MCP, WebMCP, agent skills, agent cards, API catalogs, A2A protocols, and llms.txt files. Much of the discussion assumes that the web is rapidly evolving toward an agent-first ecosystem.

Recent Agentic Readiness research by Bastian Grimm offers a useful reality check. After benchmarking highly visible websites across the United States, the United Kingdom, and Germany, he found that adoption of these agent-oriented standards remains remarkably limited. The overwhelming majority of sites exposed none of the agent-discovery mechanisms currently being promoted by the industry.

That finding does not suggest the agent-ready web is unimportant, but it does suggest we may be getting ahead of ourselves. More importantly, even if every major website deployed llms.txt, WebMCP manifests, and API catalogs tomorrow, the same underlying challenge would remain.

What information are those systems exposing?

A machine-readable doorway is valuable only if it leads to accurate, connected, and contextually complete information. If the underlying relationships between products, brands, locations, services, and markets are poorly modeled, agentic access simply makes incomplete information easier to retrieve.

The access layer is not the hard part. The relationship layer is.

Beyond Entity Graphs: Introducing The Integrity Graph

Most discussions around structured data focus on building an Entity Graph that helps machines understand the company, its products, its locations, and how they connect to one another. Those capabilities are important. However, AI systems face a more difficult challenge. They must determine which facts apply within which contexts. This is where I believe organizations need to begin thinking about what I call an Integrity Graph.

An Integrity Graph extends beyond entity identification to preserve contextual truth.

It helps establish which legal entity owns a brand, which products belong to a product family, which services are available in specific markets, which branches offer particular services, which regulations apply in particular jurisdictions, and which information is globally applicable versus locally relevant.

Simply identifying entities is no longer enough. Organizations must preserve the integrity of their relationships.
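
One way to picture contextual truth is as facts that carry their own scope. The sketch below is purely illustrative: the `Fact` structure, the identifiers, and the sample values are all assumptions made for this example, not part of any standard or published implementation. A fact with no market is treated as globally applicable, and a market-scoped fact overrides the global one for that market only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    entity: str                    # hypothetical entity identifier
    prop: str                      # property name
    value: object
    market: Optional[str] = None   # None means globally applicable


# Hypothetical sample data.
FACTS = [
    Fact("product:checking", "name", "Everyday Checking"),
    Fact("product:checking", "available", True),
    Fact("product:checking", "available", False, market="CA"),
    Fact("product:checking", "regulator", "Example Regulator DE", market="DE"),
]


def facts_for(entity, market):
    """Global facts for an entity, overridden by facts scoped to `market`."""
    resolved = {}
    # First pass: everything that is true everywhere.
    for fact in FACTS:
        if fact.entity == entity and fact.market is None:
            resolved[fact.prop] = fact.value
    # Second pass: market-specific facts win over global ones.
    for fact in FACTS:
        if fact.entity == entity and fact.market == market:
            resolved[fact.prop] = fact.value
    return resolved


if __name__ == "__main__":
    for market in ("CA", "DE", "JP"):
        print(market, facts_for("product:checking", market))
```

Asked about Canada, this model answers that the product is not available there. A graph that stored only the global fact would have answered incorrectly, even though every entity in it was identified correctly.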

What Organizations Should Audit Next

The growing number of AI readiness audits highlights how quickly the conversation is evolving. Common Crawl’s AI Visibility Audit focuses on discoverability and accessibility. Bastian Grimm’s benchmark for agent-ready technologies assesses whether websites provide machine-readable interfaces that agents can discover and interact with. Dixon Jones and the team at Waikay approach the challenge from yet another angle with their Brand AI Visibility Audit, which evaluates whether AI systems can recognize brands, understand entities, and accurately associate an organization with the topics, products, and concepts it seeks to own.

Viewed collectively, these emerging audit frameworks reveal that the industry is evaluating several distinct layers of machine understanding.

Common Crawl focuses on visibility and accessibility by asking whether machines can discover and access the content.

Agentic readiness frameworks examine whether agents can discover capabilities and interact with systems.

Entity visibility assessments evaluate whether AI systems can correctly identify brands, organizations, and the concepts associated with them.

Relationship integrity focuses on a different question entirely: whether machines understand how the organization itself operates.

Each layer builds upon the one before it. Content must be discoverable before it can be accessed. It must be accessible before it can be associated with an entity. It must be associated with an entity before machines can accurately understand the relationships that give the information meaning.

Why This Matters For Global Organizations

The importance of relationship integrity becomes even more obvious when viewed through an international lens.

A multinational company may have content available in twenty markets. Common Crawl can successfully discover all of it. AI systems can retrieve it. Search engines can index it. The visibility problem is solved.

For years, international SEO focused on helping search engines show the correct page to the correct user. AI systems introduce a different challenge. Now we must help machines understand the correct facts for the correct audience, market, and context.

We must ensure clarity on which product information applies in Germany, which regulations apply in Japan, and which services are available in Canada. Often, an equally complex challenge is determining which local brand names map to the same global product, and which facts are globally true versus market-specific. These are not crawling or retrievability problems but data integrity problems.

In many ways, the next generation of international SEO may resemble hreflang at the knowledge level rather than at the URL level. The challenge is no longer simply routing users to the correct page. The challenge is ensuring machines understand the correct version of the truth.
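
As a rough sketch of what hreflang at the knowledge level might look like, the mapping below ties local brand names to a single global product identifier per market. The brand names, product identifiers, and function names are invented for illustration. The second helper flags a name that points to different products in different markets, which is the kind of ambiguity a machine cannot reliably resolve from content alone.

```python
# (market, local brand name) -> global product identifier.
# All names and identifiers are hypothetical.
LOCAL_NAMES = {
    ("US", "FreshClean"): "product:detergent-001",
    ("DE", "FrischRein"): "product:detergent-001",
    ("JP", "FreshClean"): "product:softener-007",
}


def resolve(market, local_name):
    """Return the global product a local name refers to in one market."""
    return LOCAL_NAMES.get((market, local_name))


def ambiguous_names():
    """Local names that map to different global products depending on market."""
    by_name = {}
    for (market, name), product in LOCAL_NAMES.items():
        by_name.setdefault(name, set()).add(product)
    return sorted(name for name, products in by_name.items() if len(products) > 1)


if __name__ == "__main__":
    print(resolve("DE", "FrischRein"))
    print(resolve("JP", "FreshClean"))
    print(ambiguous_names())
```

In this sample data, two different names resolve to the same product, while one name resolves to two different products. Neither case is a crawling problem.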

The Next Competitive Advantage

The banking analysis that inspired this article illustrates the issue well. Most of the institutions had no shortage of schema. Their websites contained thousands of lines of structured data and numerous schema types. What they lacked was a coherent representation of how the business itself operated. Most current AI audits focus on discoverability, and that focus makes sense because discoverability remains a prerequisite for participation. However, discoverability alone will not be enough.

The organizations that thrive in the next phase of search may not be those with the most schema markup, the most pages, or the most AI-ready endpoints. They may be the organizations that provide the clearest, most complete, and most trustworthy representation of how their entities, products, services, locations, brands, and markets relate to one another. The next challenge is determining whether machines understand how the business actually works.

That shift may ultimately prove more important than any individual schema property, API endpoint, or AI optimization tactic. As search engines and AI systems become increasingly capable of retrieving information, the competitive advantage will move toward organizations that can provide context, preserve relationships, and maintain the integrity of their knowledge.

Understanding an entity is only the beginning. Understanding how that entity relates to everything around it is where the real value lies.


