{"id":4162,"date":"2026-03-02T23:12:18","date_gmt":"2026-03-02T15:12:18","guid":{"rendered":"http:\/\/longzhuplatform.com\/?p=4162"},"modified":"2026-03-02T23:12:18","modified_gmt":"2026-03-02T15:12:18","slug":"information-retrieval-part-4-sigh-grounding-rag","status":"publish","type":"post","link":"http:\/\/longzhuplatform.com\/?p=4162","title":{"rendered":"Information Retrieval Part 4 (Sigh): Grounding &amp; RAG"},"content":{"rendered":"<p><\/p> <div id=\"narrow-cont\"> <p>When we\u2019re talking about grounding, we mean fact-checking the hallucinations of planet destroying robots and tech bros.<\/p> <p>If you want a non-stupid opening line, when models accept they don\u2019t know something, they ground results in an attempt to fact check themselves.<\/p> <p>Happy now?<\/p> <h2>TL;DR<\/h2> <ol> <li>LLMs don\u2019t search or store sources or individual URLs; they generate answers from pre-supplied content.<\/li> <li>RAG anchors LLMs in specific knowledge backed by factual, authoritative, and current data. It reduces hallucinations.<\/li> <li>Retraining a foundation model or fine-tuning it is computationally expensive and resource-intensive. Grounding results is far cheaper.<\/li> <li>With RAG, enterprises can use internal, authoritative data sources and gain similar model performance increases without retraining. It solves the lack of up-to-date knowledge LLMs have (or rather don\u2019t).<\/li> <\/ol> <h2>What Is RAG?<\/h2> <p>RAG (Retrieval Augmented Generation) is a form of grounding and a foundational step in answer engine accuracy. LLMs are trained on vast corpuses of data, and every dataset has limitations. Particularly when it comes to things like newsy queries or changing intent.<\/p> <p><iframe class=\"sej-iframe-auto-height\" id=\"in-content-iframe\" scrolling=\"no\" src=\"https:\/\/www.searchenginejournal.com\/wp-json\/sscats\/v2\/tk\/Middle_Post_Text\"><\/iframe><\/p> <p>When a model is asked a question, it doesn\u2019t have the appropriate confidence score to answer accurately; it reaches out\u00a0to specific trusted sources to\u00a0ground\u00a0the response. Rather than relying solely on outputs from its\u00a0training data.<\/p> <p>By bringing in this relevant, external information, the retrieval system identifies relevant, similar pages\/passages and includes the chunks as part of the answer.<\/p> <blockquote> <p>This provides a really valuable look at why being in the training data is so important. You are more likely to be selected as a trusted source for RAG if you appear in the training data for relevant topics.<\/p> <p>It\u2019s one of the reasons why\u00a0disambiguation and accuracy\u00a0are more important than ever in today\u2019s iteration of the internet.<\/p> <\/blockquote> <h3>Why Do We Need It?<\/h3> <p>Because LLMs are notoriously hallucinatory. They have been trained to provide you with an answer. Even if the answer is wrong.<\/p> <p>Grounding results provides some relief from the flow of batshit information.<\/p> <p>All models have a cutoff limit in their training data.\u00a0They can be a year old or more. 
Imagine a fact you hear down the pub. Someone tells you that the scar they have on their chest was from a shark attack. A hell of a story. A quick bit of verifying would tell you that they choked on a peanut in said pub and had to have a nine-hour operation to get part of their lung removed.

True story, and one I believed until I was at university. It was my dad.

> There is a lot of conflicting information out there as to what web search these models use. However, we have very solid information that ChatGPT is (still) scraping Google's search results to form its responses when using web search.

## Why Can No-One Solve AI's Hallucinatory Problem?

A lot of hallucinations make sense when you frame them as a model filling in the gaps. It fails seamlessly.

It is a plausible falsehood.

It's like Elizabeth Holmes of Theranos infamy. You know it's wrong, but you don't want to believe it. The *you* here being some immoral old media mogul or some investment firm who cheaped out on the due diligence.

> "Even as language models become more capable, one challenge remains stubbornly hard to fully solve: hallucinations. By this we mean instances where a model confidently generates an answer that isn't true."

That is a direct quote from OpenAI. The hallucinatory horse's mouth.

Models hallucinate for a few reasons. As argued in OpenAI's most recent research paper, they hallucinate because training processes and evaluation reward an answer. Right or not.

*[Image: OpenAI model error rates table comparison. The error rates are "high." Even on the more advanced models. (Image Credit: Harry Clarkson-Bennett)]*

If you think of it in a Pavlovian conditioning sense, the model gets a treat *when it answers*. But that doesn't really answer *why* models get things wrong. Just that the models have been trained to answer your ramblings confidently and without recourse.

This is largely due to how the model has been trained.

Ingest enough structured or semi-structured data (with no right-or-wrong labelling), and a model becomes incredibly proficient at predicting the next word. At sounding like a sentient being.

Not one you'd hang out with at a party. But a sentient-sounding one.

If a fact is mentioned dozens or hundreds of times in the training data, models are far less likely to get it wrong. Models value repetition. But seldom-referenced facts are a problem: the share of facts that appear only once in the training data acts as a proxy for how many "novel" outcomes you might encounter in further sampling.

Facts referenced this infrequently are grouped under the term the singleton rate. In a never-before-made comparison, a high singleton rate is a recipe for disaster for LLM training data, but brilliant for Essex hen parties.
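As a toy illustration of the idea (the "corpus" here is a handful of invented strings; the real measurement in the paper operates over training documents, not a tidy list):

```python
from collections import Counter

def singleton_rate(facts: list[str]) -> float:
    """Share of distinct facts that appear exactly once in the corpus."""
    counts = Counter(facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)

corpus = [
    "paris is the capital of france",            # repeated: easy to get right
    "paris is the capital of france",
    "paris is the capital of france",
    "the scar came from a peanut, not a shark",  # seen once: a guess at best
]
print(f"Singleton rate: {singleton_rate(corpus):.0%}")  # 50%
```

The higher that number, the more of a model's "knowledge" rests on single sightings it can only ever half-remember.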
According to this paper on why language models hallucinate:

> "Even if the training data were error-free, the objectives optimized during language model training would lead to errors being generated."

Even when the training data is 100% error-free, the model will generate errors. They are built by people. People are flawed, and we love confidence.

> Several post-training techniques, like reinforcement learning from human feedback or, in this case, forms of grounding, do reduce hallucinations.

## How Does RAG Work?

*Technically*, you could say that the RAG process is initiated long before a query is received. But I'm being a bit arsey there. And I'm not an expert.

Standard LLMs source information from their training data. This data is ingested to train the model in the form of parametric memory (more on that later). So, whoever is training the model is making explicit decisions about the type of content that will likely require a form of grounding.

RAG adds an information retrieval component to the AI layer. The system:

➡️ **Retrieves data**

➡️ **Augments the prompt**

➡️ **Generates an improved response.**

A more detailed explanation (should you want it) would look something like this, with a compressed code sketch after the list:

1. The user inputs a query, and it's converted into a vector.
2. The LLM uses its parametric memory to attempt to predict the next likely sequence of tokens.
3. The vector distance between the query and a set of documents is calculated using Cosine Similarity or Euclidean Distance.
4. This determines whether the model's stored (or parametric) memory is capable of fulfilling the user's query without calling an external database.
5. If a certain confidence threshold isn't met, RAG (or a form of grounding) is called.
6. A retrieval query is sent to the external database.
7. The RAG architecture augments the existing answer. It clarifies factual accuracy or adds information to the incumbent response.
8. A final, improved output is generated.
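Here is a compressed, runnable sketch of steps 1, 3, 5, 6, and 7. The character-frequency "embedding" is deliberately silly so the example runs without any model; the 0.6 threshold and the prompt format are likewise invented.

```python
import math

def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: a character-frequency vector.
    Real systems use learned dense embeddings; this just makes it runnable."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isascii() and ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rag_prompt(query: str, documents: list[str], threshold: float = 0.6) -> str:
    q_vec = embed(query)                                      # step 1: query -> vector
    scored = sorted(((cosine_similarity(q_vec, embed(d)), d)  # step 3: distance to docs
                     for d in documents), reverse=True)
    best_score, best_doc = scored[0]
    if best_score < threshold:                                # step 5: threshold not met
        best_doc = "(send a retrieval query to the external database here)"  # step 6
    # step 7: augment the prompt with the retrieved chunk before generation
    return f"Context:\n{best_doc}\n\nAnswer the question: {query}"

docs = [
    "Cosine similarity measures the angle between two vectors.",
    "Google dropped the num=100 parameter in September 2025.",
]
print(rag_prompt("what does cosine similarity measure", docs))
print(rag_prompt("zzz qqq", docs))  # low similarity: falls back to live retrieval
```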
If a model is using an external database like Google or Bing (which they all do), it doesn't need to create one to be used for RAG.

This makes things a ton cheaper.

The problem the tech heads have is that they all hate each other. So when Google dropped the num=100 parameter in September 2025, ChatGPT citations fell off a cliff. They could no longer use their third-party partners to scrape this information.

*[Image: Lily Ray's note on citations dropping for Reddit and Wikipedia. (Image Credit: Harry Clarkson-Bennett)]*

It's worth noting that more modern RAG architectures apply a hybrid model of retrieval, where semantic searching is run alongside more basic keyword-type matches. Like updates to BERT (DeBERTa) and RankBrain, this means the answer takes the entire document and contextual meaning into account when answering.

> Hybridization makes for a far superior model. In this agriculture case study, a base model hit 75% accuracy, fine-tuning bumped it to 81%, and fine-tuning + RAG jumped to 86%.
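A rough sketch of what "hybrid" means in practice: blend a lexical score with a semantic one. Both scoring functions below are crude stand-ins (production systems typically pair BM25 with dense embeddings, or fuse ranked lists), and the 0.5 weight is arbitrary.

```python
def keyword_score(query: str, doc: str) -> float:
    """Lexical half: plain term overlap, the spirit of BM25 without the maths."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def semantic_score(query: str, doc: str) -> float:
    """Semantic half: a stand-in using character overlap. A real retriever
    would compare dense vectors from an embedding model instead."""
    q, d = set(query.lower()), set(doc.lower())
    return len(q & d) / len(q | d) if q | d else 0.0

def hybrid_score(query: str, doc: str, alpha: float = 0.5) -> float:
    """Weighted blend of the two signals."""
    return alpha * semantic_score(query, doc) + (1 - alpha) * keyword_score(query, doc)

docs = [
    "DeBERTa improves on BERT with disentangled attention.",
    "Essex hen parties are statistically chaotic.",
]
for doc in docs:
    print(round(hybrid_score("how does DeBERTa improve on BERT", doc), 2), doc)
```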
## Parametric Vs. Non-Parametric Memory

A model's parametric memory is essentially the patterns it has learned from the training data it has greedily ingested.

During the pre-training phase, the models ingest an enormous amount of data: words, numbers, multi-modal content, etc. Once this data has been turned into a vector space model, the LLM is able to identify patterns in its neural network.

When you ask it a question, it calculates the probability of each possible next token and ranks the possible sequences by probability. The temperature setting is what provides a level of *randomness*.

Non-parametric memory stores (or accesses) information in an external database. Any search index being an obvious one. Wikipedia, Reddit, etc., too. Any kind of ideally well-structured database. This allows the model to retrieve specific information when required.

RAG methodologies are able to bridge these two distinct, highly complementary disciplines.

1. Models gain an "understanding" of language and nuance through parametric memory.
2. Responses are then enriched and/or grounded to verify and validate the output via non-parametric memory.

> Higher temperatures increase randomness. Or "creativity." Lower temperatures do the opposite.
>
> Ironically, these models are incredibly uncreative. It's a bad way of framing it, but mapping words and documents into tokens is about as statistical as you can get.
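A quick sketch of how temperature produces that randomness: divide the model's raw next-token scores (logits) by the temperature before turning them into probabilities. The logits below are invented.

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float = 1.0) -> str:
    """Softmax sampling with temperature. Low T approaches greedy decoding
    (always the top token); high T flattens the distribution."""
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - top) for tok, s in scaled.items()}
    r = random.uniform(0, sum(weights.values()))
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # floating-point edge case: return the last token

random.seed(4)
logits = {"the": 2.0, "a": 1.0, "platypus": -1.0}  # made-up next-token scores
print([sample_next_token(logits, 0.2) for _ in range(6)])  # near-deterministic
print([sample_next_token(logits, 2.0) for _ in range(6)])  # noticeably more random
```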
## Why Does It Matter For SEO?

If you care about AI search and it matters for your business, you need to rank well in search engines. You want to force your way into consideration when RAG searches apply.

You should know how RAG works and how to influence it.

If your brand features poorly in the training data of the model, you cannot immediately change that. Well, for future iterations, you can. But the model's knowledge base isn't updated on the fly.

*[Image: We know how big Google's grounding chunks are. The better you rank, the better your chance. (Image Credit: Harry Clarkson-Bennett)]*

So, you rely on featuring prominently in these external databases in order to be part of the answer. The better you rank, the more likely you are to feature in RAG-specific searches.

I highly recommend watching Mark Williams-Cook's From RAG to Riches presentation. It's excellent. Very reasonable and gives some clear guidance on how to find queries that require RAG and how you can influence them.

[From RAG to Riches - Mark Williams-Cook | SnS Antalya 2025](https://www.youtube.com/watch?v=gBcFkf5DWpc)

## Basically, Again, You Need To Do Good SEO

1. Make sure you rank as high as possible for the relevant term in search engines.
2. Make sure you understand how to maximize your chance of featuring in an LLM's grounded response.
3. Over time, do some better marketing to get yourself into the training data.

All things being equal, content that concisely answers the query, clearly matches the relevant entities, and adds something to the corpus will work. If you *really* want to follow chunking best practice for AI retrieval, somewhere around 200-500 characters seems to be the sweet spot.

Smaller chunks allow for more accurate, concise retrieval. Larger chunks have more context, but can create a more "lossy" environment, where the model loses its mind in the middle.
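A greedy sentence-packing chunker targeting that window might look like the sketch below. Splitting on full stops and the tail-merging rule are simplistic assumptions; real pipelines split on semantic boundaries and handle abbreviations.

```python
def chunk_text(text: str, max_chars: int = 500, min_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks of roughly max_chars characters."""
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        if chunks and len(current) < min_chars:
            chunks[-1] += " " + current  # fold a short tail into the last chunk
        else:
            chunks.append(current)
    return chunks

article = "RAG is a form of grounding. It reduces hallucinations. " * 20
for i, chunk in enumerate(chunk_text(article)):
    print(i, len(chunk), repr(chunk[:40] + "..."))
```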
## Top Tips (Same Old)

I find myself repeating these at the end of every training data article, but I do think it all remains broadly the same.

- Answer the relevant query high up the page (front-loaded information).
- Clearly and concisely match your entities.
- Provide some level of information gain.
- Avoid ambiguity, particularly in the middle of the document.
- Have a clearly defined argument and page structure, with well-structured headers.
- Use lists and tables. Not because they're less resource-intensive token-wise, but because they tend to contain less information.
- My god, be interesting. Use unique data, images, video. Anything that will satisfy a user.
- Match their intent.

As always, very SEO. Much AI.

This article is part of a short series.

*Featured Image: Digineer Station/Shutterstock*