{"id":10664,"date":"2026-06-30T22:04:29","date_gmt":"2026-06-30T14:04:29","guid":{"rendered":"http:\/\/longzhuplatform.com\/?p=10664"},"modified":"2026-06-30T22:04:29","modified_gmt":"2026-06-30T14:04:29","slug":"how-chatgpt-actually-picks-sources-i-read-the-network-traffic-not-the-outputs-via-sejournal-suganthan","status":"publish","type":"post","link":"http:\/\/longzhuplatform.com\/?p=10664","title":{"rendered":"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan"},"content":{"rendered":"<p><\/p> <div id=\"narrow-cont\"> <p>I keep getting the same question from clients and SEOs (GEOs?).<\/p> <p><em>\u201cHow do we show up in ChatGPT?\u201d<\/em><\/p> <p>The answer is always the same. Write good content, do listicles, comment on Reddit.<\/p> <p>The usual.<\/p> <p>But, how do we actually know any of that works? Most of it gets repeated on faith, one expert quoting the last.<\/p> <p>So, instead of taking it on trust, I spent a few days reading what ChatGPT sends my browser underneath the reply. The raw network traffic, in readable JSON.<\/p> <p>This is a walk-through of what I found, roughly in the order I found it.<\/p> <blockquote> <p><em>Before you quote a number from this, read this. It\u2019s one person, one logged-in Pro account, a few days of traffic, not a population study. I logged about 1,240 source records across a few dozen searches. The structural findings, the fields ChatGPT uses and how they behave, are firm, because you only need to see a field once to know it\u2019s real, and I saw them again and again. The numbers and percentages are a different matter. They come from a small batch of mostly SaaS and tech queries, so treat them as direction, not measurement. 
I flag which is which throughout.<\/em><\/p> <\/blockquote> <h2>How This Differs From The Big Visibility Studies, And What You Can Take To The Bank<\/h2> <p>There are two ways to do such a study, and they point in opposite directions.<\/p> <p>The big studies, the ones the platforms and the well-funded tools run, fire thousands of prompts, record which brands appear in the answers, and roll that up into share-of-voice reports. Large sample, but black box. They only ever see the finished answer, so they have to infer the machinery underneath from the output.<\/p> <p>This is the other way round. I read the network traffic, the JSON the engine sends to my own browser, and lift out the engine\u2019s own internal labels: the <code>result_source<\/code> it stamps on each result, the <code>turn_use_case<\/code> it files each query under, the vendor names, the search queries it wrote, the model it actually ran. I\u2019m not measuring how often something happens across a population. I\u2019m documenting that the machine has a thing, and what the machine calls it.<\/p> <p>That difference decides what you can trust here, so I am going to be blunt about it.<\/p> <h2>2 Confidence Levels, Do Not Mix Them Up<\/h2> <h3>Structural Facts (High Confidence)<\/h3> <p>That <code>result_source<\/code> exists and carries <code>serp<\/code>, <code>labrador<\/code>, <code>bright<\/code>, <code>oxylabs<\/code>. That <code>bright<\/code> is Bright Data and <code>oxylabs<\/code> is Oxylabs. That there are six <code>turn_use_case<\/code> values. That <code>text<\/code> queries skip the web entirely. That Thinking fires dozens of <code>site:<\/code> and price-verification sub-queries. These are read straight off the wire. 
One clean capture proves a field exists and what it is named, and a prompt case study, however enormous, cannot see any of it.<\/p> <h3>Frequency Observations (Directional Only)<\/h3> <p>Anything with a percentage or a ranking, \u201c70% bright,\u201d \u201cReddit is the most cited domain,\u201d \u201cYouTube never gets cited,\u201d comes from tens of queries on a single account, and my own query choice skews it. I picked SaaS and tech, which is exactly why Reddit and the tech review hubs lead here; a batch of health or fashion queries would crown different ones. Read these as the shape of the thing, not the measurement. Where a direction has a mechanical reason behind it (Reddit is text so it gets quoted, YouTube is video (metadata) so it does not), trust the direction and ignore the exact number.<\/p> <h2>First, The Boring Truth About \u2018Packet Analysis\u2019<\/h2> <p><strong>Skip this section if you don\u2019t want to get into nitty-gritty technical details.<\/strong><\/p> <p>My first instinct was wrong. You cannot sniff packets and read queries, because the payload is TLS-encrypted, so a capture hands you scrambled ciphertext for the actual messages. What the capture does leak is the metadata.<\/p> <p>The destination hostname, the IPs, and the fact that the ChatGPT app talks over QUIC (HTTP\/3), not plain TCP. That is why, in the screenshot below, Wireshark can still show \u201copenai\u201d in the handshake. It reads the unencrypted server name, not the conversation. 
QUIC obfuscates its first packet with fixed keys from the spec, so a tool can unwrap that opening packet to show the ClientHello.<\/p> <figure class=\"wp-caption aligncenter\" style=\"width: 3078px\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2026\/06\/screendrop-2026-06-25-13-35-47-b7afy-sej-435268.jpg\" width=\"3078\" height=\"1974\"  class=\"\" loading=\"lazy\" title=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe\" alt=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe\" \/><figcaption class=\"wp-caption-text\">Image Credit: Suganthan Mohanadasan<\/figcaption><\/figure> <p>The real request and response bodies sit in later protected payloads that stay unreadable. So the readable layer is the browser itself, after decryption, in the Network panel.<\/p> <p>That\u2019s where the queries, the answers, and all the metadata live as JSON.<\/p> <p>This is HTTP inspection, not packet sniffing, and it\u2019s worth saying because half the people who try this start with Wireshark and give up. (I know I did lol.)<\/p> <p>Two things that did not work, so you do not repeat them.<\/p> <ol> <li>Driving a clean automated Chrome got me hard blocked by Cloudflare within a few queries on a different engine: the \u201cverifying you are human\u201d wall just loops forever in an automated browser, so I moved to my real Chrome with my real sessions.<\/li> <li>On ChatGPT, the answer never showed up in my capture at first, because it streams over a long-lived connection opened at page load that a hook installed mid-session cannot see. 
More on both later.<\/li> <\/ol> <h2>The Field That Labels Every Source<\/h2> <p>I opened DevTools, turned on Preserve log, ran a normal query, and searched the responses for anything that looked like a label.<\/p> <p>The field that came back was <code>result_source<\/code>. It sits on every web result ChatGPT pulls; you never see it in the answer, and it takes 1 of 4 values.<\/p> <p>Mark Williams-Cook shared that he had found three of these; I came across the fourth. I then saw Metehan\u2019s post, and it looks like he may have already found it too. But honestly, this is not really about who found what first. It is more about sharing what we are seeing, comparing notes, and learning from each other.<\/p> <figure class=\"wp-caption aligncenter\" style=\"width: 5090px\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2026\/06\/screendrop-2026-06-24-16-57-49-2zuj5-sej-530418.jpg\" width=\"5090\" height=\"1488\"  class=\"\" loading=\"lazy\" title=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe1\" alt=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe1\" \/><figcaption class=\"wp-caption-text\">Image Credit: Suganthan Mohanadasan<\/figcaption><\/figure> <p>Here\u2019s one source from the traffic, trimmed to the fields that matter.<\/p> <pre><code>{&#13; \"attribution\": \"TechRadar\",&#13; \"url\": \"...\",&#13; \"snippet\": \"...\",&#13; \"pub_date\": \"2026-05-09\",&#13; \"result_source\": \"labrador\"&#13; }<\/code><\/pre> <p>The four values it uses:<\/p> <div class=\"scrl-table\"> <table> <thead> <tr> <th><code>result_source<\/code><\/th> <th>What it is<\/th> <\/tr> <\/thead> <tbody> <tr> <td><code>serp<\/code><\/td> <td>The open web baseline, mostly seen on news (Yahoo, StreetInsider)<\/td> <\/tr> <tr> <td><code>labrador<\/code><\/td> <td>An allowlist of established publishers. 
Reuters, The Guardian, the WSJ, the FT, Wikipedia, even arXiv. Snippets run to ~1,080 characters, basically full-article extracts<\/td> <\/tr> <tr> <td><code>bright<\/code><\/td> <td>Bright Data, a commercial web scraper. Dominant for shopping, finance, weather, local.<\/td> <\/tr> <tr> <td><code>oxylabs<\/code><\/td> <td>Oxylabs, a rival scraper. Regional and local press, some open web<\/td> <\/tr> <\/tbody> <\/table> <\/div> <p><code>labrador<\/code> looks like a licensed tier, several of those publishers have signed content deals with OpenAI, and it isn\u2019t one you get into unless you own a national newspaper.<\/p> <p><code>bright<\/code> and <code>oxylabs<\/code> are the interesting pair. The names point at Bright Data and Oxylabs, two commercial scraping firms that happen to be direct rivals. I can\u2019t see a contract in the traffic, so I won\u2019t claim ChatGPT pays them, but its open web fetching runs through both, and the field tells you which one fetched each result. (We\u2019ve been Oxylabs customers for a long time for our SaaS Keyword Insights.)<\/p> <p>Across everything I logged, <code>bright<\/code> did the bulk of the fetching, especially on commercial, shopping, finance and weather queries. <code>oxylabs<\/code> skewed regional and local, <code>labrador<\/code> stayed on news and reference, and <code>serp<\/code> mostly turned up on news. To put names to the tiers, <code>labrador<\/code> carried Reuters, the WSJ, Wikipedia and TechRadar, <code>bright<\/code> pulled Reddit, Forbes and rtings, and <code>oxylabs<\/code> brought the Gulf press like Khaleej Times and Gulf News.<\/p> <p>I even caught the split inside one weather query, <code>bright<\/code> taking the global data sites like the Met Office while <code>oxylabs<\/code> handled the local Gulf press. (I live in Dubai, by the way.) 
In that one query, the breakdown came out like this.<\/p> <pre><code>Source Pipeline&#13; &#13; metoffice.gov.uk bright&#13; accuweather.com bright&#13; timeanddate.com bright&#13; khaleejtimes.com oxylabs&#13; gulfnews.com oxylabs&#13; whatson.ae oxylabs<\/code><\/pre> <h3>The AI SEO\/GEO Takeaway<\/h3> <p>You\u2019re mostly competing in the scraped tier, so be cleanly scrapable. Put your facts and numbers in plain HTML text, never behind a script or inside a PDF or an image. The licensed tier is mostly shut, so the lever you\u2019ve got is third-party coverage, PR, brand mentions, links, and Reddit, to land on the pages the scrapers actually reach.<\/p> <h2>The Queries That Never Reach The Web<\/h2> <p>The next thing I noticed was that some queries produced no network search whatsoever. Before ChatGPT searches, it files your question into a bucket, in a field called <code>turn_use_case<\/code>. I saw six of them across the questions I tried: instant search, shopping, text, local, thinking, and image generation.<\/p> <figure class=\"wp-caption aligncenter\" style=\"width: 2340px\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2026\/06\/screendrop-2026-06-24-18-38-02-6hfwo-sej-309522.jpg\" width=\"2340\" height=\"370\"  class=\"\" loading=\"lazy\" title=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe2\" alt=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe2\" \/><figcaption class=\"wp-caption-text\">Image Credit: Suganthan Mohanadasan<\/figcaption><\/figure> <p>The one to care about is <code>text<\/code>. When ChatGPT files your question as <code>text<\/code>, it doesn\u2019t search. 
It answers from its training corpus and stops.<\/p> <p>The obvious cases end up here: \u201c<strong>how do I change a flat tyre<\/strong>,\u201d \u201c<strong>write a Python function to merge two sorted lists,<\/strong>\u201d and \u201c<strong>translate this into 4 languages<\/strong>\u201d all came back <code>text<\/code> with an empty network tab.<\/p> <figure class=\"wp-caption aligncenter\" style=\"width: 5094px\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2026\/06\/screendrop-2026-06-24-18-33-10-6hw4s-sej-821119.jpg\" width=\"5094\" height=\"2550\"  class=\"\" loading=\"lazy\" title=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe3\" alt=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe3\" \/><figcaption class=\"wp-caption-text\">Image Credit: Suganthan Mohanadasan<\/figcaption><\/figure> <p>The one that should worry you is that \u201clatest treatment guidelines for type 2 diabetes\u201d also came back <code>text<\/code>, a current, high-stakes question you\u2019d assume it researches. It didn\u2019t; it answered from training. No E-E-A-T here. Oops!<\/p> <p>Of 10 deliberately current questions I tried, three were handled this way with no search at all.<\/p> <p>The wording decides the bucket, not the topic.<\/p> <p>\u201cbest coffee near me\u201d flips to the local pipeline, \u201cbest 4K TVs to buy\u201d turns on shopping, but \u201cbest 4K TVs with reviews\u201d stayed a normal search.<\/p> <p>A maths question quietly jumped to a reasoning model under thinking, while \u201cTesla stock price this week\u201d stayed instant search.<\/p> <p>Keep in mind, these are results from my limited testing. I will do more tests when I find some more time.<\/p> <h3>The AI SEO\/GEO Takeaway<\/h3> <p>Before you spend a penny on a page, check the query even searches. 
If it\u2019s a how-to or a definition, it may be answered from training, where no page can get in, however good it is. Spend your effort where it actually fetches.<\/p> <p>If you want to be mentioned for such queries, you\u2019d have to spend a lot of time building authority and wait for your brand to be included in future training data. (For example, make sure crawlers like Common Crawl can see your site.)<\/p> <h2>How One Question Fans Out Into Dozens Of Searches (Fan-Out Queries)<\/h2> <p>ChatGPT also exposes the searches it runs for you, if you pull the full conversation back from its own API. On the fast model, it\u2019s minimal: one reworded query and done, maybe optimized for speed over depth. On the thinking model, asked to compare a few products, it ran roughly 15 to 40 sub-queries off the single question. (The number depended on the complexity of the question.)<\/p> <figure class=\"wp-caption aligncenter\" style=\"width: 4684px\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2026\/06\/screendrop-2026-06-24-19-00-41-7aqut-sej-504.jpg\" width=\"4684\" height=\"1482\"  class=\"\" loading=\"lazy\" title=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe4\" alt=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe4\" \/><figcaption class=\"wp-caption-text\">Image Credit: Suganthan Mohanadasan<\/figcaption><\/figure> <p>Here\u2019s a slice of what it actually ran for one compare task.<\/p> <pre><code>\"Profound AI search visibility pricing AI engines tracked 2026\"&#13; \"AthenaHQ pricing AI search visibility tool\"&#13; \"site:peec.ai\/pricing Peec AI Starter Pro Advanced 50 prompts 150 prompts\"&#13; \"Peec AI pricing $95 $245 $495 official\" (a guessed price, then searched to confirm)&#13; \"Scrunch AI pricing\" (not in my prompt, found mid-research)&#13; ...around 40 of these 
for one comparison<\/code><\/pre> <p>Three things stand out in there. It fires <code>site:<\/code> probes straight at vendor pricing pages.<\/p> <p>It guesses a price and then searches to confirm it. And it keeps widening as it goes, picking up tools you never named and chasing their pricing, too.<\/p> <p>It doesn\u2019t only search either; the page-reading is just as literal. It ran <code>find<\/code> for <code>$<\/code>, <code>\u20ac<\/code>, <code>99<\/code> and even \u201cAgency,\u201d then used the browsing tool\u2019s own <code>open<\/code> and <code>click<\/code> commands to pull up the results it wanted, run server-side, not an agent on your screen.<\/p> <p>The same happens to your own site. Ask it \u201ckeyword insights pricing,\u201d and it runs a <code>site:keywordinsights.ai\/pricing<\/code> probe, guesses something like \u201cStarter $58, Pro $145, Advanced $299,\u201d then opens the page and reads the HTML for the currency symbol to confirm.<\/p> <h3>The AI SEO\/GEO Takeaway<\/h3> <p>Put your key numbers and data in plain HTML text, never inside an image, because in this case with pricing it greps the page for <code>$<\/code> and <code>\u20ac<\/code> and can\u2019t read a graphic. Also, you need to make sure you survive a <code>site:yourdomain.com\/pricing<\/code> probe in this use case and write for the cleaned-up query it actually runs, not the messy phrase a person types. Avoid JavaScript-based toggles and dynamic data loading.<\/p> <h2>Fetched, Cited, And Mentioned Aren\u2019t The Same<\/h2> <p>This is the distinction people muddle most, so it\u2019s worth being exact. Three different things can happen to a source.<\/p> <ul> <li><strong>Fetched.<\/strong> The model pulls your page into context. This is the <code>result_source<\/code> object. 
The reader never sees it.<\/li> <li><strong>Cited.<\/strong> It attaches your page as the source behind a specific sentence, the footnote you can click.<\/li> <li><strong>Mentioned.<\/strong> Your brand name appears in the answer, often as a chip linking to your site, but it isn\u2019t the source of the claim.<\/li> <\/ul> <p>They\u2019re three separate outcomes, and you can win or lose each one on its own.<\/p> <p>To see the gap between them, I took a batch of commercial and recommendation queries and split what ChatGPT fetched from what it cited.<\/p> <p>This is the small, tech-skewed sample, so read what follows as a pattern, not a number to bank on.<\/p> <p>Across that batch, Reddit and YouTube were both fetched heavily, 278 and 201 times. But Reddit was cited 11 times and YouTube not once.<\/p> <p>I think the reason is mechanical. A citation has to bind to text the model actually pulled, and when it fetches a YouTube page in search, it gets the metadata, not the actual video transcript.<\/p> <p>A Reddit thread is all there in the page. This isn\u2019t just my sample either. Ahrefs, across 1.4 million ChatGPT prompts, found Reddit cited at 1.93% against YouTube\u2019s 0.51%, and Profound found the same gap.<\/p> <figure class=\"wp-caption aligncenter\" style=\"width: 1268px\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2026\/06\/screendrop-2026-06-25-11-35-45-6uvy8-sej-759193.jpg\" width=\"1268\" height=\"238\"  class=\"\" loading=\"lazy\" title=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe5\" alt=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe5\" \/><figcaption class=\"wp-caption-text\">Image Credit: Suganthan Mohanadasan<\/figcaption><\/figure> <p>A few other patterns, same caveat on sample size. 
Reddit was the single most-cited domain, narrowly, and after that no one ran away with it. The citations spread thin across review hubs like rtings and TechRadar and vendor pages cited for their own specs.<\/p> <p>Here\u2019s the top of the cited list across that batch.<\/p> <figure class=\"wp-caption aligncenter\" style=\"width: 1288px\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2026\/06\/screendrop-2026-06-25-11-36-39-6v8p5-sej-489900.jpg\" width=\"1288\" height=\"636\"  class=\"\" loading=\"lazy\" title=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe6\" alt=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe6\" \/><figcaption class=\"wp-caption-text\">Image Credit: Suganthan Mohanadasan<\/figcaption><\/figure> <p>Vendor pages get cited too, but for their own facts, the pricing and specs. Zoho, Semrush, and the VPNs earned citations that way. The verdict on which one is best still gets cited to a third party. You can be mentioned without being cited, and cited without being mentioned.<\/p> <p>Two mechanics sit underneath this. Citations bind to a specific sentence, not the whole answer, so being topically relevant isn\u2019t <em>enough<\/em>; you have to be the best support for a precise claim.<\/p> <p>And results are deduped by domain, so 20 thin pages from your site collapse into one.<\/p> <p>One strong page per claim beats a pile of weak ones.<\/p> <p>So, don\u2019t go around creating thousands of low quality\/thin pages to address each fanout query.<\/p> <h3>The AI SEO\/GEO Takeaway<\/h3> <p>You can\u2019t cite yourself. 
The claim about you gets sourced from someone else, so earn third-party coverage on review sites and Reddit, win on text rather than video, and put one strong page behind each claim, because it dedupes by domain.<\/p> <h2>The Model Explains Its Own Strategy<\/h2> <p>I went looking for a hidden ranking score first and found nothing. That kind of logic \u2013 a domain authority number, a trust weight, a formula \u2013 never reaches your browser, because it stays on OpenAI\u2019s servers.<\/p> <p>So, anyone selling you \u201cChatGPT\u2019s ranking factors\u201d is selling you snake oil.<\/p> <p>What the traffic does have is the thinking model\u2019s chain of thought, saved in the conversation, where it describes its own sourcing in plain words.<\/p> <figure class=\"wp-caption aligncenter\" style=\"width: 1346px\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2026\/06\/screendrop-2026-06-25-11-51-41-7ox89-sej-706443.jpg\" width=\"1346\" height=\"2408\"  class=\"\" loading=\"lazy\" title=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe7\" alt=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe7\" \/><figcaption class=\"wp-caption-text\">Image Credit: Suganthan Mohanadasan<\/figcaption><\/figure> <p>For facts, the pricing and the specs, it goes to the official page first, and it says so.<\/p> <p>Comparing Ahrefs, it reads the official page, notes it \u201clists Lite at $129, Standard at $249, and Advanced at $449,\u201d and decides \u201cpricing page seems more current, so I should cite that.\u201d It wants the source it trusts, and the current one.<\/p> <p>Then it hits the wall this whole post is about.<\/p> <p>On Profound, it reasons that \u201cthe pricing isn\u2019t showing up directly in the search result, possibly because it\u2019s loaded with JavaScript.\u201d Same on Peec, where 
\u201cthe pricing doesn\u2019t show up directly, possibly hidden with JavaScript.\u201d<\/p> <p>So, it stops trying to read them and falls back. \u201cI can quote third-party sources since the official page is hard to parse and doesn\u2019t show prices\u201d, it writes, and it notes it should \u201cuse citations from G2 where appropriate.\u201d<\/p> <p>That\u2019s the whole game in one trace. The model wanted Profound\u2019s and Peec\u2019s own numbers. Their pricing sat behind JavaScript, so it couldn\u2019t read them, and it cited G2 instead. Your facts, someone else\u2019s page, because yours wouldn\u2019t parse.<\/p> <p>Those quotes are the model\u2019s own, from the saved reasoning, not mine.<\/p> <h3>The AI SEO\/GEO Takeaway<\/h3> <p>Own your facts, in plain HTML. Your pricing and spec numbers have to sit in crawlable text, not loaded by JavaScript and not baked into an image, because the model reads the page itself and gives up when it can\u2019t. A JavaScript pricing table doesn\u2019t just rank badly; it hands your numbers to G2.<\/p> <p>The opinion you earn separately, through reviews, Reddit, and honest comparison content, which is where the recommendation gets cited from. 
A clean, readable pricing page with no third-party coverage gets your facts read and someone else recommended.<\/p> <h2>What I Could Not See<\/h2> <p>There\u2019s no visible ranking logic, as above, so why one source beats another, past the model\u2019s own narration, stays server-side.<\/p> <p>Personalization is real and selective.<\/p> <p>On a query that overlapped my own work, ChatGPT pulled in my past conversations, with the sources listed as <code>personal_sources: [\"convo_search\", \"gmail\", \"files\"]<\/code>.<\/p> <p>It used one of my old chats inside a generic \u201cbest tools\u201d answer, but only on one of the three conversations I checked, the one that matched my history.<\/p> <p>So, part of some answers is built from a user\u2019s private data you can never optimize for, which is one reason two people get different answers and visibility scores wobble.<\/p> <p>Local is capped. There\u2019s a config value, <code>local_results_limit<\/code>, set to 2.<\/p> <figure class=\"wp-caption aligncenter\" style=\"width: 3966px\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2026\/06\/screendrop-2026-06-25-12-15-24-89tru-sej-695863.jpg\" width=\"3966\" height=\"2026\"  class=\"\" loading=\"lazy\" title=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe8\" alt=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe8\" \/><figcaption class=\"wp-caption-text\">Image Credit: Suganthan Mohanadasan<\/figcaption><\/figure> <p>Ask for the best coffee near you, and ChatGPT returns two places, not a top 10. For local, you\u2019re in the top 2, or you aren\u2019t there.<\/p> <p>One thing I genuinely can\u2019t call yet. 
My read on shopping comes from a single shopping query, and it flatly contradicts what Mark saw on his single query, so the shopping mix is unsettled until someone runs a proper batch.<\/p> <p>And the wider caveat, said plainly. The structure I\u2019m sure of, because I saw it across roughly 1,240 records. The percentages come from a small batch of commercial queries, mostly SaaS and tech, so they need a bigger run across real verticals before anyone banks on them.<\/p> <p>That run is the next piece.<\/p> <h2>Run It Yourself<\/h2> <p>None of this needs special access or requires you to be connected to the Matrix and become an operator, just your own browser.<\/p> <figure class=\"wp-caption aligncenter\" style=\"width: 623px\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2026\/06\/images-7-8z4s4-sej-869014.jpg\" width=\"623\" height=\"321\"  class=\"\" loading=\"lazy\" title=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe9\" alt=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe9\" \/><figcaption class=\"wp-caption-text\">Image Credit: Suganthan Mohanadasan<\/figcaption><\/figure> <p>Open ChatGPT, press <em>Cmd+Option+I<\/em> for DevTools, open Network, tick Preserve log, run a query, then press <em>Cmd+Option+F<\/em> and search the responses for <code>result_source<\/code>.<\/p> <p>That alone shows you the pipeline behind each link.<\/p> <p>For the rest, the fan-out and the citations and the reasoning, open the Console, type <code>allow pasting<\/code> once, and run this against a conversation that searched the web.<\/p> <pre><code>const t = (await (await fetch('\/api\/auth\/session')).json()).accessToken;&#13; const c = await (await fetch('\/backend-api\/conversation\/' + location.pathname.split('\/c\/')[1], {headers: {Authorization: 'Bearer ' + t}})).json();&#13; const rows = 
[];&#13; JSON.stringify(c, (k, v) =&gt; {&#13; if (v &amp;&amp; v.result_source) {&#13; const d = (v.attribution || v.url || '?').toString();&#13; rows.push({source: d.replace('https:\/\/', '').replace('www.', '').split('\/')[0], pipeline: v.result_source});&#13; }&#13; return v;&#13; });&#13; console.table(rows);<\/code><\/pre> <p>It reads only your own session, so nothing leaves your machine. The output is a plain table of each source and the pipeline that fetched it.<\/p> <pre><code>source pipeline&#13; techradar.com labrador&#13; whathifi.com labrador&#13; soundguys.com bright&#13; rtings.com bright&#13; khaleejtimes.com oxylabs&#13; streetinsider.com serp<\/code><\/pre> <p>Change what the loop collects, and you can pull the searches, the citations, and the reasoning the same way.<\/p> <h2>A Free Extension Now Captures Most Of This<\/h2> <p>If pasting scripts into your own console isn\u2019t your thing, there\u2019s now an easier route. Olivier de Segonzac already ran a free Chrome extension that pulls ChatGPT\u2019s search and fan-out data.<\/p> <p>He read this research and extended it to capture three of the signals I took apart above.<\/p> <ul> <li><strong>The <code>turn_use_case<\/code> bucket.<\/strong> The intent label ChatGPT files each turn under, so you can spot when a query flips to shopping, local, or <code>text<\/code> before it even answers.<\/li> <li><strong>The reference-type mix.<\/strong> How many of the answer\u2019s citations were products versus search results, news, or images, parsed straight from the reference tokens.<\/li> <li><strong>The <code>result_source<\/code> pipeline.<\/strong> The scraper behind each cited result, charted per conversation, so the Bright Data, Oxylabs, Labrador, and SERP split shows up without you reading a line of JSON.<\/li> <\/ul> <p>It runs locally on your own session and exports straight to Excel. 
Grab it from the Chrome Web Store, and Olivier wrote up the update here.<\/p> <figure class=\"wp-caption aligncenter\" style=\"width: 1048px\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2026\/06\/1782486822994-e95q2-sej-637569.jpg\" width=\"1048\" height=\"786\"  class=\"\" loading=\"lazy\" title=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe10\" alt=\"How ChatGPT Actually Picks Sources (I Read The Network Traffic, Not The Outputs) via @sejournal, @suganthan\u63d2\u56fe10\" \/><figcaption class=\"wp-caption-text\">Image Credit: Suganthan Mohanadasan<\/figcaption><\/figure> <p>So, back to the question we opened with. Does the usual advice hold up? Mostly. Reddit earns citations and topped my cited list. Listicles and review sites make up most of the rest. Good content still matters, but only the half the model can actually read. The rest it reads off someone else\u2019s page.<\/p> <p>Which is the real lesson. ChatGPT isn\u2019t a search engine, so stop optimizing for one.<\/p> <p>It reads your own page for the facts, if it can parse them, and everyone else\u2019s for the opinion, and only when the question is worth a search. Build for that.<\/p> <p>And treat all of this, mine included, as a snapshot of a system that changes by the week. The structure holds. The numbers move.<\/p> <p>While I was in the traffic, I also found a pile of things with nothing to do with sourcing: the bot wall that stops you scripting it, a hidden shopping engine, and 573 live experiments running on the account. 
Those will be published separately.<\/p> <p>I\u2019ve also done similar analysis on Perplexity, Gemini, etc., so I\u2019ll be sharing those soon.<\/p> <hr\/> <p><em>This post was originally published on Suganthan.<\/em><\/p> <hr\/> <p><em>Featured Image: Viktoriia_M\/Shutterstock<\/em><\/p> <\/div> ","protected":false},"excerpt":{"rendered":"<p>I keep getting the same question from clients and SEOs (GEOs?). \u201cHow do we show up in ChatGPT?\u201d The answer is always the same. Write good content, do listicles, comment on Reddit. The usual. But, how do we actually know any of that works? Most of it gets repeated on faith, one expert quoting the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":10665,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16],"tags":[94,8169,40437,3360,350,80,7380,42538,441],"class_list":["post-10664","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-accessibility","tag-chatgpt","tag-network","tag-outputs","tag-picks","tag-read","tag-sejournal","tag-sources","tag-suganthan","tag-traffic"],"acf":[],"_links":{"self":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/posts\/10664","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=10664"}],"version-history":[{"count":0,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route
=\/wp\/v2\/posts\/10664\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/media\/10665"}],"wp:attachment":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=10664"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=10664"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=10664"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}