{"id":1852,"date":"2026-01-21T08:26:48","date_gmt":"2026-01-21T00:26:48","guid":{"rendered":"http:\/\/longzhuplatform.com\/?p=1852"},"modified":"2026-01-21T08:26:48","modified_gmt":"2026-01-21T00:26:48","slug":"openai-search-crawler-passes-55-coverage-in-hostinger-study-via-sejournal-mattgsouthern-2","status":"publish","type":"post","link":"http:\/\/longzhuplatform.com\/?p=1852","title":{"rendered":"OpenAI Search Crawler Passes 55% Coverage In Hostinger Study via @sejournal, @MattGSouthern"},"content":{"rendered":"<p><\/p> <div id=\"narrow-cont\"> <p>Hostinger analyzed 66 billion bot requests across more than 5 million websites and found that AI crawlers are following two different paths.<\/p> <p>LLM training bots are losing access to the web as more sites block them. Meanwhile, AI assistant bots that power search tools like ChatGPT are expanding their reach.<\/p> <p>The analysis draws on anonymized server logs from three 6-day windows, with bot classification mapped to AI.txt project classifications.<\/p> <h3>Training Bots Are Getting Blocked<\/h3> <p>The starkest finding involves OpenAI\u2019s GPTBot, which collects data for model training. Its website coverage dropped from 84% to 12% over the study period.<\/p> <p>Meta\u2019s ExternalAgent was the largest training-category crawler by request volume in Hostinger\u2019s data. Hostinger says this training-bot group shows the strongest declines overall, driven in part by sites blocking AI training crawlers.<\/p> <p>These numbers align with patterns I\u2019ve tracked through multiple studies. BuzzStream found that 79% of top news publishers now block at least one training bot. Cloudflare\u2019s Year in Review showed GPTBot, ClaudeBot, and CCBot had the highest number of full disallow directives across top domains.<\/p> <p>The data quantifies what those studies suggested. 
Hostinger interprets the drop in training-bot coverage as a sign that more sites are blocking those crawlers, even when request volumes remain high.<\/p> <h3>Assistant Bots Tell a Different Story<\/h3> <p>While training bots face resistance, the bots that power AI search tools are expanding access.<\/p> <p>OpenAI\u2019s OAI-SearchBot, which fetches content for ChatGPT\u2019s search feature, reached 55.67% average coverage. TikTok\u2019s bot grew to 25.67% coverage with 1.4 billion requests. Apple\u2019s bot reached 24.33% coverage.<\/p> <p>These assistant crawls are user-triggered and more targeted. They serve users directly rather than collecting training data, which may explain why sites treat them differently.<\/p> <h3>Classic Search Remains Stable<\/h3> <p>Traditional search engine crawlers held steady throughout the study. Googlebot maintained 72% average coverage with 14.7 billion requests. Bingbot stayed at 57.67% coverage.<\/p> <p>The stability contrasts with changes in the AI category. Google\u2019s main crawler is in a unique position since blocking it affects search visibility.<\/p> <h3>SEO Tools Show Decline<\/h3> <p>SEO and marketing crawlers saw declining coverage. Ahrefs maintained the largest footprint at 60% coverage, but the category overall shrank. Hostinger attributes this to two factors: these tools increasingly focus on sites actively doing SEO work, and website owners are blocking resource-intensive crawlers.<\/p> <p>I reported on the resource concerns when Vercel data showed GPTBot generating 569 million requests in a single month. For some publishers, the bandwidth costs became a business problem.<\/p> <h3>Why This Matters<\/h3> <p>The data confirms a pattern that\u2019s been building over the past year. Site operators are drawing a line between AI crawlers they\u2019ll allow and those they won\u2019t.<\/p> <p>The decision comes down to function. Training bots collect content to improve models without sending traffic back. 
Assistant bots fetch content to answer specific user questions, which means they can surface your content in AI search results.<\/p> <p>Hostinger suggests a middle path: block training bots while allowing assistant bots that drive discovery. This lets you participate in AI search without contributing to model training.<\/p> <h3>Looking Ahead<\/h3> <p>OpenAI recommends allowing OAI-SearchBot if you want your site to appear in ChatGPT search results, even if you block GPTBot.<\/p> <p>OpenAI\u2019s documentation clarifies the difference. OAI-SearchBot controls inclusion in ChatGPT search results and respects robots.txt. ChatGPT-User handles user-initiated browsing and may not be governed by robots.txt in the same way.<\/p> <p>Hostinger recommends checking server logs to see what\u2019s actually hitting your site, then making blocking decisions based on your goals. If you\u2019re concerned about server load, you can use CDN-level blocking. If you want to potentially increase your AI visibility, review current AI crawler user agents and allow only the specific bots that support your strategy.<\/p> <hr\/> <p><em>Featured Image: BestForBest\/Shutterstock<\/em><\/p> <\/div> ","protected":false},"excerpt":{"rendered":"<p>Hostinger analyzed 66 billion bot requests across more than 5 million websites and found that AI crawlers are following two different paths. LLM training bots are losing access to the web as more sites block them. Meanwhile, AI assistant bots that power search tools like ChatGPT are expanding their reach. 
The analysis draws on anonymized [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1843,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18],"tags":[4337,4335,4338,90,216,4336,95,80,223],"class_list":["post-1852","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-careers","tag-coverage","tag-crawler","tag-hostinger","tag-mattgsouthern","tag-openai","tag-passes","tag-search","tag-sejournal","tag-study"],"acf":[],"_links":{"self":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/posts\/1852","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1852"}],"version-history":[{"count":0,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/posts\/1852\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/media\/1843"}],"wp:attachment":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1852"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1852"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1852"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}