{"id":5828,"date":"2026-04-04T12:03:01","date_gmt":"2026-04-04T04:03:01","guid":{"rendered":"http:\/\/longzhuplatform.com\/?p=5828"},"modified":"2026-04-04T12:03:01","modified_gmt":"2026-04-04T04:03:01","slug":"google-explains-how-crawling-works-in-2026","status":"publish","type":"post","link":"http:\/\/longzhuplatform.com\/?p=5828","title":{"rendered":"Google explains how crawling works in 2026"},"content":{"rendered":"<p><\/p> <div id=\"articleContent\" itemprop=\"articlebody\"> <div class=\"bialty-container\"> <p>Gary Illyes from Google shared more details on Googlebot, Google\u2019s crawling ecosystem, fetching, and how it processes bytes. <\/p> <p>The article is titled Inside Googlebot: demystifying crawling, fetching, and the bytes we process.<\/p> <p><strong>Googlebot. <\/strong>Google does not run a single crawler; it has many crawlers for many purposes. So referring to Googlebot as one crawler may no longer be accurate. Google has documented many of its crawlers and user agents.<\/p> <p><strong>Limits. <\/strong>Google recently spoke about its crawling limits, and Gary Illyes has now gone into more detail. He said:<\/p> <ul class=\"wp-block-list\"> <li>Googlebot currently fetches up to 2MB for any individual URL (excluding PDFs). <\/li> <li>This means it crawls only the first 2MB of a resource, including the HTTP header. <\/li> <li>For PDF files, the limit is 64MB.<\/li> <li>Image and video crawlers typically have a wide range of threshold values, and it largely depends on the product that they\u2019re fetching for.<\/li> <li>For any other crawlers that don\u2019t specify a limit, the default is 15MB regardless of content type.<\/li> <\/ul> <p>So what happens when Google crawls?<\/p> <ol class=\"wp-block-list\"> <li><strong>Partial fetching:<\/strong>\u00a0If your HTML file is larger than 2MB, Googlebot doesn\u2019t reject the page. Instead, it stops the fetch exactly at the 2MB cutoff. 
Note that the limit includes HTTP request headers.<\/li> <li><strong>Processing the cutoff:<\/strong>\u00a0That downloaded portion (the first 2MB of bytes) is passed along to our indexing systems and the Web Rendering Service (WRS) as if it were the complete file.<\/li> <li><strong>The unseen bytes:<\/strong>\u00a0Any bytes that exist\u00a0<em>after<\/em>\u00a0that 2MB threshold are entirely ignored. They aren\u2019t fetched, they aren\u2019t rendered, and they aren\u2019t indexed.<\/li> <li><strong>Bringing in resources:<\/strong>\u00a0Every referenced resource in the HTML (excluding media, fonts, and a few exotic files) will be fetched by WRS with Googlebot like the parent HTML. They have their own, separate, per-URL byte counter and don\u2019t count towards the size of the parent page.<\/li> <\/ol> <p><strong>How Google renders these bytes. <\/strong>Once the crawler has fetched these bytes, it passes them to WRS, the Web Rendering Service. \u201cThe WRS processes JavaScript and executes client-side code similar to a modern browser to understand the final visual and textual state of the page. Rendering pulls in and executes JavaScript and CSS files, and processes XHR requests to better understand the page\u2019s textual content and structure (it doesn\u2019t request images or videos). For each requested resource, the 2MB limit also applies,\u201d Google explained.<\/p> <p><strong>Best practices. <\/strong>Google listed these best practices:<\/p> <ul class=\"wp-block-list\"> <li><strong>Keep your HTML lean:<\/strong>\u00a0Move heavy CSS and JavaScript to external files. 
While the initial HTML document is capped at 2MB, external scripts and stylesheets are fetched separately (subject to their own limits).<\/li> <li><strong>Order matters:<\/strong>\u00a0Place your most critical elements \u2014 like meta tags,\u00a0<code>&lt;title&gt;<\/code>\u00a0elements,\u00a0<code>&lt;link&gt;<\/code>\u00a0elements, canonicals, and essential structured data \u2014 higher up in the HTML document. This ensures they are unlikely to be found below the cutoff.<\/li> <li><strong>Monitor your server logs:<\/strong>\u00a0Keep an eye on your server response times. If your server is struggling to serve bytes, our fetchers will automatically back off to avoid overloading your infrastructure, which will drop your crawl frequency.<\/li> <\/ul> <p><strong>Podcast. <\/strong>Google also published a podcast episode on the topic:<\/p> <figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"> <div class=\"wp-block-embed__wrapper\"> <noscript><iframe loading=\"lazy\" title=\"Google crawlers behind the scenes\" width=\"640\" height=\"360\" src=\"https:\/\/www.youtube.com\/embed\/JpweMBnpS4Q?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe><\/noscript> <\/div> <\/figure> <\/div> <hr\/> <p class=\"article-disclosure\"> <em>Search Engine Land is owned by Semrush. We remain committed to providing high-quality coverage of marketing topics. 
Unless otherwise noted, this page\u2019s content was written by either an employee or a paid contractor of Semrush Inc.<\/em> <\/p> <hr\/> <div class=\"author-about py-4\"> <div class=\"card bg-light\"> <div class=\"row gx-0\"> <div class=\"col-12 col-lg-auto\"> <div class=\"authorImage p-2\"> <img loading=\"lazy\" decoding=\"async\" class=\"img-fluid rounded avatar-border\" src=\"https:\/\/searchengineland.com\/wp-content\/seloads\/2025\/05\/1630496745425.jpeg.webp\" alt=\"Barry Schwartz\" width=\"140\" height=\"140\" title=\"Google explains how crawling works in 2026\" \/> <\/div> <\/p><\/div> <div class=\"col-12 col-lg\"> <div class=\"card-body author-body p-2\"> <div id=\"authorBio-251\" class=\"author-desc\"> <p>Barry Schwartz is a technologist and a Contributing Editor to Search Engine Land and a member of the programming team for SMX events. He owns RustyBrick, a NY-based web consulting firm. He also runs Search Engine Roundtable, a popular search blog on very advanced SEM topics.<\/p> <p>In 2019, Barry received the Outstanding Community Services Award from Search Engine Land. In 2018, the US Search Awards named him &#8220;US Search Personality Of The Year,&#8221; and in 2023 Marketing O&#8217;Clock listed him among the top 50 most influential PPCers.<\/p> <p>Barry can be followed\u00a0<a href=\"https:\/\/twitter.com\/rustybrick\/\">on X here<\/a>.<\/p> <\/p><\/div> <\/p><\/div> <\/p><\/div> <\/p><\/div> <\/p><\/div> <\/div> <\/div> <p><script async src=\"\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p> ","protected":false},"excerpt":{"rendered":"<p>Gary Illyes from Google shared some more details on Googlebot, Google\u2019s crawling ecosystem, fetching and how it 
processes bytes. The article is named Inside Googlebot: demystifying crawling, fetching, and the bytes we process. Googlebot. Google has many more than one singular crawler, it has many crawlers for many purposes. So referencing Googlebot as a singular [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":5829,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18],"tags":[4675,211,75,83,359],"class_list":["post-5828","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-careers","tag-crawling","tag-explains","tag-google","tag-news","tag-works"],"acf":[],"_links":{"self":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/posts\/5828","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5828"}],"version-history":[{"count":0,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/posts\/5828\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/media\/5829"}],"wp:attachment":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5828"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5828"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5828"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}