
We have known for a long time that Google crawls only the first 15MB of a web page. Now Google has updated some of its help documentation to clarify that it will crawl the first 64MB of a PDF file and the first 2MB of other supported file types.
The 64MB and 2MB limits might not be new, but I don’t think I have covered them before. I did cover that Google will crawl up to 2MB of your disavow file, but there are no other mentions of 2MB in my coverage.
This help document was updated to now read:
When crawling for Google Search, Googlebot crawls the first 2MB of a supported file type, and the first 64MB of a PDF file. From a rendering perspective, each resource referenced in the HTML (such as CSS and JavaScript) is fetched separately, and each resource fetch is bound by the same file size limit that applies to other files (except PDF files).
Once the cutoff limit is reached, Googlebot stops the fetch and only sends the already downloaded part of the file for indexing consideration. The file size limit is applied on the uncompressed data. Other Google crawlers, for example Googlebot Video and Googlebot Image, may have different limits.
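To make the cutoff concrete, here is a minimal Python sketch of how you might check your own files against the limits quoted above. It is not Google's implementation. The function names are hypothetical, and treating 1MB as 1024 * 1024 bytes is an assumption, since the documentation does not define the unit. The one detail taken directly from the documentation is that the limit is measured on the uncompressed data.

```python
import gzip

# Assumption: 1MB = 1024 * 1024 bytes (the documentation does not say).
MB = 1024 * 1024
PDF_LIMIT = 64 * MB      # first 64MB of a PDF file
DEFAULT_LIMIT = 2 * MB   # first 2MB of other supported file types


def googlebot_limit(is_pdf: bool) -> int:
    """Return the assumed Googlebot size limit in bytes."""
    return PDF_LIMIT if is_pdf else DEFAULT_LIMIT


def uncompressed(body: bytes, gzipped: bool) -> bytes:
    """Decompress a gzip body so the size is measured on uncompressed data."""
    return gzip.decompress(body) if gzipped else body


def is_truncated(body: bytes, is_pdf: bool = False, gzipped: bool = False) -> bool:
    """True if the uncompressed file is larger than the limit."""
    return len(uncompressed(body, gzipped)) > googlebot_limit(is_pdf)


def indexable_part(body: bytes, is_pdf: bool = False, gzipped: bool = False) -> bytes:
    """Return only the leading bytes that fall within the limit."""
    return uncompressed(body, gzipped)[:googlebot_limit(is_pdf)]
```

For example, a 3MB HTML file that compresses to a few kilobytes over the wire would still be cut off at 2MB under this reading, because the compressed transfer size does not count.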
Google also updated this document to add the 15MB limit, although that limit itself is not new. It now says:
By default, Google’s crawlers and fetchers only crawl the first 15MB of a file. Any content beyond this limit is ignored. Individual projects may set different limits for their crawlers and fetchers, and also for different file types. For example, a Google crawler may set a larger file size limit for a PDF than for HTML.
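One way to read the two documents together is as a default with per-crawler overrides. The Python sketch below illustrates that reading only. The `OVERRIDES` table, its keys, and the `effective_limit` function are hypothetical, and the binary megabyte is an assumption. The three numbers come from the documentation quoted in this post.

```python
# Assumption: 1MB = 1024 * 1024 bytes.
MB = 1024 * 1024

# Default for Google's crawlers and fetchers, per the crawler documentation.
DEFAULT_LIMIT = 15 * MB

# Hypothetical override table keyed by (crawler, file kind).
# Only the Googlebot values are stated in the documentation.
OVERRIDES = {
    ("googlebot", "pdf"): 64 * MB,
    ("googlebot", "other"): 2 * MB,
}


def effective_limit(crawler: str, kind: str) -> int:
    """Return the crawler-specific limit if one is set, else the default."""
    return OVERRIDES.get((crawler.lower(), kind), DEFAULT_LIMIT)
```

Under this reading, `effective_limit("Googlebot", "other")` gives 2MB, while a crawler with no stated override falls back to the 15MB default.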
Google explained, “While moving over the information about the default file size limits of Google’s crawlers and fetchers to the crawler documentation, we also updated the Googlebot documentation about its own file size limits.” Google added, “The original location of the default file size limits was not the most logical place as it applies to all of Google’s crawlers and fetchers, and the move enabled us to be more precise about Googlebot’s limits.”
The more precise details are useful to know.
There is some confusion about whether the limit for HTML files is 15MB or 2MB, so I asked John Mueller, who replied on Bluesky, “In short (gotta run), Googlebot is one of Google’s crawlers, but not all of them.”
Forum discussion at X.












