{"id":1782,"date":"2026-01-20T05:06:17","date_gmt":"2026-01-19T21:06:17","guid":{"rendered":"http:\/\/longzhuplatform.com\/?p=1782"},"modified":"2026-01-20T05:06:17","modified_gmt":"2026-01-19T21:06:17","slug":"inside-searchguard-how-google-detects-bots-and-what-the-serpapi-lawsuit-reveals","status":"publish","type":"post","link":"http:\/\/longzhuplatform.com\/?p=1782","title":{"rendered":"Inside SearchGuard: How Google detects bots and what the SerpAPI lawsuit reveals"},"content":{"rendered":"<p><\/p> <div> <p>We fully decrypted Google\u2019s SearchGuard anti-bot system, the technology at the center of its recent lawsuit against SerpAPI.<\/p> <p>After fully deobfuscating the JavaScript code, we now have an unprecedented look at how Google distinguishes human visitors from automated scrapers in real time.<\/p> <p><strong>What happened.<\/strong> Google filed a lawsuit on Dec. 19 against Texas-based SerpAPI LLC, alleging the company circumvented SearchGuard to scrape copyrighted content from Google Search results at a scale of \u201chundreds of millions\u201d of queries daily. Rather than targeting terms-of-service violations, Google built its case on DMCA Section 1201 \u2013 the anti-circumvention provision of copyright law.<\/p> <p>The complaint describes SearchGuard as \u201cthe product of tens of thousands of person hours and millions of dollars of investment.\u201d<\/p> <p><strong>Why we care.<\/strong> The lawsuit reveals exactly what Google considers worth protecting \u2013 and how far it will go to defend it. For SEOs and marketers, understanding SearchGuard matters because any large-scale automated interaction with Google Search now triggers this system. 
If you\u2019re using tools that scrape SERPs, this is the wall they\u2019re hitting.<\/p> <h2 id=\"the-openai-connection\" class=\"wp-block-heading\">The OpenAI connection<\/h2> <p>Here\u2019s where it gets interesting: SerpAPI isn\u2019t just any scraping company.<\/p> <p>OpenAI has been partially using Google search results scraped by SerpAPI to power ChatGPT\u2019s real-time answers. SerpAPI listed OpenAI as a customer on its website as recently as May 2024, before the reference was quietly removed.<\/p> <p>Google declined OpenAI\u2019s direct request to access its search index in 2024. Yet ChatGPT still needed fresh search data to compete. <\/p> <p>The solution? A third-party scraper that pillages Google\u2019s SERPs and resells the data.<\/p> <p>Google isn\u2019t attacking OpenAI directly. It\u2019s targeting a key link in the supply chain that feeds its main AI competitor.<\/p> <p>The timing is telling. Google is striking at the infrastructure that powers rival search products \u2014 without naming them in the complaint.<\/p> <h2 id=\"what-we-found-inside-searchguard\" class=\"wp-block-heading\">What we found inside SearchGuard<\/h2> <p>We fully decrypted version 41 of the BotGuard script \u2013 the technology underlying SearchGuard. The script opens with an unexpectedly friendly message:<\/p> <pre class=\"wp-block-code\"><code>Anti-spam. Want to say hello? Contact [email\u00a0protected] *\/<\/code><\/pre> <p>Behind that greeting sits one of the most sophisticated bot detection systems ever deployed.<\/p> <p><strong>BotGuard vs. SearchGuard.<\/strong> BotGuard is Google\u2019s proprietary anti-bot system, internally called \u201cWeb Application Attestation\u201d (WAA). 
Introduced around 2013, it now protects virtually all Google services: YouTube, reCAPTCHA v3, Google Maps, and more.<\/p> <p>In its complaint against SerpAPI, Google revealed that the system protecting Search specifically is called \u201cSearchGuard\u201d \u2013 presumably the internal name for BotGuard when applied to Google Search. This is the component that was deployed in January 2025, breaking nearly every SERP scraper overnight.<\/p> <p>Unlike traditional CAPTCHAs that require clicking images of traffic lights, BotGuard operates completely invisibly. It continuously collects behavioral signals and analyzes them using statistical algorithms to distinguish humans from bots \u2013 all without the user knowing.<\/p> <p>The code runs inside a bytecode virtual machine with 512 registers, specifically designed to resist reverse engineering.<\/p> <h2 id=\"how-google-knows-youre-human\" class=\"wp-block-heading\">How Google knows you\u2019re human<\/h2> <p>The system tracks four categories of behavior in real time. Here\u2019s what it measures:<\/p> <h3 class=\"wp-block-heading\" id=\"h-mouse-movements\">Mouse movements<\/h3> <p>Humans don\u2019t move cursors in straight lines. We follow natural curves with acceleration and deceleration \u2013 tiny imperfections that reveal our humanity.<\/p> <p>Google tracks:<\/p> <ul class=\"wp-block-list\"> <li>Trajectory (path shape)<\/li> <li>Velocity (speed)<\/li> <li>Acceleration (speed changes)<\/li> <li>Jitter (micro-tremors)<\/li> <\/ul> <p>A \u201cperfect\u201d mouse movement \u2013 linear, constant speed \u2013 is immediately suspicious. Bots typically move in precise vectors or teleport between points. Humans are messier.<\/p> <p><strong>Detection threshold:<\/strong> Mouse velocity variance below 10 is flagged as bot behavior. Normal human variance falls between 50 and 500.<\/p> <h3 class=\"wp-block-heading\" id=\"h-keyboard-rhythm\">Keyboard rhythm<\/h3> <p>Everyone has a unique typing signature. 
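To make the mouse-velocity threshold above concrete, here is a minimal sketch of how such a check could work. This is our illustration, not Google's deobfuscated code; only the thresholds (bot flagged below 10, humans typically 50-500) come from the analysis above.

```python
import statistics

# Illustrative threshold from the deobfuscated script:
# velocity variance below 10 is flagged as bot-like.
BOT_VARIANCE_THRESHOLD = 10

def is_suspicious_mouse(velocities):
    '''Flag a cursor-velocity trace whose variance is implausibly low.'''
    if len(velocities) < 2:
        return False  # not enough samples to judge
    return statistics.pvariance(velocities) < BOT_VARIANCE_THRESHOLD

# A scripted cursor moving at perfectly constant speed: variance 0 -> flagged.
print(is_suspicious_mouse([120.0] * 50))
# A human-like trace with natural acceleration and jitter: not flagged.
print(is_suspicious_mouse([80, 140, 95, 210, 60, 175, 130, 45, 190, 110]))
```

The same variance test generalizes to the keyboard, scroll, and timing signals below; only the metric and the threshold change.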
Google measures:<\/p> <ul class=\"wp-block-list\"> <li>Inter-key intervals (time between keystrokes)<\/li> <li>Key press duration (how long each key is held)<\/li> <li>Error patterns<\/li> <li>Pauses after punctuation<\/li> <\/ul> <p>A human typically shows 80-150ms variance between keystrokes. A bot? Often less than 10ms with robotic consistency.<\/p> <p><strong>Detection threshold:<\/strong> Key press duration variance under 5ms indicates automation. Normal human typing shows 20-50ms variance.<\/p> <h3 class=\"wp-block-heading\" id=\"h-scroll-behavior\">Scroll behavior<\/h3> <p>Natural scrolling has variable velocity, direction changes, and momentum-based deceleration. Programmatic scrolling is often too smooth, too fast, or perfectly uniform.<\/p> <p>Google measures:<\/p> <ul class=\"wp-block-list\"> <li>Amplitude (how far)<\/li> <li>Direction changes<\/li> <li>Timing between scrolls<\/li> <li>Smoothness patterns<\/li> <\/ul> <p>Scrolling in fixed increments \u2013 100px, 100px, 100px \u2013 is a red flag.<\/p> <p><strong>Detection threshold:<\/strong> Scroll delta variance under 5px suggests bot activity. Humans typically show 20-100px variance.<\/p> <h3 class=\"wp-block-heading\" id=\"h-timing-jitter\">Timing jitter<\/h3> <p>This is the killer signal. Humans are inconsistent, and that\u2019s exactly what makes us human.<\/p> <p>Google uses Welford\u2019s algorithm to calculate variance in real time with constant memory usage \u2013 meaning it can analyze patterns without storing massive amounts of data, regardless of how many events occur. As each event arrives, the algorithm updates its running statistics.<\/p> <p>If your action intervals have near-zero variance, you\u2019re flagged.<\/p> <p><strong>The math:<\/strong> If timing follows a Gaussian distribution with natural variance, you\u2019re human. 
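Welford's online update is a published, public-domain algorithm, so it can be sketched directly. This is a generic textbook implementation, not Google's obfuscated bytecode; the sample intervals are invented for illustration.

```python
class WelfordVariance:
    '''Running mean and variance in O(1) memory (Welford's online algorithm).'''
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        '''Fold one new observation into the running statistics.'''
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        '''Population variance of everything seen so far.'''
        return self.m2 / self.n if self.n > 1 else 0.0

# A scripted loop fires an event every 5 ms, exactly.
bot = WelfordVariance()
for interval in [5.0] * 100:
    bot.update(interval)

# Human inter-event intervals (ms) wobble.
human = WelfordVariance()
for interval in [92, 130, 88, 145, 101, 77, 160, 95]:
    human.update(interval)

print(bot.variance)    # 0.0 -> deterministic, flagged
print(human.variance)  # large -> natural jitter
```

The point of the streaming form: memory stays constant whether the session produces a hundred events or a hundred million, which matches the description above.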
If it\u2019s uniform or deterministic, you\u2019re a bot.<\/p> <p><strong>Detection threshold:<\/strong> Event counts exceeding 200 per second indicate automation. Normal human interaction generates 10-50 events per second.<\/p> <h2 id=\"the-100-dom-elements-google-monitors\" class=\"wp-block-heading\">The 100+ DOM elements Google monitors<\/h2> <p>Beyond behavior, SearchGuard fingerprints your browser environment by monitoring over 100 HTML elements. The complete list extracted from the source code includes:<\/p> <ul class=\"wp-block-list\"> <li><strong>High-priority elements (forms):<\/strong> BUTTON, INPUT \u2013 these receive special attention because bots often target interactive elements.<\/li> <li><strong>Structure:<\/strong> ARTICLE, SECTION, NAV, ASIDE, HEADER, FOOTER, MAIN, DIV<\/li> <li><strong>Text:<\/strong> P, PRE, BLOCKQUOTE, EM, STRONG, CODE, SPAN, and 25 others<\/li> <li><strong>Tables:<\/strong> TABLE, CAPTION, TBODY, THEAD, TR, TD, TH<\/li> <li><strong>Media:<\/strong> FIGURE, CANVAS, PICTURE<\/li> <li><strong>Interactive:<\/strong> DETAILS, SUMMARY, MENU, DIALOG<\/li> <\/ul> <h2 id=\"environmental-fingerprinting\" class=\"wp-block-heading\">Environmental fingerprinting<\/h2> <p>SearchGuard also collects extensive browser and device data:<\/p> <p><strong>Navigator properties:<\/strong><\/p> <ul class=\"wp-block-list\"> <li>userAgent<\/li> <li>language \/ languages<\/li> <li>platform<\/li> <li>hardwareConcurrency (CPU cores)<\/li> <li>deviceMemory<\/li> <li>maxTouchPoints<\/li> <\/ul> <p><strong>Screen properties:<\/strong><\/p> <ul class=\"wp-block-list\"> <li>width \/ height<\/li> <li>colorDepth \/ pixelDepth<\/li> <li>devicePixelRatio<\/li> <\/ul> <p><strong>Performance:<\/strong><\/p> <ul class=\"wp-block-list\"> <li>performance.now() precision<\/li> <li>performance.timeOrigin<\/li> <li>Timer jitter (fluctuations in timing APIs)<\/li> <\/ul> <p><strong>Visibility:<\/strong><\/p> <ul class=\"wp-block-list\"> <li>document.hidden<\/li> 
<li>visibilityState<\/li> <li>hasFocus()<\/li> <\/ul> <p><strong>WebDriver detection:<\/strong> The script specifically checks for signatures that betray automation tools:<\/p> <ul class=\"wp-block-list\"> <li><code>navigator.webdriver<\/code> (true if automated)<\/li> <li><code>window.chrome.runtime<\/code> (absent in headless mode)<\/li> <li>ChromeDriver signatures ($cdc_ prefixes)<\/li> <li>Puppeteer markers (<code>$chrome_asyncScriptInfo<\/code>)<\/li> <li>Selenium indicators (<code>__selenium_unwrapped<\/code>)<\/li> <li>PhantomJS artifacts (<code>_phantom<\/code>)<\/li> <\/ul> <h2 id=\"why-bypasses-become-obsolete-in-minutes\" class=\"wp-block-heading\">Why bypasses become obsolete in minutes<\/h2> <p>Here\u2019s the critical discovery: SearchGuard uses a cryptographic system that can invalidate any bypass within minutes.<\/p> <p>The script generates encrypted tokens using an ARX cipher (Addition-Rotation-XOR) \u2013 similar to Speck, a family of lightweight block ciphers released by the NSA in 2013 and optimized for software implementations on devices with limited processing power. <\/p> <p>But there\u2019s a twist.<\/p> <p><strong>The magic constant rotates.<\/strong> The cryptographic constant embedded in the cipher isn\u2019t fixed. It changes with every script rotation.<\/p> <p>Observed values from our analysis:<\/p> <ul class=\"wp-block-list\"> <li>Timestamp 16:04:21: Constant = 1426<\/li> <li>Timestamp 16:24:06: Constant = 3328<\/li> <\/ul> <p>The script itself is served from URLs with integrity hashes: <code>\/\/www.google.com\/js\/bg\/{HASH}.js<\/code>. 
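For readers unfamiliar with ARX ciphers, here is a generic Speck-style round function. The rotation amounts (8, 3) follow the published Speck design; SearchGuard's actual parameters are unknown to us and, as noted above, its embedded constant rotates with every script update, so everything below is illustrative only.

```python
MASK32 = 0xFFFFFFFF  # work on 32-bit words

def rol(v, r):
    '''Rotate a 32-bit word left by r bits.'''
    return ((v << r) | (v >> (32 - r))) & MASK32

def ror(v, r):
    '''Rotate a 32-bit word right by r bits.'''
    return ((v >> r) | (v << (32 - r))) & MASK32

def arx_round(x, y, k):
    '''One Speck-style ARX round: Addition, Rotation, XOR.
    k stands in for the per-rotation magic constant (e.g. 1426, 3328).'''
    x = (ror(x, 8) + y) & MASK32  # rotate, then modular addition
    x ^= k                        # mix in the round constant
    y = rol(y, 3) ^ x             # rotate and diffuse
    return x, y

x, y = arx_round(0x12345678, 0x9ABCDEF0, 1426)
print(hex(x), hex(y))
```

Because the constant is baked into the served script, a token generator reverse-engineered from one script version produces garbage for the next one, which is exactly the rotation mechanism described above.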
When the hash changes, the cache invalidates, and every client downloads a fresh version with new cryptographic parameters.<\/p> <p>Even if you fully reverse-engineer the system, your implementation becomes invalid with the next update.<\/p> <p>It\u2019s cat and mouse by design.<\/p> <h2 id=\"the-statistical-algorithms\" class=\"wp-block-heading\">The statistical algorithms<\/h2> <p>Two algorithms power SearchGuard\u2019s behavioral analysis:<\/p> <ul class=\"wp-block-list\"> <li><strong>Welford\u2019s algorithm<\/strong> calculates variance in real time with constant memory usage \u2013 meaning it processes each event as it arrives and updates a running statistical summary, without storing every past interaction. Whether the system has seen 100 or 100 million events, memory consumption stays the same.<\/li> <li><strong>Reservoir sampling<\/strong> maintains a random sample of 50 events per metric to estimate median behavior. This provides a representative sample without storing every interaction.<\/li> <\/ul> <p>Combined, these algorithms build a statistical profile of your behavior and compare it against what humans actually do.<\/p> <h2 id=\"serpapis-response\" class=\"wp-block-heading\">SerpAPI\u2019s response<\/h2> <p>SerpAPI\u2019s founder and CEO, Julien Khaleghy, shared this statement with Search Engine Land:<\/p> <blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"> <p>\u201cSerpApi has not been served with Google\u2019s complaint, and prior to filing, Google did not contact us to raise any concerns or explore a constructive resolution. For more than eight years, SerpApi has provided developers, researchers, and businesses with access to public search data. The information we provide is the same information any person can see in their browser without signing in. 
We believe this lawsuit is an effort to stifle competition from the innovators who rely on our services to build next-generation AI, security, browsers, productivity, and many other applications.\u201d<\/p> <\/blockquote> <p>The defense may face challenges. The DMCA doesn\u2019t require content to be non-public \u2013 it prohibits circumventing technical protection measures, period. If Google proves SerpAPI deliberately bypassed SearchGuard protections, the \u201cpublic data\u201d argument may not hold.<\/p> <h2 id=\"what-this-means-for-seo-and-the-bigger-picture\" class=\"wp-block-heading\">What this means for SEO \u2013 and the bigger picture<\/h2> <p>If you\u2019re building SEO tools that programmatically access Google Search, 2025 was brutal.<\/p> <p>In January, Google deployed SearchGuard. Nearly every SERP scraper suddenly stopped returning results. SerpAPI had to scramble to develop workarounds \u2013 which Google now calls illegal circumvention.<\/p> <p>Then in September, Google removed the <code>num=100<\/code> parameter \u2013 a long-standing URL trick that allowed tools to retrieve 100 results in a single request instead of 10. Officially, Google said it was \u201cnot a formally supported feature.\u201d But the timing was telling: forcing scrapers to make 10x more requests dramatically increased their operational costs. Some analysts suggested the move specifically targeted AI platforms like ChatGPT and Perplexity that relied on mass scraping for real-time data.<\/p> <p>The combined effect: traditional scraping approaches are increasingly difficult and expensive to maintain.<\/p> <p><strong>For the industry:<\/strong> This lawsuit could reshape how courts view anti-scraping measures. If SearchGuard qualifies as a valid \u201ctechnological protection measure\u201d under DMCA, every platform could deploy similar systems with legal teeth.<\/p> <p>Under DMCA Section 1201, statutory damages range from $200 to $2,500 per circumvention act. 
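As back-of-the-envelope arithmetic, taking 100 million acts per day purely as an illustrative lower bound on "hundreds of millions":

```python
# Statutory damages under DMCA Section 1201: $200 to $2,500 per act.
acts_per_day = 100_000_000  # illustrative reading of 'hundreds of millions'
low_per_act, high_per_act = 200, 2_500

daily_low = acts_per_day * low_per_act
daily_high = acts_per_day * high_per_act
print(f'${daily_low:,} to ${daily_high:,} per day')
# $20,000,000,000 to $250,000,000,000 per day
```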
With hundreds of millions of alleged violations daily, the theoretical liability is astronomical \u2013 though Google\u2019s complaint acknowledges that \u201cSerpApi will be unable to pay.\u201d<\/p> <p>The message isn\u2019t about money. It\u2019s about setting precedent.<\/p> <p>Meanwhile, the antitrust case rolls on. Judge Mehta ordered Google to share its index and user data with \u201cQualified Competitors\u201d at marginal cost. One hand is being forced open while the other throws punches.<\/p> <p>Google\u2019s position: \u201cYou want our data? Go through the antitrust process and the technical committee. Not through scraping.\u201d<\/p> <p>Here\u2019s the uncomfortable truth: Google technically offers publishers controls, but they\u2019re limited. Google-Extended allows publishers to opt out of AI training for Gemini models and Vertex AI \u2013 but it doesn\u2019t apply to Search AI features including AI Overviews.<\/p> <p>Google\u2019s documentation states: <\/p> <blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"> <p>\u201cAI is built into Search and integral to how Search functions, which is why robots.txt directives for Googlebot is the control for site owners to manage access to how their sites are crawled for Search.\u201d<\/p> <\/blockquote> <p>Court testimony from DeepMind VP Eli Collins during the antitrust trial confirmed this separation: content opted out via Google-Extended could still be used by the Search organization for AI Overviews, because Google-Extended isn\u2019t the control mechanism for Search.<\/p> <p>The only way to fully opt out of AI Overviews? 
Block Googlebot entirely \u2013 and lose all search traffic.<\/p> <p>Publishers face an impossible choice: accept that your content feeds Google\u2019s AI search products, or disappear from search results altogether.<\/p> <p>Your move, courts.<\/p> <h3 class=\"wp-block-heading\" id=\"h-dig-deeper\">Dig deeper<\/h3> <p><em>This analysis is based on version 41 of the BotGuard script, extracted and deobfuscated from challenge data in January 2026. The information is provided for informational purposes only.<\/em><\/p> <\/div> <p> <em>Contributing authors are invited to create content for Search Engine Land and are chosen for their expertise and contribution to the search community. Our contributors work under the oversight of the editorial staff and contributions are checked for quality and relevance to our readers. Search Engine Land is owned by Semrush. Contributor was not asked to make any direct or indirect mentions of Semrush. The opinions they express are their own.<\/em> <\/p> ","protected":false},"excerpt":{"rendered":"<p>We fully decrypted Google\u2019s SearchGuard anti-bot system, the technology at the center of its recent lawsuit against SerpAPI. After fully deobfuscating the JavaScript code, we now have an unprecedented look at how Google distinguishes human visitors from automated scrapers in real time. What happened. Google filed a lawsuit on Dec. 
19 against Texas-based SerpAPI LLC, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1783,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18],"tags":[89,668,75,1967,83,4089,4087,4088],"class_list":["post-1782","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-careers","tag-bots","tag-detects","tag-google","tag-lawsuit","tag-news","tag-reveals","tag-searchguard","tag-serpapi"],"acf":[],"_links":{"self":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/posts\/1782","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1782"}],"version-history":[{"count":0,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/posts\/1782\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/media\/1783"}],"wp:attachment":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1782"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1782"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1782"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}