Amazonbot Finally Respects robots.txt: Complete Guide for Web Developers (2026)

TL;DR

After years of community frustration and a viral Hacker News post (115+ points), Amazon has quietly updated Amazonbot — Amazon's web crawler — to finally respect robots.txt directives. This is a significant milestone for web developers, sysadmins, and SEO professionals who have struggled to control how Amazon's bots access their sites.

This guide covers everything you need to know: what Amazonbot is, the controversy, what changed, how to configure robots.txt for Amazonbot, and the broader implications for the web ecosystem.

What Is Amazonbot?

Amazonbot is Amazon's proprietary web crawler, identifiable by the user-agent string Amazonbot/1.0 and the IP ranges published in Amazon's aws-ip-ranges.json file. It serves multiple purposes across Amazon's ecosystem:

  • Alexa Web Ranking — Historical web traffic data collection for Alexa's ranking and analytics services (now deprecated but still partially active)
  • Product Indexing — Crawling product pages, reviews, and merchant content for Amazon's shopping experience and competitive intelligence
  • AI Training Data — Collecting web content for training Amazon's large language models, including the models powering Alexa and AWS AI services
  • AWS Services — Supporting various AWS offerings that require web crawling capabilities

Unlike search engine crawlers (Googlebot, Bingbot) that primarily index content for search results, Amazonbot's purposes are broader and less transparent — which is exactly why the community was concerned about its behavior.

The Controversy: Amazonbot Ignoring robots.txt

For years, web developers reported that Amazonbot was ignoring robots.txt rules. Server logs showed Amazonbot continuing to crawl paths that were explicitly disallowed. The issue was widely discussed in webmaster forums, Hacker News threads, and security communities.

The core complaints included:

  • Disallowed paths being crawled — Amazonbot would access /admin, /private, and other explicitly blocked sections
  • No respect for rate limiting — Crawl-delay directives were ignored, causing server load issues
  • Unexpected bandwidth consumption — Sites with limited bandwidth found Amazonbot consuming significant resources
  • Lack of transparency — Amazon provided no clear documentation on how Amazonbot handles robots.txt
  • Data collection concerns — With Amazonbot used for AI training, developers wanted control over whether their content was used to train Amazon's models

The issue came to a head when Xe Iaso published a detailed blog post documenting Amazonbot's behavior, which hit the Hacker News front page with 115+ points. The post demonstrated concrete evidence of Amazonbot disregarding robots.txt directives and sparked widespread community pressure for Amazon to fix it.

What Changed?

Sometime in mid-May 2026, Amazon quietly updated Amazonbot to honor robots.txt directives. The change was first noticed by web developers who had been monitoring their server logs — Amazonbot suddenly stopped crawling disallowed paths.

Key changes include:

  • robots.txt compliance — Amazonbot now properly reads and respects Disallow directives
  • Crawl-delay support — Rate limiting via Crawl-delay is now honored
  • User-agent targeting — Rules targeting Amazonbot specifically are now enforced
  • Improved IP documentation — Amazon's crawler IP ranges are more clearly documented

While Amazon has not issued a formal announcement about the change, the server log evidence is clear and consistent across multiple reports. This is a win for the open web and community-driven accountability.

How to Configure robots.txt for Amazonbot

Now that Amazonbot respects robots.txt, you can control its access with standard directives. Here's how:

Block Amazonbot Completely

To disallow Amazonbot from crawling any part of your site:

User-agent: Amazonbot
Disallow: /

Block Amazonbot from Specific Paths

To allow general crawling but block sensitive areas:

User-agent: Amazonbot
Disallow: /admin/
Disallow: /private/
Disallow: /api/
Disallow: /wp-admin/

Set Crawl Rate

To limit crawl frequency (reduces server load):

User-agent: Amazonbot
Crawl-delay: 10
Disallow: /admin/

Allow Amazonbot from Specific Paths Only

To restrict Amazonbot to certain content:

User-agent: Amazonbot
Allow: /blog/
Allow: /public/
Disallow: /

Complete Example with Other Bots

Here's a complete robots.txt that manages Amazonbot alongside other crawlers:

# Allow Googlebot full access
User-agent: Googlebot
Allow: /

# Block Amazonbot from sensitive areas
User-agent: Amazonbot
Crawl-delay: 30
Disallow: /admin/
Disallow: /private/
Disallow: /checkout/
Disallow: /api/

# Block GPTBot (OpenAI's training crawler)
User-agent: GPTBot
Disallow: /

# Default rules for all other bots
User-agent: *
Allow: /
Crawl-delay: 5

Sitemap: https://www.example.com/sitemap.xml
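
One way to sanity-check a file like this before deploying it is Python's standard-library urllib.robotparser. The sketch below parses the rules above from a string and asks how they apply to a few sample requests. The sample paths and the SomeOtherBot name are illustrative assumptions. This parser is not the one Amazonbot uses, and it applies the first matching rule in a group rather than the longest match, so treat the result as a check of syntax and intent only.

```python
from urllib.robotparser import RobotFileParser

# The complete example from above, with comments omitted for brevity.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Amazonbot
Crawl-delay: 30
Disallow: /admin/
Disallow: /private/
Disallow: /checkout/
Disallow: /api/

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
Crawl-delay: 5

Sitemap: https://www.example.com/sitemap.xml
"""

BASE = "https://www.example.com"


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# (user-agent token, path) pairs chosen purely for illustration.
SAMPLES = [
    ("Amazonbot", "/admin/users"),
    ("Amazonbot", "/blog/post"),
    ("GPTBot", "/blog/post"),
    ("SomeOtherBot", "/admin/users"),
]

for agent, path in SAMPLES:
    verdict = "allowed" if rules.can_fetch(agent, BASE + path) else "blocked"
    print(f"{agent:13} {path:14} {verdict}")

print("Amazonbot crawl delay:", rules.crawl_delay("Amazonbot"))
```

Pass the short token (Amazonbot) rather than a full user-agent header, since this parser matches on the part of the string before the first slash.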

Verifying Amazonbot Behavior

After updating your robots.txt, you can verify Amazonbot's compliance in several ways:

  1. Check server logs — Monitor access logs for Amazonbot/1.0 user-agent hitting disallowed paths
  2. IP range verification — Cross-reference crawler IPs against Amazon's aws-ip-ranges.json
  3. robots.txt validation — Run the file through a robots.txt parser or validator to check its syntax (Google Search Console's standalone robots.txt tester was retired in 2023, and Amazonbot may use its own parser in any case)
  4. Monitor bandwidth — Check if Amazonbot's bandwidth usage decreases after blocking
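
The first two checks can be roughly automated. The following is a minimal sketch, assuming access logs in the common "combined" format with numeric client IPs. The disallowed prefixes, the sample log lines, and the CIDR block are placeholders (the network is a reserved documentation range, not a real Amazon range), so substitute your own rules and the ranges Amazon publishes.

```python
import ipaddress
import re

# Combined Log Format:
# IP ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Assumption: mirror the Disallow rules from your own robots.txt here.
DISALLOWED_PREFIXES = ("/admin/", "/private/", "/api/")

# Placeholder network from a documentation-only range; replace it with
# the crawler ranges Amazon actually publishes.
CRAWLER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def find_violations(lines):
    """Yield (ip, path, in_known_range) for Amazonbot hits on disallowed paths."""
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "amazonbot" not in match["agent"].lower():
            continue
        if not match["path"].startswith(DISALLOWED_PREFIXES):
            continue
        address = ipaddress.ip_address(match["ip"])
        in_known_range = any(address in net for net in CRAWLER_NETWORKS)
        yield match["ip"], match["path"], in_known_range


# Made-up log lines for demonstration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [20/May/2026:10:00:00 +0000] "GET /admin/login HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; Amazonbot/1.0)"',
    '192.0.2.10 - - [20/May/2026:10:00:30 +0000] "GET /blog/post HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0 (compatible; Amazonbot/1.0)"',
    '203.0.113.7 - - [20/May/2026:10:01:00 +0000] "GET /api/items HTTP/1.1" '
    '200 128 "-" "Amazonbot/1.0"',
]

for ip, path, in_known_range in find_violations(SAMPLE_LOG):
    source = "listed range" if in_known_range else "unlisted IP, possibly spoofed"
    print(f"{ip} requested {path} ({source})")
```

A hit from an unlisted IP is worth treating separately: the user-agent header is trivial to forge, so such a request may not come from Amazon at all.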

Implications for Web Developers

This change has several important implications:

For SEO Professionals

Amazonbot is not a search engine crawler in the traditional sense. Blocking it does not affect your Google or Bing search rankings. However, if you use Amazon-related services (Amazon Associates, product feeds, etc.), Amazonbot access may be relevant to those services. Evaluate your specific use case before blanket-blocking.

For Content Creators

With Amazonbot used for AI training data collection, creators now have more control over whether their content is used to train Amazon's AI models. If you want to opt out of AI training data collection, add Disallow: / for Amazonbot.

For Sysadmins

Server administrators can now reliably use robots.txt to manage Amazonbot's impact on server resources. Combined with Crawl-delay, this gives you fine-grained control over crawler behavior without resorting to IP-based blocking.

For Web Scraping / Bot Management

This change signals that community pressure can effectively influence large tech companies' bot behavior. It sets a precedent for asking other crawler operators (AI training bots, research crawlers) to respect robots.txt as well.

The Bigger Picture: robots.txt and AI Crawlers

Amazonbot's compliance update comes at a time when the web is grappling with a surge of AI crawlers. Companies like OpenAI (GPTBot), Anthropic, Google (Google-Extended), and others have introduced dedicated crawler user-agents for AI training data collection. The robots.txt protocol — originally designed in 1994 for search engine crawlers — is being retrofitted for an entirely new class of automated access.

The community's success with Amazonbot shows that the robots.txt protocol, despite its age and limitations, remains a powerful tool for site owners to express their preferences. When companies choose to ignore it, the community can — and does — push back through public discourse and technical pressure.

For web developers, the takeaway is clear: maintain an up-to-date robots.txt file that explicitly addresses AI and non-search crawlers. The era of assuming all bots respect the rules is giving way to a more nuanced landscape where each bot operator makes its own compliance decisions.
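
One way to keep such a file current is to generate it from a small policy table instead of editing it by hand. The sketch below is only an illustration: the POLICY mapping, the render_robots_txt helper, and the choice to block each AI-related token entirely are assumptions, and the tokens are the ones named in this article.

```python
from typing import Dict, List, Optional

# Hypothetical policy: user-agent token -> disallowed path prefixes.
# An empty list means the group has no restrictions.
POLICY: Dict[str, List[str]] = {
    "Amazonbot": ["/"],
    "GPTBot": ["/"],
    "Google-Extended": ["/"],
    "*": [],
}


def render_robots_txt(policy: Dict[str, List[str]],
                      sitemap_url: Optional[str] = None) -> str:
    """Render one robots.txt group per user-agent token in the policy."""
    groups = []
    for agent, disallowed in policy.items():
        lines = [f"User-agent: {agent}"]
        # An empty Disallow value permits everything for that group.
        lines += [f"Disallow: {path}" for path in disallowed] or ["Disallow:"]
        groups.append("\n".join(lines))
    body = "\n\n".join(groups)
    if sitemap_url:
        body += f"\n\nSitemap: {sitemap_url}"
    return body + "\n"


print(render_robots_txt(POLICY, "https://www.example.com/sitemap.xml"))
```

Adding a newly announced crawler then becomes a one-line change to the table, which is easier to review than a hand-edited file.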

Summary

Amazonbot now respects robots.txt directives after sustained community pressure highlighted by a viral Hacker News post. This is a meaningful win for web developers who want to control how Amazon's crawler accesses their sites. Update your robots.txt with explicit Amazonbot rules, verify compliance through server logs, and consider the broader implications for AI training data collection on your content.

The protocol works — but only when companies choose to follow it. Hold them accountable.
