ryuken
Ya'rub ibn Ahmad
About Amazonbot
Amazonbot is Amazon's web crawler used to improve our services, such as enabling Alexa to answer even more questions for customers. Amazonbot respects standard robots.txt rules.
How Can I Identify Amazonbot?
In the user-agent string, you'll see “Amazonbot” together with additional agent information. An example looks like this:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)
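As a rough sketch of how a site owner might spot this token in request logs, the Python snippet below pulls the version out of a user-agent string. The function name and the regular expression are my own assumptions based only on the single example above, not anything Amazon publishes, and a user-agent match on its own proves nothing because the header is easy to forge.

```python
import re
from typing import Optional

# Assumption: the token always looks like "Amazonbot/<version>", as in the
# one example string quoted above. Other formats are not covered.
AMAZONBOT_TOKEN = re.compile(r"\bAmazonbot/(\d+(?:\.\d+)*)", re.IGNORECASE)


def amazonbot_version(user_agent: str) -> Optional[str]:
    """Return the Amazonbot version found in a user-agent string, or None."""
    match = AMAZONBOT_TOKEN.search(user_agent)
    return match.group(1) if match else None


example = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 "
    "(KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 "
    "(Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)"
)
print(amazonbot_version(example))
```

Treat a match as a hint that the request claims to be Amazonbot, nothing more.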
How Can I Control What Amazonbot Crawls on my Site?
Robots.txt: Amazonbot respects the robots.txt directives user-agent and disallow. In the example below, Amazonbot won't crawl documents that are under /do-not-crawl/ or /not-allowed/. Today, Amazonbot does not support the crawl-delay directive in robots.txt, or robots meta tags on HTML pages such as “nofollow” and “noindex”. If a robots.txt file is changed, Amazonbot responds to the changes within 24 hours.
User-agent: Amazonbot # Amazon's user agent
Disallow: /do-not-crawl/ # disallow this directory
User-agent: * # any robot
Disallow: /not-allowed/ # disallow this directory
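If you want to sanity-check rules like these before deploying them, one option is Python's standard-library `urllib.robotparser`. The sketch below feeds it the example file and asks a few questions; the `example.com` URLs, the `SomeOtherBot` name, and the `build_parser` helper are placeholders I made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from above, copied verbatim.
ROBOTS_TXT = """\
User-agent: Amazonbot # Amazon's user agent
Disallow: /do-not-crawl/ # disallow this directory
User-agent: * # any robot
Disallow: /not-allowed/ # disallow this directory
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical URLs on a placeholder host, used only to exercise the rules.
print(parser.can_fetch("Amazonbot", "https://example.com/do-not-crawl/page.html"))
print(parser.can_fetch("Amazonbot", "https://example.com/public/page.html"))
print(parser.can_fetch("SomeOtherBot", "https://example.com/not-allowed/page.html"))
```

One caveat: this parser evaluates a crawler against the most specific matching group only, so it checks Amazonbot against the `Amazonbot` group and ignores the `*` group for it. The quoted text says Amazonbot also stays out of /not-allowed/, which this parser does not model, so use it as an approximation and not as a statement of how Amazonbot itself behaves.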
Link-Level Rel Parameter: Amazonbot supports the link-level rel=nofollow directive. Include it in your HTML like this to keep Amazonbot from following and crawling a particular link from your website.
<a href="signin.php" rel="nofollow">Sign in</a>
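To check which links on a page actually carry the attribute, a small audit script can help. This is only a sketch using Python's standard `html.parser`; the class and function names are my own, and it says nothing about how Amazonbot parses pages internally.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class NofollowLinkFinder(HTMLParser):
    """Collect <a href> targets, split by whether rel contains "nofollow"."""

    def __init__(self) -> None:
        super().__init__()
        self.nofollow: List[str] = []
        self.followed: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel is a space-separated token list, e.g. rel="nofollow noopener".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def classify_links(html: str) -> Tuple[List[str], List[str]]:
    """Return (nofollow_hrefs, followed_hrefs) for an HTML fragment."""
    finder = NofollowLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.nofollow, finder.followed


# about.html is a made-up second link, added for contrast.
page = '<a href="signin.php" rel=nofollow>Sign in</a> <a href="about.html">About</a>'
print(classify_links(page))
```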
...
Verifying Amazonbot
Verify that a crawler accessing your server is the official Amazonbot crawler by using DNS lookups. This helps you identify other bots or malicious agents that may be accessing your site while claiming to be Amazonbot.
You can use command line tools to verify Amazonbot by following these steps:
1. Locate the accessing IP address from your server logs
2. Use the host command to run a reverse DNS lookup on the IP address
3. Verify the retrieved domain name is a subdomain of crawl.amazonbot.amazon
4. Use the host command to run a forward DNS lookup on the retrieved domain name
5. Verify the returned IP address is identical to the original IP address from your server logs
For example:
$ host 12.34.56.789
789.56.34.12.in-addr.arpa domain name pointer 12-34-56-789.crawl.amazonbot.amazon.
$ host 12-34-56-789.crawl.amazonbot.amazon
12-34-56-789.crawl.amazonbot.amazon has address 12.34.56.789
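The same reverse-then-forward check can be scripted. Below is a minimal Python sketch using the standard `socket` module. The function names are my own, the forward lookup covers IPv4 only, and the resolver functions are passed in as parameters purely so the logic can be exercised without real DNS.

```python
import socket
from typing import Callable, List

# Domain named in the verification steps above.
CRAWL_DOMAIN = "crawl.amazonbot.amazon"


def reverse_lookup(ip: str) -> str:
    """Reverse DNS (PTR) lookup; returns the primary host name."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Forward DNS lookup; returns the IPv4 addresses for the host name."""
    return socket.gethostbyname_ex(hostname)[2]


def is_amazonbot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Apply the steps above: reverse lookup, domain check, forward lookup."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    # Must be a subdomain, so require a leading dot before the domain.
    if not hostname.endswith("." + CRAWL_DOMAIN):
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False
```

Call it as `is_amazonbot(ip_from_logs)` with an address taken from your own server logs. Requiring the leading dot in the suffix check is deliberate: it rejects look-alike names such as `crawl.amazonbot.amazon.evil.example` or `notcrawl.amazonbot.amazon`.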