A bot, also known as a web robot, web spider, or web crawler, is a software application designed to perform simple, repetitive tasks automatically, and to do so faster and more consistently than a person could.
The most common use of bots is in web spidering or web crawling.
SEMrushBot is the search bot software that SEMrush sends out to discover and collect new and updated web data.
Data collected by SEMrushBot is used in:
- the AdSense (Display Advertising) reports
- the public backlink search engine index maintained as a dedicated tool called “SEMrush backlinks” (web graph of links)
- the Site Audit tool that analyzes on-page SEO, technical and usability issues
SEMrushBot’s crawl process starts with a list of web page URLs. When SEMrushBot visits these URLs, it crawls the site's internal structure, detects all the hyperlinks within the site, and adds them to the list of URLs to follow. This list, also known as the "crawl frontier", is visited recursively according to a set of SEMrush policies in order to map a site for updates: content changes, new pages, and dead links. SEMrushBot also searches for advertising information, such as Google AdSense.
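The crawl-frontier idea can be illustrated with a small sketch. This is not SEMrushBot's actual implementation: the in-memory `PAGES` map, the `crawl` function, and the `example.com` URLs are all hypothetical stand-ins for real HTTP fetching and link extraction, and SEMrush's own crawl policies are not modeled.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "site": page URL -> hrefs found on that page.
# A real crawler would fetch each page over HTTP and parse its HTML.
PAGES = {
    "https://example.com/": ["/about", "/blog", "https://other.example/x"],
    "https://example.com/about": ["/", "/missing"],
    "https://example.com/blog": ["/about"],
}

def crawl(seed, pages):
    """Breadth-first crawl of a single site.

    Returns (visited, dead): pages that were found, and internal
    links that point to pages that do not exist.
    """
    host = urlparse(seed).netloc
    frontier = deque([seed])   # the "crawl frontier": URLs still to visit
    seen = {seed}              # every URL ever queued, to avoid revisits
    visited, dead = [], []
    while frontier:
        url = frontier.popleft()
        links = pages.get(url)
        if links is None:      # nothing served at this URL: a dead link
            dead.append(url)
            continue
        visited.append(url)
        for href in links:
            target = urljoin(url, href)        # resolve relative links
            if urlparse(target).netloc != host:
                continue                       # stay within the site
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return visited, dead

visited, dead = crawl("https://example.com/", PAGES)
```

Here `visited` lists the three existing pages in the order they were discovered, and `dead` contains the one internal link that leads nowhere.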
Bots crawl your web pages to parse your site content, so that the relevant information on your site is easily indexed and more readily available to users searching for the content you provide.
Although most bots are harmless and quite beneficial, you may still want to prevent bots from crawling your site (please note, however, that not every bot on the web crawls your site in order to help index it). The easiest and quickest way to do this is to use a “robots.txt” file. This text file contains instructions on how a bot should process your site data.
To stop SEMrushBot from crawling your site, add the following rules to your "robots.txt" file:
To block SEMrushBot from crawling your site for the web graph of links, add:
Please note that there might be a delay of up to two weeks before SEMrushBot discovers the changes you made to robots.txt.
To stop SEMrushBot from crawling your site for SEO and technical issues, add:
If you want to prevent "file not found" error messages in your web server log, create an empty "robots.txt" file.
Make sure that the "robots.txt" file is in the top directory of the server; otherwise, there will be no effect on the SEMrushBot behavior.
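Before deploying a "robots.txt" file, its effect can be checked locally. The following is a minimal sketch using Python's standard `urllib.robotparser` module. The helper `is_blocked`, the `example.com` URL, and the user-agent token `SemrushBot` in the sample rules are assumptions made for illustration; use the token that appears in the bot's User-Agent string in your own server log. Note also that Python's parser may match user-agent tokens differently from the bot itself.

```python
from urllib.robotparser import RobotFileParser

def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text disallows `url` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)

# Sample rules; the token "SemrushBot" is an assumption, not taken from this page.
RULES = """\
User-agent: SemrushBot
Disallow: /
"""

page = "https://example.com/some/page.html"
blocked_for_bot = is_blocked(RULES, "SemrushBot", page)    # rule applies to this agent
blocked_for_other = is_blocked(RULES, "Googlebot", page)   # no rule for this agent
blocked_when_empty = is_blocked("", "SemrushBot", page)    # empty file blocks nothing
```

An empty "robots.txt" file contains no rules, so under this check it leaves every bot free to crawl.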
Please do not try to block SEMrushBot by IP address in .htaccess, as we do not use any consecutive IP blocks.
If SEMrushBot is still crawling your site, make sure SEMrushBot can retrieve your "robots.txt" file.
For more information about bots, please refer to http://www.robotstxt.org/.
If you still have any questions about SEMrushBot, please contact us at [email protected] and we will respond as soon as possible.
If you think that SEMrushBot does not obey your "robots.txt" rules, please provide us with your website URL and the log entries showing SEMrushBot crawling the pages that it was not supposed to, and we will work quickly to resolve the issue.
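Collecting the relevant log entries can be scripted. The sketch below assumes an access log in the common "combined" format; the `bot_hits` helper, the regular expression, the token `SemrushBot`, and the sample lines (including their IP addresses and User-Agent strings) are all made up for illustration and should be adapted to your own log layout.

```python
import re

# Request path and user-agent fields of a combined-format log line (assumed layout).
LINE_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def bot_hits(lines, agent_token, disallowed_prefix="/"):
    """Return log lines where the bot requested a path under a disallowed prefix."""
    hits = []
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that do not follow the assumed format
        is_bot = agent_token.lower() in match.group("agent").lower()
        if is_bot and match.group("path").startswith(disallowed_prefix):
            hits.append(line)
    return hits

# Made-up sample entries, not real traffic.
SAMPLE = [
    '203.0.113.5 - - [01/Jan/2024:00:00:01 +0000] "GET /private/a.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; SemrushBot; +http://example.invalid/bot)"',
    '203.0.113.9 - - [01/Jan/2024:00:00:02 +0000] "GET /private/b.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.5 - - [01/Jan/2024:00:00:03 +0000] "GET /public/c.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; SemrushBot; +http://example.invalid/bot)"',
]

hits = bot_hits(SAMPLE, "SemrushBot", "/private/")
```

With these sample lines, `hits` contains only the first entry: the second is a different client, and the third requests a path outside the disallowed prefix.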