Generador Robots.txt is an SEO tool that lets you create the robots.txt file for your website easily and without any prior knowledge of the subject. Fill in the fields for the content you want crawlers to visit or skip, and the tool will automatically generate your robots.txt file.
What is a robots.txt file?
A robots.txt file tells search engine crawlers which pages or files the crawler can or can't request from your site. This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google. To keep a web page out of Google, you should use noindex directives, or password-protect your page.
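For example, a minimal robots.txt file might look like the sketch below; the /private/ path and the example.com sitemap URL are hypothetical placeholders, not values the tool requires:
# Block all crawlers from a private directory, allow everything else
User-agent: *
Disallow: /private/
# Optionally point crawlers to the sitemap
Sitemap: https://www.example.com/sitemap.xml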
Understand the limitations of robots.txt
Before you create or edit robots.txt, you should know the limits of this URL blocking method. At times, you might want to consider other mechanisms to ensure your URLs are not findable on the web.
Robots.txt directives may not be supported by all search engines
The instructions in robots.txt files cannot enforce crawler behavior on your site; it is up to each crawler to obey them. While Googlebot and other reputable web crawlers obey the instructions in a robots.txt file, other crawlers might not. Therefore, if you want to keep information secure from web crawlers, it's better to use other blocking methods, such as password-protecting private files on your server.
Different crawlers interpret syntax differently
Although reputable web crawlers follow the directives in a robots.txt file, each crawler might interpret the directives differently. You should know the proper syntax for addressing different web crawlers, as some might not understand certain instructions.
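For instance, you can address a specific crawler with its own User-agent group while giving every other crawler a different rule; the /nogooglebot/ path below is a hypothetical placeholder:
# Rules for Googlebot only
User-agent: Googlebot
Disallow: /nogooglebot/
# Rules for all other crawlers
User-agent: *
Allow: /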
A robotted page can still be indexed if linked to from other sites
While Google won't crawl or index the content blocked by robots.txt, it might still find and index a disallowed URL if it is linked from other places on the web. As a result, the URL and, potentially, other publicly available information such as anchor text in links to the page can still appear in Google search results. To properly prevent your URL from appearing in Google Search results, you should password-protect the files on your server, use the noindex meta tag or response header, or remove the page entirely.
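As a sketch of the response-header option, a server can send noindex as an X-Robots-Tag HTTP header instead of a meta tag; the surrounding headers are elided here:
HTTP/1.1 200 OK
(…)
X-Robots-Tag: noindex
(…)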
Using the robots meta tag
The robots meta tag lets you take a granular, page-specific approach to controlling how an individual page should be indexed and served to users in search results. Place the robots meta tag in the <head> section of a given page, like this:
<!DOCTYPE html>
<html><head>
<meta name="robots" content="noindex" />
(…)
</head>
<body>(…)</body>
</html>
Valid indexing & serving directives
Other directives can be used to control indexing and serving with the robots meta tag and the X-Robots-Tag. Each value represents a specific directive. The following table shows all the directives that Google honors and their meaning. Note: these directives may not be treated the same way by all other search engine crawlers. Multiple directives may be combined in a comma-separated list (see below for the handling of combined directives). These directives are not case-sensitive.
Directive | Meaning
---|---
all | There are no restrictions for indexing or serving. Note: this directive is the default value and has no effect if explicitly listed.
noindex | Do not show this page in search results and do not show a "Cached" link in search results.
nofollow | Do not follow the links on this page.
none | Equivalent to noindex, nofollow.
noarchive | Do not show a "Cached" link in search results.
nosnippet | Do not show a text snippet or video preview in the search results for this page. A static thumbnail (if available) will still be visible.
notranslate | Do not offer translation of this page in search results.
noimageindex | Do not index images on this page.
unavailable_after: [RFC-850 date/time] | Do not show this page in search results after the specified date/time. The date/time must be specified in the RFC 850 format.
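For example, a meta tag using unavailable_after might look like the following; the date shown is an arbitrary placeholder written in the RFC 850 form:
<meta name="robots" content="unavailable_after: Friday, 31-Dec-27 23:59:59 GMT">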
After the robots.txt file (or the absence of one) has given permission to crawl a page, pages are by default treated as crawlable, indexable, and archivable, and their content is approved for use in snippets that show up in search results, unless permission is specifically denied in a robots meta tag or X-Robots-Tag header.
Handling combined indexing and serving directives
You can create a multi-directive instruction by combining robots meta tag directives with commas. Here is an example of a robots meta tag that instructs web crawlers not to index the page and not to follow any of the links on the page:
<meta name="robots" content="noindex, nofollow">
For situations where multiple crawlers are specified along with different directives, the search engine will use the sum of the negative directives. For example:
<meta name="robots" content="nofollow">
<meta name="googlebot" content="noindex">
The page containing these meta tags will be interpreted as having a noindex, nofollow directive when crawled by Googlebot.
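If you only want Googlebot to receive the combined instruction, a single crawler-specific tag could be used instead; this is a sketch, not something the generator produces automatically:
<meta name="googlebot" content="noindex, nofollow">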