==Examples==
This example tells all robots that they can visit all files, because the wildcard <code>*</code> stands for all robots and the <code>Disallow</code> directive has no value, meaning no pages are disallowed. Search engine giant Google open-sourced its robots.txt parser,<ref>{{cite web |url=https://github.com/google/robotstxt |title=Google Robots.txt Parser and Matcher Library |access-date=April 13, 2025}}</ref> and recommends testing and validating rules in the robots.txt file using community-built testers such as Tame the Bots<ref>{{cite web |url=https://tamethebots.com/tools/robotstxt-checker |title=Robots.txt Testing & Validator Tool - Tame the Bots |access-date=April 13, 2025}}</ref> and Real Robots Txt.<ref>{{cite web |url=https://www.realrobotstxt.com/ |title=Robots.txt parser based on Google's open source parser from Will Critchlow, CEO of SearchPilot |access-date=April 13, 2025}}</ref>

<pre>
User-agent: *
Disallow:
</pre>

This example has the same effect, explicitly allowing all files rather than disallowing none:

<pre>
User-agent: *
Allow: /
</pre>

The same result can be accomplished with an empty or missing robots.txt file.

This example tells all robots to stay out of a website:

<pre>
User-agent: *
Disallow: /
</pre>

This example tells all robots not to enter three directories:

<pre>
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
</pre>

This example tells all robots to stay away from one specific file:

<pre>
User-agent: *
Disallow: /directory/file.html
</pre>

All other files in the specified directory will be processed.

This example tells one specific robot to stay out of a website:

<pre>
User-agent: BadBot # replace 'BadBot' with the actual user-agent of the bot
Disallow: /
</pre>

This example tells two specific robots not to enter one specific directory:

<pre>
User-agent: BadBot # replace 'BadBot' with the actual user-agent of the bot
User-agent: Googlebot
Disallow: /private/
</pre>

Example demonstrating how comments can be used:

<pre>
# Comments appear after the "#" symbol at the start of a line, or after a directive
User-agent: * # match all bots
Disallow: / # keep them out
</pre>

It is also possible to list multiple robots with their own rules. The actual robot string is defined by the crawler. A few robot operators, such as [[Google]], support several user-agent strings that allow the operator to deny access to a subset of their services by using specific user-agent strings.<ref name="google-webmasters-spec" />

Example demonstrating multiple user-agents:

<pre>
User-agent: googlebot        # all Google services
Disallow: /private/          # disallow this directory

User-agent: googlebot-news   # only the news service
Disallow: /                  # disallow everything

User-agent: *                # any robot
Disallow: /something/        # disallow this directory
</pre>

=== The use of the wildcard * in rules ===
The directive <code>Disallow: /something/</code> blocks all files and subdirectories starting with <code>/something/</code>. In contrast, using a wildcard (if supported by the crawler) allows more complex patterns for specifying which paths and files to allow or disallow from crawling; for example, <code>Disallow: /something/*/other</code> blocks URLs such as:

<pre>
/something/foo/other
/something/bar/other
</pre>

It would not prevent the crawling of <code>/something/foo/else</code>, as that would not match the pattern.
The wildcard <code>*</code> allows greater flexibility but may not be recognized by all crawlers, although it is part of the Robots Exclusion Protocol RFC.<ref>{{Cite report |url=https://www.rfc-editor.org/rfc/rfc9309.html#name-special-characters |title=Robots Exclusion Protocol |last=Koster |first=Martijn |last2=Illyes |first2=Gary |last3=Zeller |first3=Henner |last4=Sassman |first4=Lizzi |date=September 2022 |publisher=Internet Engineering Task Force |issue=RFC 9309}}</ref> A wildcard at the end of a rule in effect does nothing, as that is the standard behaviour.
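
For instance, in a crawler that supports wildcards, the following two rules (using the <code>/private/</code> directory purely as an illustration) match exactly the same set of URLs, because <code>Disallow</code> values are already treated as path prefixes:

<pre>
User-agent: *
Disallow: /private/    # prefix match: blocks /private/ and everything below it
Disallow: /private/*   # the trailing wildcard adds nothing
</pre>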