==Examples==
This example tells all robots that they can visit all files, because the wildcard <code>*</code> stands for all robots and the <code>Disallow</code> directive has no value, meaning no pages are disallowed. Search engine giant Google open-sourced its robots.txt parser,<ref>{{cite web |url=https://github.com/google/robotstxt |title=Google Robots.txt Parser and Matcher Library |access-date=April 13, 2025}}</ref> and recommends testing and validating rules in the robots.txt file using community-built testers such as Tame the Bots<ref>{{cite web |url=https://tamethebots.com/tools/robotstxt-checker |title=Robots.txt Testing & Validator Tool - Tame the Bots |access-date=April 13, 2025}}</ref> and Real Robots Txt.<ref>{{cite web |url=https://www.realrobotstxt.com/ |title=Robots.txt parser based on Google's open source parser from Will Critchlow, CEO of SearchPilot |access-date=April 13, 2025}}</ref>

<pre>
User-agent: *
Disallow:
</pre>

This example has the same effect, explicitly allowing all files rather than disallowing none:

<pre>
User-agent: *
Allow: /
</pre>

The same result can be accomplished with an empty or missing robots.txt file.

This example tells all robots to stay out of a website:

<pre>
User-agent: *
Disallow: /
</pre>

This example tells all robots not to enter three directories:

<pre>
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
</pre>

This example tells all robots to stay away from one specific file:

<pre>
User-agent: *
Disallow: /directory/file.html
</pre>

All other files in the specified directory will be processed.

This example tells one specific robot to stay out of a website:

<pre>
User-agent: BadBot # replace 'BadBot' with the actual user-agent of the bot
Disallow: /
</pre>

This example tells two specific robots not to enter one specific directory:

<pre>
User-agent: BadBot # replace 'BadBot' with the actual user-agent of the bot
User-agent: Googlebot
Disallow: /private/
</pre>

Example demonstrating how comments can be used:

<pre>
# Comments appear after the "#" symbol at the start of a line, or after a directive
User-agent: * # match all bots
Disallow: / # keep them out
</pre>

It is also possible to list multiple robots with their own rules. The actual robot string is defined by the crawler. A few robot operators, such as [[Google]], support several user-agent strings that allow the operator to deny access to a subset of their services by using specific user-agent strings.<ref name="google-webmasters-spec" />

Example demonstrating multiple user-agents:

<pre>
User-agent: googlebot        # all Google services
Disallow: /private/          # disallow this directory

User-agent: googlebot-news   # only the news service
Disallow: /                  # disallow everything

User-agent: *                # any robot
Disallow: /something/        # disallow this directory
</pre>

=== The use of the wildcard * in rules ===
The directive <code>Disallow: /something/</code> blocks all files and subdirectories starting with <code>/something/</code>. In contrast, using a wildcard (if supported by the crawler) allows more complex patterns for specifying which paths and files to allow or disallow from crawling; for example, <code>Disallow: /something/*/other</code> blocks URLs such as:

<pre>
/something/foo/other
/something/bar/other
</pre>

It would not prevent the crawling of <code>/something/foo/else</code>, as that would not match the pattern.
The wildcard <code>*</code> allows greater flexibility but may not be recognized by all crawlers, although it is part of the Robots Exclusion Protocol RFC.<ref>{{Cite report |url=https://www.rfc-editor.org/rfc/rfc9309.html#name-special-characters |title=Robots Exclusion Protocol |last=Koster |first=Martijn |last2=Illyes |first2=Gary |last3=Zeller |first3=Henner |last4=Sassman |first4=Lizzi |date=September 2022 |publisher=Internet Engineering Task Force |issue=RFC 9309}}</ref> A wildcard at the end of a rule in effect does nothing, as that is the standard behaviour.
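
For instance, in a crawler that supports wildcards, the following two rules (using the <code>/private/</code> directory purely as an illustration) match exactly the same set of URLs, because <code>Disallow</code> values are already treated as path prefixes:

<pre>
User-agent: *
Disallow: /private/    # prefix match: blocks /private/ and everything below it
Disallow: /private/*   # the trailing wildcard adds nothing
</pre>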