🤖 Robots.txt Generator
By ToolNimba Editorial Team · Reviewed by ToolNimba Editorial Team, Technical SEO reviewers · Updated 2026-06-19
robots.txt is a public file and controls crawling, not access or indexing. Never use it to hide sensitive or private URLs, as listing them only reveals where they are. Always verify the generated file in Google Search Console or your testing tool of choice before relying on it.
This robots.txt generator builds a valid robots.txt file for your site without you having to remember the exact syntax. Choose which crawler the rules apply to, add as many Allow and Disallow path rules as you need, set an optional crawl-delay, and point search engines to your sitemap. The file updates as you type, and you can copy it with one click. Everything runs in your browser, so nothing you enter is sent anywhere.
What is the Robots.txt Generator?
A robots.txt file is a plain text file that lives at the root of your domain, at example.com/robots.txt. It tells search engine crawlers which parts of your site they may and may not request, using a simple set of directives. The file is part of the Robots Exclusion Protocol, a long-standing convention that the major search engines follow voluntarily.
The file is organised into groups. Each group starts with one or more User-agent lines that name a crawler (or * for all crawlers), followed by Allow and Disallow rules that apply to that group. A Disallow line lists a path prefix the crawler should not fetch, while an Allow line carves out an exception inside a disallowed area. Paths are matched from the start of the URL path, so Disallow: /admin blocks /admin, /admin/, and /admin/login alike. Because the match is a plain prefix, it also blocks /administrator; add a trailing slash (Disallow: /admin/) to limit the rule to the folder. An empty Disallow value means nothing is blocked.
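The prefix matching described above can be sketched in a few lines of Python. This is only an illustration: the function name is_blocked is hypothetical, and the sketch ignores Allow rules and wildcards.

```python
from typing import Sequence


def is_blocked(path: str, disallow_prefixes: Sequence[str]) -> bool:
    """Return True if any non-empty Disallow prefix matches the start of path."""
    for prefix in disallow_prefixes:
        # An empty Disallow value blocks nothing, so skip it.
        if prefix and path.startswith(prefix):
            return True
    return False


# "Disallow: /admin" is a plain prefix, so it also catches /administrator.
print(is_blocked("/admin/login", ["/admin"]))
print(is_blocked("/administrator", ["/admin"]))
# A trailing slash limits the rule to the folder.
print(is_blocked("/administrator", ["/admin/"]))
```

The last call shows why a trailing slash is usually the safer choice when the intent is to block a directory.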
It is important to understand what robots.txt does and does not do. It controls crawling, not indexing. A well-behaved crawler will not fetch a disallowed page, but if other sites link to that URL it can still appear in search results without a description. To keep a page out of the index entirely, use a noindex meta tag or an X-Robots-Tag header on a page that is allowed to be crawled, or protect it behind authentication. robots.txt is also public, so never use it to hide sensitive URLs, as listing them there simply advertises where they are.
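As an illustration of the X-Robots-Tag approach, here is a minimal WSGI sketch that adds the header to responses under one path prefix. The application, the NOINDEX_PREFIXES tuple, and the /internal/ path are all hypothetical; in practice the header is more often set in the web server or framework configuration.

```python
# Hypothetical path prefixes that should stay crawlable but out of the index.
NOINDEX_PREFIXES = ("/internal/",)


def app(environ, start_response):
    """Tiny WSGI app that marks selected pages as noindex via a response header."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIXES):
        # The page must NOT be disallowed in robots.txt,
        # otherwise crawlers never fetch it and never see this header.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>Example</title>"]
```

The same effect can be had with a robots meta tag in the page's HTML; the header form is handy for non-HTML files such as PDFs.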
When to use it
- Blocking crawlers from admin areas, internal search results, or staging paths that should not appear in search engines.
- Pointing search engines at your XML sitemap so new and updated pages are discovered faster.
- Allowing a specific file or subfolder while disallowing its parent, for example permitting /private/public-doc.pdf inside an otherwise blocked /private/ directory.
How to use the Robots.txt Generator
- Set the User-agent. Leave it as * to apply the rules to every crawler, or enter a name such as Googlebot for a single crawler.
- Add path rules. For each one, pick Allow or Disallow and type the path, which should start with a / and is relative to your domain root.
- Optionally set a crawl-delay in seconds and paste the full URL of your sitemap.
- Read the generated file in the output box, click Copy, and save it as robots.txt at the root of your site.
Formula & method
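The generator's own code is not shown on this page. The following is a minimal sketch of how a file like this could be assembled from the form fields, assuming the behaviour described in the steps above: blank rules are skipped, and an empty Disallow is written when no rules remain. The function name build_robots_txt and its parameters are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def build_robots_txt(
    user_agent: str = "*",
    rules: Iterable[Tuple[str, str]] = (),
    crawl_delay: Optional[int] = None,
    sitemap: Optional[str] = None,
) -> str:
    """Assemble a single-group robots.txt from (directive, path) pairs."""
    lines = [f"User-agent: {user_agent.strip() or '*'}"]
    rule_lines = []
    for directive, path in rules:
        path = path.strip()
        if not path:
            continue  # a blank rule is skipped
        if directive not in ("Allow", "Disallow"):
            raise ValueError(f"unsupported directive: {directive!r}")
        if not path.startswith("/"):
            raise ValueError(f"path must start with '/': {path!r}")
        rule_lines.append(f"{directive}: {path}")
    if not rule_lines:
        rule_lines.append("Disallow:")  # empty value means nothing is blocked
    lines.extend(rule_lines)
    if crawl_delay is not None:
        lines.append(f"Crawl-delay: {crawl_delay}")
    if sitemap:
        lines.append(f"Sitemap: {sitemap.strip()}")
    return "\n".join(lines) + "\n"


print(build_robots_txt(rules=[("Disallow", "/admin/")],
                       sitemap="https://example.com/sitemap.xml"))
```

Each directive goes on its own line, which is what crawlers expect when they parse the file.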
Worked examples
Allow all crawlers everywhere except the admin area, and declare a sitemap.
- User-agent is left as * to cover every crawler.
- One rule is added: Disallow: /admin/.
- The sitemap URL https://example.com/sitemap.xml is entered.
Result:
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
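One way to sanity-check a file like this before uploading it is Python's standard-library urllib.robotparser. The sketch below parses the result above and asks whether two example URLs may be fetched; the URLs are illustrative. Note that this parser applies rules in file order rather than by longest match, so it can disagree with Google on files that mix overlapping Allow and Disallow rules. For this simple file the difference does not matter.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The admin area is blocked for every crawler; the rest of the site is not.
print(parser.can_fetch("*", "https://example.com/admin/login"))
print(parser.can_fetch("*", "https://example.com/blog/"))
# Sitemap URLs declared in the file (available in Python 3.8 and later).
print(parser.site_maps())
```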
Block a whole folder but allow one file inside it.
- Add Disallow: /private/ to block the folder.
- Add Allow: /private/brochure.pdf to permit that single file.
- Order does not matter to crawlers that follow RFC 9309, such as Googlebot; they apply the most specific rule, meaning the one with the longest matching path.
Result:
User-agent: *
Disallow: /private/
Allow: /private/brochure.pdf
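The "most specific match" behaviour can be sketched as a longest-prefix lookup. This is a simplified illustration rather than any crawler's actual code: the function name is_allowed is hypothetical, wildcards are not handled, and ties are resolved in favour of Allow, as RFC 9309 recommends.

```python
from typing import Iterable, Tuple


def is_allowed(path: str, rules: Iterable[Tuple[str, str]]) -> bool:
    """Pick the longest matching rule; a path with no matching rule is allowed."""
    best_length = -1
    allowed = True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rules have no effect
        is_allow_rule = directive.lower() == "allow"
        longer = len(prefix) > best_length
        tie_won_by_allow = len(prefix) == best_length and is_allow_rule
        if longer or tie_won_by_allow:
            best_length = len(prefix)
            allowed = is_allow_rule
    return allowed


rules = [("Disallow", "/private/"), ("Allow", "/private/brochure.pdf")]
print(is_allowed("/private/brochure.pdf", rules))  # the longer Allow rule wins
print(is_allowed("/private/notes.txt", rules))     # only the Disallow rule matches
```

Reversing the order of the two rules gives the same answers, which is the point of longest-match resolution.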
Let every crawler access the entire site with no restrictions.
- Keep User-agent as *.
- Add no path rules at all.
- The generator writes an empty Disallow, which means nothing is blocked.
Result:
User-agent: *
Disallow:
robots.txt directives and what they do
| Directive | Purpose | Example |
|---|---|---|
| User-agent | Names the crawler a group of rules applies to | User-agent: Googlebot |
| Disallow | Blocks a path prefix from being crawled | Disallow: /admin/ |
| Allow | Permits a path inside a disallowed area | Allow: /admin/public/ |
| Crawl-delay | Suggests seconds to wait between requests | Crawl-delay: 10 |
| Sitemap | Points crawlers to your XML sitemap (full URL) | Sitemap: https://site.com/sitemap.xml |
Common rule patterns
| Goal | Rule |
|---|---|
| Allow everything | Disallow: |
| Block the whole site | Disallow: / |
| Block one folder | Disallow: /folder/ |
| Block a file type (where supported) | Disallow: /*.pdf$ |
| Block URLs with a query string | Disallow: /*? |
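To see how the wildcard patterns in the table behave, a pattern can be translated into a regular expression. The sketch below assumes the commonly supported semantics in which * matches any run of characters and a trailing $ anchors the end of the URL; the function names are hypothetical, and crawlers that do not support wildcards will treat these characters literally.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each * match any characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern: str, path_and_query: str) -> bool:
    """True if the pattern matches from the start of the path (plus query string)."""
    return pattern_to_regex(pattern).match(path_and_query) is not None


print(matches("/*.pdf$", "/files/report.pdf"))             # ends in .pdf
print(matches("/*.pdf$", "/files/report.pdf?download=1"))  # $ rejects the suffix
print(matches("/*?", "/search?q=robots"))                  # contains a query string
```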
Common mistakes to avoid
- Using robots.txt to hide a page from search results. Disallow only stops crawling, not indexing. A blocked URL can still be listed if other pages link to it. To remove a page from results, allow crawling and add a noindex tag, or require login.
- Accidentally blocking the whole site. Disallow: / blocks every URL. This often slips into a live file when it was copied from a staging configuration. Always check that a leading slash is followed by the path you actually meant to block.
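- Placing the file in the wrong location. robots.txt must sit at the root of the host, at example.com/robots.txt. Crawlers do not look for it in subfolders, so a file at example.com/blog/robots.txt is ignored. Each subdomain needs its own file, so blog.example.com is governed by blog.example.com/robots.txt.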
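- Expecting crawl-delay to work everywhere. Google does not support Crawl-delay and ignores it, and the crawl rate limiter that Search Console once offered as an alternative was retired in January 2024. To slow Googlebot temporarily, Google's documentation suggests returning 500, 503, or 429 responses. Bing and Yandex do read Crawl-delay. Do not rely on it as your only way to manage crawl load.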
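For the file-location mistake in particular, the URL a crawler will request can be derived from any page URL by keeping the scheme and host and replacing everything else with /robots.txt. A small sketch using the standard library; the function name robots_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location that governs the given page URL."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/post?id=1"))
print(robots_url("https://blog.example.com/archive/"))
```

The second call shows that a subdomain resolves to its own robots.txt rather than the one on the main domain.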
Glossary
- robots.txt: A plain text file at the root of a domain that tells crawlers which paths they may and may not request.
- User-agent: The name a crawler identifies itself by, such as Googlebot or Bingbot. An asterisk (*) matches all crawlers.
- Disallow: A directive that asks a crawler not to fetch URLs starting with the given path. An empty value blocks nothing.
- Allow: A directive that permits a path inside an otherwise disallowed area, letting you make exceptions.
- Crawl-delay: An optional directive suggesting how many seconds a crawler should wait between requests. Honoured by Bing and Yandex, ignored by Google.
- Robots Exclusion Protocol: The voluntary standard that defines how robots.txt is written and interpreted by cooperating crawlers.
Frequently asked questions
What is a robots.txt file?
A robots.txt file is a plain text file placed at the root of your domain that tells search engine crawlers which parts of your site they may and may not crawl. It uses simple User-agent, Disallow, and Allow directives that cooperating crawlers follow.
Where do I put the robots.txt file?
It must live at the root of your domain, reachable at example.com/robots.txt. Crawlers only look in that one location, so a file placed in a subfolder will not be found or used.
Does robots.txt stop a page from appearing in Google?
Not reliably. robots.txt controls crawling, not indexing, so a disallowed URL can still appear in results if other sites link to it. To keep a page out of the index, allow crawling and add a noindex meta tag, or protect it with a login.
What does Disallow with no value mean?
An empty Disallow line means nothing is blocked, so the crawler may access the whole site. This generator writes Disallow: with no path when you add no rules, which is the standard way to allow everything.
How do I block all crawlers from my entire site?
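Set User-agent to * and add a single rule of Disallow: / which blocks every URL from being crawled. Use this only when you genuinely want crawlers to stay off the whole site, such as a staging environment. Because a blocked URL can still be indexed if it is linked from elsewhere, protect anything private with authentication as well.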
Is the crawl-delay directive supported by Google?
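No. Google ignores Crawl-delay. The crawl rate setting that Search Console used to offer was retired in January 2024, and Google now adjusts its crawl rate automatically based on how your server responds. Bing and Yandex do honour Crawl-delay, so include it if you need to slow those crawlers down.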
Sources
- Introduction to robots.txt, Google Search Central (2025)
- Robots Exclusion Protocol (RFC 9309), Internet Engineering Task Force (IETF) (2022)