
🤖 Robots.txt Generator

By ToolNimba Editorial Team · Reviewed by ToolNimba Editorial Team, Technical SEO reviewers · Updated 2026-06-19

robots.txt is a public file and controls crawling, not access or indexing. Never use it to hide sensitive or private URLs, as listing them only reveals where they are. Always verify the generated file in Google Search Console or your testing tool of choice before relying on it.


This robots.txt generator builds a valid robots.txt file for your site without you having to remember the exact syntax. Choose which crawler the rules apply to, add as many Allow and Disallow path rules as you need, set an optional crawl-delay, and point search engines to your sitemap. The file updates as you type, and you can copy it with one click. Everything runs in your browser, so nothing you enter is sent anywhere.

What is the Robots.txt Generator?

A robots.txt file is a plain text file that lives at the root of your domain, at example.com/robots.txt. It tells search engine crawlers which parts of your site they may and may not request, using a simple set of directives. The file is part of the Robots Exclusion Protocol, a long-standing convention that was standardised as RFC 9309 in 2022 and that the major search engines follow voluntarily.

The file is organised into groups. Each group starts with one or more User-agent lines that name a crawler (or * for all crawlers), followed by Allow and Disallow rules that apply to that group. A Disallow line lists a path prefix the crawler should not fetch, while an Allow line carves out an exception inside a disallowed area. Paths are matched from the start of the URL path, so Disallow: /admin blocks /admin, /admin/, and /admin/login alike. Because the match is a plain prefix, it also blocks /administrator, so write /admin/ with a trailing slash when you mean only the folder. An empty Disallow value means nothing is blocked.
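The prefix matching described above can be checked with Python's standard-library urllib.robotparser. This is a minimal sketch: the one-group file, the example.com host, and the helper name allowed_paths are assumptions made for illustration, and other parsers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical one-group file used only for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin
"""

def allowed_paths(robots_txt: str, paths: list[str], agent: str = "*") -> dict[str, bool]:
    """Return {path: may_fetch} for each path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {p: parser.can_fetch(agent, "https://example.com" + p) for p in paths}

results = allowed_paths(
    ROBOTS_TXT,
    ["/admin", "/admin/", "/admin/login", "/administrator", "/blog"],
)
for path, ok in results.items():
    print(f"{path:15} {'allowed' if ok else 'blocked'}")
```

Every path that begins with /admin is reported as blocked, including /administrator, while /blog stays allowed.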

It is important to understand what robots.txt does and does not do. It controls crawling, not indexing. A well-behaved crawler will not fetch a disallowed page, but if other sites link to that URL it can still appear in search results without a description. To keep a page out of the index entirely, use a noindex meta tag or an X-Robots-Tag header on a page that is allowed to be crawled, or protect it behind authentication. robots.txt is also public, so never use it to hide sensitive URLs, as listing them there simply advertises where they are.
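As an illustration of the X-Robots-Tag approach mentioned above, here is a minimal sketch of a WSGI application that serves a crawlable page with a noindex header. The app name, the page body, and the response_headers helper are hypothetical; a real site would normally set this header in its web server or framework configuration.

```python
from wsgiref.util import setup_testing_defaults

def noindex_app(environ, start_response):
    """Minimal WSGI app that serves a crawlable page marked noindex."""
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        # Crawlers must be allowed to fetch this URL, or they never see the header.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [b"<p>This page may be crawled but should not be indexed.</p>"]

def response_headers(app) -> dict[str, str]:
    """Call the app once with a fake request and return its response headers."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)

    app(environ, start_response)
    return captured

print(response_headers(noindex_app)["X-Robots-Tag"])
```

The URL served this way must not be disallowed in robots.txt, because a crawler that never fetches the page never receives the header.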

When to use it

  • Blocking crawlers from admin areas, internal search results, or staging paths that should not appear in search engines.
  • Pointing search engines at your XML sitemap so new and updated pages are discovered faster.
  • Allowing a specific file or subfolder while disallowing its parent, for example permitting /private/public-doc.pdf inside an otherwise blocked /private/ directory.

How to use the Robots.txt Generator

  1. Set the User-agent. Leave it as * to apply the rules to every crawler, or enter a name such as Googlebot for a single crawler.
  2. Add path rules. For each one, pick Allow or Disallow and type the path, which should start with a / and is relative to your domain root.
  3. Optionally set a crawl-delay in seconds and paste the full URL of your sitemap.
  4. Read the generated file in the output box, click Copy, and save it as robots.txt at the root of your site.

Formula & method

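Each group is: User-agent: <name> followed by one or more Disallow: <path> and Allow: <path> lines, plus an optional Crawl-delay: <seconds> line. A Sitemap: <url> line is global: it sits outside any group, applies to the whole file, and can be repeated if you have more than one sitemap.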
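The format above can be sketched as a small builder function. This is not the generator's actual code: build_robots_txt, its parameters, and the blank line placed before the Sitemap line are assumptions chosen to mirror the fields described on this page.

```python
from typing import Optional

def build_robots_txt(user_agent: str = "*",
                     rules: Optional[list[tuple[str, str]]] = None,
                     crawl_delay: Optional[int] = None,
                     sitemap: Optional[str] = None) -> str:
    """Build a single-group robots.txt file as a string."""
    lines = [f"User-agent: {user_agent.strip() or '*'}"]

    # Rules with a blank path are skipped, as in the form.
    kept = [(d, p.strip()) for d, p in (rules or []) if p.strip()]
    if not kept:
        lines.append("Disallow:")  # empty value: nothing is blocked
    for directive, path in kept:
        if directive not in ("Allow", "Disallow"):
            raise ValueError(f"unknown directive: {directive!r}")
        if not path.startswith("/"):
            raise ValueError(f"path must start with '/': {path!r}")
        lines.append(f"{directive}: {path}")

    if crawl_delay is not None:
        lines.append(f"Crawl-delay: {crawl_delay}")
    if sitemap:
        lines += ["", f"Sitemap: {sitemap}"]  # global line, outside the group
    return "\n".join(lines) + "\n"

print(build_robots_txt(rules=[("Disallow", "/admin/")],
                       sitemap="https://example.com/sitemap.xml"))
```

Calling build_robots_txt() with no arguments yields the allow-everything file, a User-agent: * line followed by an empty Disallow.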

Worked examples

Allow all crawlers everywhere except the admin area, and declare a sitemap.

  1. User-agent is left as * to cover every crawler.
  2. One rule is added: Disallow: /admin/.
  3. The sitemap URL https://example.com/sitemap.xml is entered.

Result:

    User-agent: *
    Disallow: /admin/
    Sitemap: https://example.com/sitemap.xml

Block a whole folder but allow one file inside it.

  1. Add Disallow: /private/ to block the folder.
  2. Add Allow: /private/brochure.pdf to permit that single file.
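  3. Order does not matter to crawlers that follow RFC 9309, such as Googlebot: they apply the most specific (longest) matching rule. Some simpler parsers stop at the first matching rule, so test the file if you depend on crawlers other than the major search engines.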

Result:

    User-agent: *
    Disallow: /private/
    Allow: /private/brochure.pdf
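The most-specific-match behaviour in this example can be sketched in a few lines. The function below is a simplified illustration that assumes plain path prefixes only (no * or $ wildcards) and resolves ties in favour of Allow; is_allowed and the rule tuples are hypothetical names, not part of any crawler's API.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest-match evaluation for plain prefixes (no wildcards)."""
    best_len = -1
    allowed = True  # no matching rule means the path may be fetched
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty values and non-matching prefixes are ignored
        is_allow = directive == "Allow"
        # A longer prefix wins; on a tie, Allow (the less restrictive rule) wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed

disallow_first = [("Disallow", "/private/"), ("Allow", "/private/brochure.pdf")]
allow_first = list(reversed(disallow_first))

for rules in (disallow_first, allow_first):
    print(is_allowed(rules, "/private/brochure.pdf"),
          is_allowed(rules, "/private/notes.txt"))
```

Both orderings print True False: the brochure is allowed and the rest of the folder stays blocked, whichever rule is listed first.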

Let every crawler access the entire site with no restrictions.

  1. Keep User-agent as *.
  2. Add no path rules at all.
  3. The generator writes an empty Disallow, which means nothing is blocked.

Result:

    User-agent: *
    Disallow:
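To see how much the single slash matters, the sketch below compares the empty Disallow file with a Disallow: / file using Python's standard-library urllib.robotparser. The ExampleBot agent name, the example.com host, and the can_fetch helper are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

def can_fetch(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Parse robots.txt text and report whether the agent may fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "https://example.com" + path)

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty value: nothing is blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # a single slash: everything is blocked

print(can_fetch(ALLOW_ALL, "/any/page"))
print(can_fetch(BLOCK_ALL, "/any/page"))
```

The first call reports that the page may be fetched and the second that it may not, even though the two files differ by one character.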

robots.txt directives and what they do

| Directive | Purpose | Example |
| --- | --- | --- |
| User-agent | Names the crawler a group of rules applies to | User-agent: Googlebot |
| Disallow | Blocks a path prefix from being crawled | Disallow: /admin/ |
| Allow | Permits a path inside a disallowed area | Allow: /admin/public/ |
| Crawl-delay | Suggests seconds to wait between requests | Crawl-delay: 10 |
| Sitemap | Points crawlers to your XML sitemap (full URL) | Sitemap: https://site.com/sitemap.xml |
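As a rough check of how these directives are read back, the sketch below parses a hypothetical two-group file with Python's standard-library urllib.robotparser. The file contents and host are made up for illustration. Note that this particular parser applies the first matching rule in a group, which is why the Allow line is listed before the Disallow line here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file combining the five directives from the table.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /admin/public/
Disallow: /admin/

User-agent: *
Disallow: /admin/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot uses its own group; every other crawler falls back to the * group.
print(parser.can_fetch("Googlebot", "https://example.com/admin/public/faq"))
print(parser.can_fetch("Googlebot", "https://example.com/admin/settings"))
print(parser.crawl_delay("Bingbot"))  # read from the * group
print(parser.site_maps())             # Sitemap lines are global
```

A crawler named in its own group ignores the * group entirely, so the Crawl-delay in this file does not apply to the Googlebot group.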

Common rule patterns

| Goal | Rule |
| --- | --- |
| Allow everything | Disallow: |
| Block the whole site | Disallow: / |
| Block one folder | Disallow: /folder/ |
| Block a file type (where supported) | Disallow: /*.pdf$ |
| Block URLs with a query string | Disallow: /*? |
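The wildcard patterns in the table can be approximated by translating them to regular expressions: * stands for any run of characters and a trailing $ anchors the end of the URL. This is a simplified sketch, not a full implementation of any crawler's matcher; pattern_to_regex and matches are hypothetical helper names.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern with * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape everything, then turn the escaped '*' back into "any characters".
    regex = re.escape(body).replace(r"\*", ".*")
    return re.compile(regex + ("$" if anchored else ""))

def matches(pattern: str, path: str) -> bool:
    """True if the rule pattern matches the path from its start."""
    if not pattern:
        return False  # an empty rule value matches nothing
    return pattern_to_regex(pattern).match(path) is not None

for pattern, path in [
    ("/*.pdf$", "/files/report.pdf"),
    ("/*.pdf$", "/files/report.pdf?download=1"),
    ("/*?", "/search?q=shoes"),
    ("/*?", "/search"),
    ("/folder/", "/folder/page.html"),
]:
    print(f"{pattern:10} {path:32} {matches(pattern, path)}")
```

The second row does not match because the $ requires the URL to end in .pdf, which is why the file-type pattern misses the same file when a query string is appended.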

Common mistakes to avoid

  • Using robots.txt to hide a page from search results. Disallow only stops crawling, not indexing. A blocked URL can still be listed if other pages link to it. To remove a page from results, allow crawling and add a noindex tag, or require login.
  • Accidentally blocking the whole site. Disallow: / blocks every URL. This often slips into a live file when it was copied from a staging configuration. Always check that a leading slash is followed by the path you actually meant to block.
  • Placing the file in the wrong location. robots.txt must sit at the root of the host, at example.com/robots.txt. Crawlers do not look for it in subfolders, so a file at example.com/blog/robots.txt is ignored. Each subdomain counts as a separate host and needs its own file.
  • Expecting crawl-delay to work everywhere. Google does not support Crawl-delay and ignores it, and the crawl rate setting that Search Console once offered was retired in early 2024. Bing and Yandex do read it. Do not rely on it as your only way to manage crawl load.
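The location rule from the list above can be expressed as a small helper that derives the only robots.txt URL a crawler would consult for a given page. This is a minimal sketch; robots_txt_url is a hypothetical name and the URLs are examples.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, and replace the path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("https://example.com/blog/post?id=7"))
print(robots_txt_url("https://shop.example.com/cart"))
```

The first call returns https://example.com/robots.txt and the second returns https://shop.example.com/robots.txt, which shows that a page under /blog/ is still governed by the file at the root and that a subdomain has its own.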

Glossary

robots.txt
A plain text file at the root of a domain that tells crawlers which paths they may and may not request.
User-agent
The name a crawler identifies itself by, such as Googlebot or Bingbot. An asterisk (*) matches all crawlers.
Disallow
A directive that asks a crawler not to fetch URLs starting with the given path. An empty value blocks nothing.
Allow
A directive that permits a path inside an otherwise disallowed area, letting you make exceptions.
Crawl-delay
An optional directive suggesting how many seconds a crawler should wait between requests. Honoured by Bing and Yandex, ignored by Google.
Robots Exclusion Protocol
The voluntary standard that defines how robots.txt is written and interpreted by cooperating crawlers.

Frequently asked questions

What is a robots.txt file?

A robots.txt file is a plain text file placed at the root of your domain that tells search engine crawlers which parts of your site they may and may not crawl. It uses simple User-agent, Disallow, and Allow directives that cooperating crawlers follow.

Where do I put the robots.txt file?

It must live at the root of your domain, reachable at example.com/robots.txt. Crawlers only look in that one location, so a file placed in a subfolder will not be found or used.

Does robots.txt stop a page from appearing in Google?

Not reliably. robots.txt controls crawling, not indexing, so a disallowed URL can still appear in results if other sites link to it. To keep a page out of the index, allow crawling and add a noindex meta tag, or protect it with a login.

What does Disallow with no value mean?

An empty Disallow line means nothing is blocked, so the crawler may access the whole site. This generator writes Disallow: with no path when you add no rules, which is the standard way to allow everything.

How do I block all crawlers from my entire site?

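Set User-agent to * and add a single rule of Disallow: / which blocks every URL from being crawled. Use this only when you genuinely want no crawling at all, such as on a staging environment. If the site must also stay out of search results, put it behind a login as well, because blocked URLs can still be listed when other sites link to them.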

Is the crawl-delay directive supported by Google?

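No. Google ignores Crawl-delay, and the crawl rate setting that Search Console once offered was retired in early 2024. To slow Googlebot temporarily, Google's documentation suggests returning 503 or 429 responses from the server. Bing and Yandex do honour Crawl-delay, so include it if you need to slow those crawlers down.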
