WordPress robots.txt: The Complete Guide (2026)
A complete guide to the WordPress robots.txt file: what it does, how to edit it with Yoast or Rank Math, plus best-practice examples and AI crawler rules.
The WordPress robots.txt file is one of the smallest files on your site, yet it quietly decides which crawlers can read your content and which get turned away at the door. Get it right and search engines and AI answer engines index the pages you care about. Get it wrong and you can accidentally hide your best content from Google, ChatGPT, and Perplexity at the same time.
This guide explains exactly what a robots.txt file does, what WordPress generates by default, how to edit your WordPress robots.txt safely, and which best practices actually matter in 2026, including how to handle the new wave of AI crawlers. Everything here is practical and copy-paste ready.
What a robots.txt File Actually Does
A robots.txt file is a plain-text file that lives at the root of your domain, for example https://example.com/robots.txt. It follows the Robots Exclusion Protocol, a long-standing standard that crawlers check before they fetch pages from your site.
The file is made up of a few simple directives:
- User-agent names the crawler the rule applies to (for example Googlebot, or * for all crawlers).
- Disallow tells that crawler not to fetch a given path.
- Allow creates an exception to a Disallow rule.
- Sitemap points crawlers to your XML sitemap.
Two things are important to understand. First, robots.txt is a set of instructions, not a security barrier. Well-behaved crawlers like Googlebot obey it, but it does not physically block access, so never use it to hide private data. Second, Disallow stops crawling, not indexing. If a disallowed page is linked from elsewhere, Google can still list the URL without a description. To keep a page out of search results entirely, use a noindex meta tag instead and leave the page crawlable.
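To make the Allow/Disallow interplay concrete, here is a minimal Python sketch of how a compliant crawler might evaluate the rules in one user-agent group. It assumes the precedence that Google documents and RFC 9309 specifies: the longest matching rule wins, and Allow wins a tie. The function names are hypothetical, and the sketch skips percent-encoding and the choice between groups, so treat it as an illustration rather than a reference parser.

```python
import re


def rule_to_regex(path: str) -> re.Pattern:
    """Translate a robots.txt rule path into a regex anchored at the start of the URL path.

    '*' matches any run of characters; a trailing '$' pins the end of the path.
    """
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    body = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], url_path: str) -> bool:
    """Return True if url_path may be crawled under one group's Allow/Disallow rules.

    The longest matching rule path wins; on a tie, Allow wins.
    A path that matches no rule is allowed.
    """
    best_length = -1
    allowed = True
    for directive, path in rules:
        if not path:  # an empty "Disallow:" line matches nothing
            continue
        if rule_to_regex(path).match(url_path):
            length = len(path)
            if length > best_length or (length == best_length and directive == "allow"):
                best_length = length
                allowed = directive == "allow"
    return allowed


# The rules from WordPress's default virtual robots.txt.
WP_DEFAULT_RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]

for path in ["/wp-admin/admin-ajax.php", "/wp-admin/options.php", "/hello-world/"]:
    print(path, "allowed" if is_allowed(WP_DEFAULT_RULES, path) else "blocked")
```

With the default WordPress rules, the longer Allow line wins for admin-ajax.php, while everything else under /wp-admin/ stays blocked and ordinary posts match no rule at all.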
The Default WordPress robots.txt
Here is something that surprises a lot of people: by default, WordPress does not create a physical robots.txt file in your site's folder. Instead, it generates a virtual one on the fly whenever a crawler requests it.
If you visit yourdomain.com/robots.txt on a stock install, you will typically see something close to this:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://yourdomain.com/wp-sitemap.xml
This default is sensible. It blocks the admin area while keeping admin-ajax.php open, because some themes and plugins rely on it to load content properly. The sitemap line appears only when WordPress's built-in sitemaps (added in version 5.5) are active or when an SEO plugin adds its own sitemap URL.
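If you want to check what a parser makes of this default file, Python's standard-library urllib.robotparser can read it without any network access. The sketch below uses the placeholder domain from the example. One caveat is worth knowing before you rely on it: urllib.robotparser applies the first rule that matches, in file order, rather than Google's longest-match rule, so with Disallow listed before Allow it reports admin-ajax.php as blocked even though Googlebot may fetch it.

```python
from urllib.robotparser import RobotFileParser

# WordPress's default virtual robots.txt, using the placeholder domain.
DEFAULT_WP_ROBOTS = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://yourdomain.com/wp-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(DEFAULT_WP_ROBOTS.splitlines())

# Sitemap lines are collected separately from the user-agent groups.
print(parser.site_maps())

site = "https://yourdomain.com"
print(parser.can_fetch("Googlebot", site + "/hello-world/"))  # no rule matches
print(parser.can_fetch("Googlebot", site + "/wp-admin/"))     # Disallow matches
# The first matching line in file order is "Disallow: /wp-admin/", so the
# stdlib answers False here, while Google's longest-match logic allows it.
print(parser.can_fetch("Googlebot", site + "/wp-admin/admin-ajax.php"))
```

This ordering quirk is one reason the starter file later in this guide lists the admin-ajax.php Allow line before the Disallow line.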
Because this virtual file does not physically exist on your server, you cannot edit it by opening a file. You either replace it with a real robots.txt file in your root directory or use an SEO plugin, which is exactly what the next section covers.
How to Edit robots.txt in WordPress
There are three reliable ways to edit robots.txt in WordPress. Pick the one that matches the tools you already use.
Option 1: Yoast SEO
Yoast includes a built-in editor that creates a real robots.txt file for you.
- Go to Yoast SEO → Tools → File editor.
- If no physical file exists yet, click Create robots.txt file.
- Edit the contents in the text box and click Save changes to robots.txt.
Option 2: Rank Math
Rank Math offers a similar editor under its general settings.
- Go to Rank Math → General Settings → Edit robots.txt.
- Add or change your rules in the editor box.
- Click Save Changes.
Note that if a physical robots.txt file already exists in your root folder, Rank Math will show a notice and stop overriding it, because a real file always takes priority over a plugin-generated one.
Option 3: Edit the File Manually
If you prefer full control or do not use an SEO plugin, create the file yourself.
- Open a plain-text editor and create a file named exactly robots.txt.
- Add your directives and save it.
- Upload it to the root directory of your site (the same folder as wp-config.php) using FTP, SFTP, or your host's file manager.
The moment a real robots.txt file exists in the root, WordPress stops serving the virtual version and your uploaded file takes over.
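If you manage sites from a script, the manual steps above can be sketched in Python. The helper below is hypothetical: it writes robots.txt as plain UTF-8 text into a local copy of the site root and refuses to run if wp-config.php is not in that folder. That check assumes the common layout where wp-config.php sits in the web root (WordPress also allows it one level up). Uploading the result over SFTP or through your host's file manager is left out.

```python
from pathlib import Path


def write_robots_txt(wp_root: str, content: str) -> Path:
    """Write robots.txt into a WordPress root folder and return its path.

    Hypothetical helper: it assumes wp-config.php lives in the same folder,
    so writing into a subfolder (where crawlers never look) is refused.
    """
    root = Path(wp_root)
    if not (root / "wp-config.php").is_file():
        raise FileNotFoundError(f"wp-config.php not found in {root}; is this the site root?")
    target = root / "robots.txt"
    # Plain UTF-8 text, Unix line endings, exactly one trailing newline.
    with target.open("w", encoding="utf-8", newline="\n") as handle:
        handle.write(content.rstrip("\n") + "\n")
    return target


# Example call against a hypothetical local checkout of the site:
# write_robots_txt("/var/www/html", "User-agent: *\nDisallow: /wp-admin/\n")
```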
WordPress robots.txt Best Practices and Examples
A good WordPress robots.txt is short and deliberate. Resist the urge to block large sections of your site. Below is a clean, modern starting point that works for most WordPress blogs and business sites.
# robots.txt for https://yourdomain.com
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /?s=
Disallow: /search/
Disallow: /cart/
Disallow: /checkout/
# Let search and AI crawlers reach assets and sitemaps
Allow: /wp-content/uploads/
Sitemap: https://yourdomain.com/sitemap_index.xml
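Before uploading a file like this, you can sanity-check it with Python's urllib.robotparser, as sketched below. Because the Allow line for admin-ajax.php comes before Disallow: /wp-admin/, the parser's first-match behavior gives the same answers that Google's longest-match rule would for these paths. The sample URLs are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# The starter file above, with the placeholder domain.
STARTER_ROBOTS = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /?s=
Disallow: /search/
Disallow: /cart/
Disallow: /checkout/
# Let search and AI crawlers reach assets and sitemaps
Allow: /wp-content/uploads/
Sitemap: https://yourdomain.com/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(STARTER_ROBOTS.splitlines())

site = "https://yourdomain.com"
sample_paths = [
    "/wp-admin/admin-ajax.php",
    "/wp-admin/options.php",
    "/wp-login.php",
    "/search/shoes/",
    "/checkout/",
    "/wp-content/uploads/2026/01/logo.png",
    "/blog/robots-txt-guide/",
]
for path in sample_paths:
    verdict = "allowed" if parser.can_fetch("Googlebot", site + path) else "blocked"
    print(f"{path:40} {verdict}")

print(parser.site_maps())
```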
A few notes on the choices above and the wider set of WordPress robots.txt best practices:
- Keep your sitemap line. It is the single most useful directive for getting new content discovered quickly. Use the URL your SEO plugin generates (Yoast and Rank Math both use sitemap_index.xml, while core WordPress uses wp-sitemap.xml).
- Block low-value URLs, not whole sections of your site. Internal search results (/?s=), cart, and checkout pages add nothing in search and waste crawl budget.
- Do not block /wp-content/ or /wp-includes/. Older guides told you to. Today, Google needs your CSS and JavaScript to render and understand pages, and blocking them can hurt how your site is evaluated.
- Allow your uploads folder. This keeps images and media crawlable for image search and for AI engines that reference visual content.
- Use only one robots.txt at the root. It does not work in subfolders, and each subdomain needs its own.
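The one-file-per-host rule is easy to express in code: crawlers derive the robots.txt location from a page URL's scheme and host (including any port) and ignore the path. A small Python sketch, with a hypothetical helper name:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    Only the scheme and host (plus any port) matter, so every page on a host
    shares one file at the root, and a subdomain is a separate host.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/2026/hello-world/"))
print(robots_url("https://shop.example.com/cart/?step=2"))
```

So a file placed at example.com/blog/robots.txt is never consulted, and shop.example.com needs a robots.txt of its own.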
Common WordPress robots.txt Mistakes
Most robots.txt damage is self-inflicted. These are the errors that show up again and again on WordPress sites.
- Disallowing the whole site. A stray Disallow: / blocks everything. The "Discourage search engines from indexing this site" option under Settings → Reading also asks search engines to stay away, so always confirm that box is unchecked on a live site.
- Trying to noindex with robots.txt. Disallowing a URL does not remove it from Google. Worse, if Google cannot crawl the page, it cannot see a noindex tag on it. To deindex, allow the crawl and add noindex.
- Blocking CSS and JavaScript. This breaks rendering and can lower your visibility. Leave theme and plugin assets crawlable.
- Forgetting the sitemap line. It is easy to add and speeds up discovery of new posts.
- Leaving a staging site fully blocked, then launching it. Migrations frequently ship a Disallow: / from the staging environment straight to production. Check robots.txt the moment you go live.
- Editing both a plugin file and a physical file. If a real file exists, plugin edits are ignored. Pick one source of truth.
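Several of these mistakes can be caught automatically. The following Python sketch is a rough heuristic linter, not a full parser: the function name is hypothetical, it only understands simple User-agent groups, and it flags a site-wide Disallow: / for the catch-all agent, blocked /wp-content/ or /wp-includes/ folders, and a missing Sitemap line. Running it on the file you are about to ship from staging is one way to catch a leftover Disallow: /.

```python
def lint_robots(text: str) -> list[str]:
    """Return warnings for common WordPress robots.txt mistakes.

    A rough heuristic: it tracks which user-agents each group names and
    inspects Allow/Disallow/Sitemap lines; it does not validate syntax.
    """
    warnings: list[str] = []
    has_sitemap = False
    agents: list[str] = []
    group_has_rules = False

    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if group_has_rules:  # a User-agent after rules starts a new group
                agents, group_has_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            group_has_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                warnings.append("Disallow: / for User-agent: * blocks the whole site")
            if field == "disallow" and value.rstrip("/") in ("/wp-content", "/wp-includes"):
                warnings.append(f"Disallow: {value} may block CSS and JavaScript needed for rendering")
        elif field == "sitemap":
            has_sitemap = True

    if not has_sitemap:
        warnings.append("No Sitemap line")
    return warnings


staging_file = "User-agent: *\nDisallow: /\n"
for warning in lint_robots(staging_file):
    print("WARNING:", warning)
```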
AI Crawlers and Your robots.txt
The biggest change to robots.txt in recent years is the arrival of AI crawlers. Answer engines like ChatGPT, Perplexity, Google AI Overviews, and Claude all read robots.txt before fetching your pages, and most major AI companies now publish dedicated user-agents you can control individually.
It helps to know that most AI providers run two distinct types of bot:
- Training crawlers gather content used to train or update models (for example OpenAI's GPTBot, Google's Google-Extended, and Anthropic's ClaudeBot).
- Search and retrieval crawlers fetch pages at query time so the engine can cite live sources (for example OpenAI's OAI-SearchBot and ChatGPT-User, Anthropic's Claude-SearchBot, and PerplexityBot).
This distinction matters for AI search visibility. Blocking a search or retrieval bot can stop your brand from being cited in AI answers, while blocking a training bot only affects whether the underlying model learns from your content. If your goal is to appear in AI answers, you generally want to allow the search and retrieval crawlers. This is the foundation of answer engine optimization, and it pairs naturally with optimizing your content for AI search engines.
Here are the user-agents worth knowing in 2026, all of which can be referenced in robots.txt:
- OpenAI: GPTBot (training), OAI-SearchBot (ChatGPT search index), ChatGPT-User (live user fetch).
- Anthropic: ClaudeBot (training), Claude-SearchBot, Claude-User.
- Perplexity: PerplexityBot, Perplexity-User.
- Google: Google-Extended, a control token for Gemini and Vertex AI grounding (it does not crawl on its own; standard Googlebot rules still cover AI Overviews).
- Apple: Applebot-Extended, which governs use of content for Apple's AI models.
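If you do decide to opt out of model training while staying citable in AI answers, the training versus retrieval split maps onto separate user-agent groups. The policy below is one hypothetical example, checked with Python's urllib.robotparser. Keep in mind that under RFC 9309 a crawler follows only the most specific group that names it, so a bot with its own group ignores the User-agent: * rules entirely. Also note that urllib.robotparser matches user-agent names by substring, which is looser than Google's matching, so double-check results for similarly named bots.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block training crawlers, keep retrieval bots allowed.
AI_POLICY = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/

Sitemap: https://yourdomain.com/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(AI_POLICY.splitlines())

post = "https://yourdomain.com/blog/robots-txt-guide/"
for bot in ["GPTBot", "ClaudeBot", "Google-Extended", "OAI-SearchBot",
            "ChatGPT-User", "Claude-SearchBot", "PerplexityBot", "Googlebot"]:
    verdict = "allowed" if parser.can_fetch(bot, post) else "blocked"
    print(f"{bot:18} {verdict}")
```

Deleting the first group returns every bot to the shared rules.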
If you want to welcome AI answer engines while keeping things tidy, you do not need to add anything special, since allowing all crawlers already covers them. If you previously blocked AI bots and want to reverse that, simply remove those Disallow lines. Either way, robots.txt only controls crawling; to actively guide how AI engines summarize your site, pair it with a structured content file, which we cover in our guide to llms.txt.
One honest caveat: robots.txt depends on crawlers choosing to obey it. The major, named AI bots above generally respect it, but it is a request, not a wall. For genuinely sensitive content, use authentication or server-level controls instead.
Want to know whether AI answer engines can actually see and cite your WordPress site? Run a free AEObot scan to check your AI search visibility in seconds.
Frequently Asked Questions
Where is the robots.txt file in WordPress?
By default WordPress serves a virtual robots.txt at yourdomain.com/robots.txt with no physical file on the server. Once you upload a real robots.txt file to your site's root directory, create one with Yoast SEO's file editor, or customize the virtual one in Rank Math, your version is served instead.
How do I edit robots.txt in WordPress without a plugin?
Create a plain-text file named robots.txt, add your directives, and upload it to the root folder of your site (the same directory as wp-config.php) using FTP, SFTP, or your host's file manager. WordPress will immediately serve your file in place of the virtual one.
Should I block AI crawlers in my WordPress robots.txt?
Usually not, if you want visibility in AI answers. Blocking search and retrieval bots like OAI-SearchBot, PerplexityBot, or ChatGPT-User can prevent your site from being cited by ChatGPT and Perplexity. Block training crawlers such as GPTBot or Google-Extended only if you specifically do not want your content used to train AI models.
Does robots.txt remove a page from Google?
No. A Disallow rule stops crawling but does not guarantee removal from search results, since Google can still index a URL it finds linked elsewhere. To keep a page out of search, leave it crawlable and add a noindex meta tag so Google can read and honor it.
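The crawl-then-read requirement can be expressed as a small check. This Python sketch uses the standard library's html.parser to look for a robots meta tag; the class and function names are hypothetical, and it only inspects <meta name="robots"> tags, not other ways of sending noindex.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}  # names arrive lowercased
        if values.get("name", "").lower() == "robots":
            self.directives += [d.strip().lower() for d in values.get("content", "").split(",")]


def noindex_will_work(html: str, crawl_allowed: bool) -> bool:
    """True only if the page carries noindex AND robots.txt lets the crawler fetch it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return crawl_allowed and "noindex" in finder.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
print(noindex_will_work(page, crawl_allowed=True))   # crawlable, so the tag can be read
print(noindex_will_work(page, crawl_allowed=False))  # disallowed, so the tag is never seen
```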
What is the best WordPress robots.txt setup for SEO?
Keep it minimal: allow admin-ajax.php, disallow /wp-admin/ and low-value URLs like internal search and checkout pages, never block CSS or JavaScript, and always include your sitemap line. This keeps important content crawlable while pointing search engines and AI crawlers straight to your sitemap.
