WordPress Robots.txt: Directives for Good Bots

Before we talk about the WordPress robots.txt file, let’s first understand what “robots” are.

Robots are search engine bots that move all around the internet. They look for new information on websites and pages. When they find a site, they crawl through it.

They check the content to see how good it is. After this, they index the content, which means it can show up in search results.

Here’s a visualization of the scenario:

[Visualization: search engine bots crawling, indexing, and ranking content]

But without proper guidance, even the most diligent bot can go astray and stumble upon pages you don’t want crawled.

That’s why you need a robots.txt file to direct bots to the appropriate pages of your WordPress site.

Let’s dive into the key directives to get you started.

What is a robots.txt file?

Robots.txt is a simple text file that lets you interact with search engine bots. It was first introduced by Martijn Koster in 1994 as the Robots Exclusion Protocol (REP) to guide legitimate search bots on which pages to crawl and which to avoid.

So, whenever bots crawl your site, they check the robots.txt file first. Then they explore the pages according to its directives.

This helps keep certain parts of your website away from crawlers and less prioritized by search engines.

Over time, major search engines like Google and Bing adopted this standard to respect site owners’ preferences and reduce server load.

Meanwhile, malicious bots, such as spammers and DDoS attackers, have never complied.

To deal with these bad bots, see our guide on preventing bot traffic on your website.

Why do you need a robots.txt file in WordPress?

Search engines allocate a crawl budget to each website. Once it’s used up, the rest of your website won’t be crawled until the next session.

This isn’t a huge concern for small WordPress sites. But for larger ones, you have to be cautious about this budget and use a robots.txt file to guide crawlers.

You can start this process with just a simple text file.


Locate the robots.txt file in WordPress

WordPress has made this even simpler. Every WordPress website comes with a pre-created, virtual robots.txt file.

Just look for the robots.txt file in your root directory, or append /robots.txt to your site’s URL to view it.


The file will contain basic WordPress directives. The default file typically looks like this:
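User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Depending on your WordPress version, a Sitemap line may appear here as well.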

Since this is only a starter file, you need to understand the robots.txt directives to write your own rules.

Directives of robots.txt file for WordPress

When site owners wish to guide web bots, they use different directives in their robots.txt files.

The bots then follow these directives and explore the pages accordingly.

1. User-agent: This section specifies which bots the following rules apply to.

2. Disallow: Indicates which pages or directories the web bots should avoid crawling.

3. Allow: Specifies which pages or directories they are allowed to crawl.

4. Sitemap: Points bots to your XML sitemap, helping them find all the pages they need to crawl.

5. Crawl-delay: Specifies the delay time (in seconds) between requests made by crawlers. However, it’s not a standardized directive, so many search engine bots (including Google) ignore it.

6. Comments: You can also add comments to your robots.txt with a hash (#) symbol for clarity.
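Here’s a sketch of how these directives fit together in a single robots.txt file (the bot name and sitemap URL are placeholders, not recommendations):

# Rules for all bots
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# Slow down one specific bot (Google ignores this directive)
User-agent: Bingbot
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml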

You can apply these directives according to your particular needs. Just be precise with the implementation. 

Accidentally blocking a helpful page or search bot could hurt your traffic and user experience.

Edit robots.txt file in WordPress

You can use either a plugin or the manual method to edit the robots.txt file in WordPress. Using a plugin is easier for beginners.

Use a plugin for editing robots.txt

Many WordPress plugins will allow you to edit the robots.txt file without touching the root directory.

WP Robots Txt is one of them. It’s a free plugin for editing robots.txt files.

You just need to follow this procedure.

  1. Install and activate the WP Robots Txt plugin.
  2. Then go to Settings > Reading in your dashboard.
  3. You will find the robots.txt content section with some pre-added directives.
  4. Add your User Agents and Directory Path and select either Allow or Disallow.
  5. Click Save Changes after you’re done.

It’s easy to use with minimal effort. But you can always choose the manual route if you’re not a fan of plugins and their potential drawbacks.

Edit robots.txt Manually

If you have access to your site’s root directory, you can edit the robots.txt file in the public_html folder directly.

Use a File Transfer Protocol (FTP) client or your host’s file manager to access and edit this file. If no physical robots.txt exists (WordPress serves a virtual one by default), create a new robots.txt file in public_html; the physical file takes precedence over the virtual one.

  1. Go to public_html and look for the robots.txt file.
  2. Right-click on the robots.txt file and select the Edit option.
  3. Add your desired rule to the file (see the example below) and click Save.
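For instance, a rule that keeps all bots out of a hypothetical /private-files/ directory would look like this:

# Block all bots from the private-files directory (example path)
User-agent: *
Disallow: /private-files/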

To check the result, append /robots.txt to your website’s main URL (e.g., https://example.com/robots.txt).

This way, you can control which bots crawl which pages and get your content indexed quickly in search engines.

But you’re not done yet! You can also apply robots exclusion rules more precisely through robots meta tags and X-Robots-Tag HTTP headers.

Robots meta tags for noindex

The robots meta tag is not part of the robots.txt file. Rather, it’s an HTML meta tag that instructs search bots not to index or follow certain pages.

So, it won’t work for non-HTML content like images, text files, or PDFs.

You have to place it inside the <head> section of an HTML page.

However, you might be wondering how to implement this meta code in a pre-coded template like WordPress. 

Let’s guide you to the methods!

1. Using an SEO plugin for meta robots tag

You’re probably familiar with SEO plugins such as Yoast SEO and All in One SEO in WordPress. 

These plugins allow you to use robots meta tags without touching the page code.

Here’s how you can do it with Yoast SEO:

  • Select the page you want to manage. Then scroll to the Yoast section and find the Advanced options.
  • There, you’ll find an option asking whether to allow search engines to show this content in search results.

Selecting the “No” option will automatically place a noindex meta tag in the selected page.

To confirm everything’s set up correctly, just check the robots meta tag in the page source.
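If it worked, the <head> of the page source should contain a tag similar to this (the exact attributes vary by plugin and version):

<meta name="robots" content="noindex" />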

It’s a straightforward way to take control of how search engines interact with your site.

But if you have enough coding knowledge, you can add the robots meta tag yourself for more advanced implementations.

2. Adding meta robots tag manually in your theme File

To add the robots meta tag manually, you can use the WordPress theme editor.

  • Go to Appearance > Theme File Editor in your WordPress dashboard.
  • Open the header.php file.
  • Inside the <head> section, add a meta tag like this:
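<!-- Ask search engines not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow" />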

This tells search engines not to index the page or follow its links.

Important Note: Adding this tag directly into the header will affect every page on your website. To avoid this, use conditional code to specify which pages should or shouldn’t be indexed.

For example, if you are blocking your “About Us” page, the condition would look something like this (a sketch assuming the page slug is about-us):
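<?php if ( is_page( 'about-us' ) ) : ?>
	<!-- noindex applies only when the About Us page is rendered -->
	<meta name="robots" content="noindex, nofollow" />
<?php endif; ?>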

With this code, the restriction applies only to the “About Us” page, while the other pages remain untouched.

But there’s another robots tag that works similarly and offers more options, as it also covers non-HTML files.

X-Robots-Tag HTTP header

You can be more technical and flexible with X-Robots-Tag. This tag allows you to control the indexation of different file types and pages at the server level. 

It’s a powerful way to apply the Robots Exclusion Protocol on your WordPress site, especially if you want to exclude media files or whole sections.

Follow the steps to set X-Robots-Tag headers in WordPress.

1. Block non-HTML files with X-Robots-Tag

The X-Robots-Tag is applied at the server level. So, on Apache servers, you need to edit the .htaccess file in your root directory.

  1. In your website’s file manager or via FTP, find and open the .htaccess file.
  2. Add code like the following to instruct search engines not to index certain files, like PDFs. This is a sketch for Apache, assuming the mod_headers module is enabled:
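<Files "guide.pdf">
  # "guide.pdf" is an example filename; adjust it to your own file
  Header set X-Robots-Tag "noindex, nofollow"
</Files>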

This will tell search engines not to index the PDF file named guide.pdf. You can use a similar process for PNG, JPG, and other file types.

2. Setting X-Robots-Tag for Specific Pages

To block specific pages (like a privacy policy page) from being indexed, you can add similar instructions. Here’s one way to do it on Apache 2.4 or later, using an <If> block (the /privacy-policy path is an example; use your own page URL):
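<If "%{REQUEST_URI} =~ m#^/privacy-policy/?$#">
  # Ask search engines not to index the privacy policy page
  Header set X-Robots-Tag "noindex, nofollow"
</If>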

This setup tells search engines not to index the /privacy-policy page while leaving other pages unaffected.

After saving the .htaccess file, test your site to make sure it’s working as expected. You can use Google’s URL Inspection Tool in the Search Console to confirm that your X-Robots-Tag headers are correctly applied.

Important Note: Be cautious when editing a .htaccess file. It’s a core configuration file for your site, so a single error can take down the entire website. Always make a backup of this file before editing, so you can restore it if needed.

Best robots.txt files used by popular websites

In case you’re searching for ideas on how to guide search engine bots more effectively, here’s a list of popular websites’ robots.txt files to give you an overview.

During our research, we discovered some interesting examples of how different websites use their robots.txt files.

1. Google 

Google’s robots.txt URL: https://www.google.com/robots.txt

2. Bing

Bing’s robots.txt URL: https://www.bing.com/robots.txt

3. YouTube

YouTube’s robots.txt URL: https://www.youtube.com/robots.txt

4. Twitter

Twitter’s robots.txt URL: https://twitter.com/robots.txt

5. Wikipedia

Wikipedia’s robots.txt URL: https://en.wikipedia.org/robots.txt

Wrapping up

Those are all the directives and techniques you need to implement a robots.txt file in WordPress.

Keep in mind that robots.txt is created just to guide search bots to the pages you want (or don’t want) to be crawled. 

However, it’s not guaranteed that every bot will follow every rule. It’s up to them (search bots) to choose which directives they might want to follow.

WordPress Robots.txt – FAQ

Here are the answers to some common questions about the robots.txt file in WordPress.

Where is the robots.txt file located?

Your robots.txt file is located in the root directory of your hosting server.

How do I disable the default WordPress robots.txt?

If you want to ensure that WordPress does not serve its virtual robots.txt file, you can add the following code to your theme’s functions.php file or a site-specific plugin:

add_filter( 'robots_txt', 'disable_default_robots_txt' );
function disable_default_robots_txt( $output ) {
    return ''; // Return an empty string to disable the default robots.txt
}

How many robots.txt files can a website have?

You can have only one robots.txt file on a website, accessible at yourwebsite.com/robots.txt.

How large can a robots.txt file be?

The maximum size for a robots.txt file is 500 KB. This limit is set by search engines like Google, and if a robots.txt file exceeds it, search engines may ignore the file entirely or only read the first 500 KB.

What happens if a website doesn’t have a robots.txt file?

If a website does not have a robots.txt file, search engines will assume that all areas of the site are open for crawling and indexing.
