What is a robots.txt file?

Search engines such as Google use the robots.txt file to tell their crawlers which pages on your site to skip when they come to do a site crawl. Before a crawler visits a website, it first looks for instructions in that site's robots.txt file. To keep search engines from crawling the specified pages, the webmaster lists the URLs they do not want Google or any other search engine to fetch. In short, whenever a bot crawls the web, it checks a website's robots.txt file first to figure out what it can and cannot investigate during the crawl.
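For illustration, a minimal robots.txt might look like the following, where /private/ is just a placeholder path:

User-agent: *
Disallow: /private/

The first line addresses every crawler, and the second tells them not to crawl anything under /private/.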

How to use robots.txt?

Robots.txt is helpful because it aids in optimizing the crawl budget: it ensures the spider visits only the most important parts of your site and does not waste time crawling irrelevant pages. On some websites it is better to keep certain content out of crawlers' reach, for example a payment details page. You can also reference your sitemap in this file to point search engines toward the pages you do want discovered. Robots.txt files limit crawler access to specific parts of your site, such as images and PDFs, and because each root domain has its own robots.txt file, you can shield entire sections of a website. Additionally, you can prevent your company's internal search result pages from being crawled and showing up in the SERPs.
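As a sketch, assuming internal search results live under /search/ and payment pages under /checkout/ (both hypothetical paths), a robots.txt along these lines would keep crawlers out of them:

User-agent: *
Disallow: /search/
Disallow: /checkout/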

How to find robots.txt?

The contents of robots.txt files are publicly available. If a site has a robots.txt file on its root domain, you can find it by adding /robots.txt to the end of the URL (for example, www.example.com/robots.txt). If nothing appears, the site does not have a robots.txt file yet. Creating one is easy; I will show you how in a few short paragraphs. Because the file is public, keep private information out of it. If you already have a robots.txt file, you can modify it in your hosting account's root directory through the file manager or over FTP. You can also reach robots.txt from your site's backend.

Where is robots.txt in WordPress?

You can access your robots.txt file through your CMS. In WordPress, depending on the plugin you use, you can reach it by logging into the wp-admin area and going to the plugin's configuration, general settings, or tools pages; there you will find either a robots.txt section you can edit directly or a file editor. For instance, if you're using the Yoast SEO plugin for WordPress, you can add a robots.txt file from the admin panel: enter the backend of your WordPress website, navigate to the SEO Tools section, and select File Editor. Set your user agents and rules in the same way described in the sections below. By default, WordPress keeps web crawlers out of the wp-admin and wp-includes folders, although humans and bots can still access the rest of the site. To apply the robots.txt file, save your modifications and click Save changes to robots.txt.
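As a rough illustration, a WordPress robots.txt often ends up looking something like this (exact defaults vary by WordPress version and plugin):

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Allow: /wp-admin/admin-ajax.php

The Allow line shows how a single file inside a disallowed folder can still be opened up to crawlers.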

How to create robots.txt?

You will need access to your domain's root directory. If you are unsure whether you have the proper access, ask your web hosting company for assistance. Then, make sure your robots.txt file is encoded in UTF-8. Google and other well-known search engines and crawlers may ignore characters outside the UTF-8 range, which can render your robots.txt rules useless. Setting the user-agent is the next step in creating a robots.txt file. The user-agent determines which web crawlers and search engines you are allowing or blocking, and it can take many values; the most common web crawlers include Googlebot, Bingbot, YandexBot, and DuckDuckBot. Within your robots.txt file, you can specify a user-agent in one of three ways: naming a single user-agent, naming several user-agents, or addressing all crawlers at once.
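As a sketch, the three approaches look like this; Googlebot and Bingbot are only example crawler names, and /example-directory/ is a placeholder:

# A single user-agent
User-agent: Googlebot
Disallow: /example-directory/

# Several user-agents sharing one group of rules
User-agent: Googlebot
User-agent: Bingbot
Disallow: /example-directory/

# All crawlers
User-agent: *
Disallow: /example-directory/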

What should be in robots.txt?

Setting rules for your file is the next step. Robots.txt files are read in groups. A group identifies the user agent and contains directives indicating which files or directories that user agent may or may not access. Three main directives are used in robots.txt. First, the "disallow" directive names a page or directory in your root domain that you do not want the given user agent to crawl. It begins with a forward slash (/) and ends with the full page address; if it points to a directory rather than a single page, the path should end with a forward slash. Each rule can contain any number of disallow entries. Second, the "allow" directive names a page or directory in your root domain that you do want the given user agent to crawl; you would use it, for instance, to override a disallow rule. It likewise begins with a forward slash (/) and ends with the full page address, with a trailing slash when it points to a directory. Each rule can use one or more allow entries. Third, the "sitemap" directive points to the website's sitemap. Its only requirement is a fully qualified URL, and you can include none or as many as necessary.

There is also a crawl-delay directive, which keeps search engine spider bots from overloading a server: administrators can set, in seconds, how long a bot should wait between requests. Search engines such as Bing and Yahoo recognize this directive, while Google does not; administrators can instead adjust how often Google crawls their website in Google Search Console.
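Putting these directives together, a hedged example, where every path and the domain are placeholders, might read:

User-agent: *
Disallow: /private/
Allow: /private/public-report.html
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml

Here crawlers that honor Crawl-delay would wait ten seconds between requests, and /private/public-report.html stays crawlable even though the rest of /private/ is blocked.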

How to add a sitemap to robots.txt?

An XML sitemap is an XML file listing all the pages on a website that you want robots to discover and access. All your blog articles should be accessible to search engines so that they appear in search results. You may not want crawlers reaching your tag pages, though, because they rarely make good landing pages and should not be featured in the search results. XML sitemaps may also include additional metadata with specific information about each URL. Like robots.txt, an XML sitemap is a must: not only should you make sure search engine bots can find all of your pages, but you should also help them understand the importance of your content.
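For context, a minimal sitemap entry looks roughly like this, with the URL and date as placeholders:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/blog/sample-post/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>

The optional lastmod element is an example of the additional metadata a sitemap can carry for each URL.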

Add your XML sitemap's path to your robots.txt file in three simple steps: locate your sitemap URL, locate your robots.txt file, and add the sitemap address to the robots.txt file. Access to your web server is all that's required. If you need help finding or modifying your website's robots.txt file, contact your web developer or hosting provider for assistance. You must include a Sitemap directive with the full URL in your robots.txt so your sitemap file can be found automatically. It is worth knowing that the sitemap directive can be inserted anywhere in the robots.txt file; because it is independent of the user-agent lines, its position does not matter. Visit your favorite website and append /robots.txt to the end of the domain to see what this looks like in practice.
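For example, assuming your sitemap lives at https://www.example.com/sitemap.xml, the directive can sit at the top of the file just as well as at the bottom:

Sitemap: https://www.example.com/sitemap.xml

User-agent: *
Disallow: /private/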

Where to put robots.txt?

A robots.txt file is not included by default because it is not required. If you decide to create one, upload the file to the root directory of your website. How you upload it depends on your site's file structure and your hosting environment. If you're having trouble uploading your robots.txt file, contact your hosting provider for assistance.
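To illustrate, crawlers only request the file from the root of the host, so a copy placed in a subdirectory is never read (example.com is a placeholder):

https://www.example.com/robots.txt        - found by crawlers
https://www.example.com/pages/robots.txt  - ignored by crawlers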

How to edit robots.txt?

The most convenient option is to use an SEO plugin to make changes to the robots.txt file; it lets you override the default robots.txt file and take control of your website. To change robots.txt, go to the CMS's Robots.txt Editor tab. Robots are prohibited from crawling your admin pages by default. Another recommendation is to keep crawlers away from your plugins and themes altogether: they don't contain any relevant content and aren't worth crawling. You can then use the rule builder to create rules. For example, you can add a rule that prevents any robot from accessing a temporary directory. To add a custom rule, put the user agent in the user agent section; you can also use the * sign to make your rule apply to all robots. Select Allow or Disallow to permit or block the user agent, then set the Directory Path the rule should apply to. You're now ready to click the button to save your changes.
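For instance, blocking every robot from a hypothetical /temp/ directory through the rule builder would produce a rule equivalent to:

User-agent: *
Disallow: /temp/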

Further rules can be added by adding a new directive and following the steps above. When you're finished, be sure to save your work; your new rules will appear in the preview section as soon as you save them.