What Is Robots.txt and How Do You Set It Up?

The robots.txt file is a text file that belongs to a website or blog, and its job is to instruct search engine bots on how to interact with that site. Webmasters commonly use robots.txt to tell bots which pages, directories, and URLs they may crawl and which parts are off-limits. It can even block all search engine bots from a site entirely. To make this easier to understand, please see the illustration image below.

The robots.txt file can be likened to a homeowner, and a search engine bot to a guest: the homeowner has the right to tell every guest which rooms they may enter and which rooms they may not. In essence, robots.txt is a plain-text file, created by webmasters in a simple directive syntax that search engine bots understand, specifying which page sections, directories, and URLs may be crawled and indexed by search engines.

Robots.txt Command List

The following are the basic directives most often found in a robots.txt file:

  • User-agent: *: the rules that follow apply to all bots, whether Googlebot, Googlebot-Mobile, Googlebot-Image, Bingbot, and so on.
  • User-agent: Googlebot-Mobile: the rules that follow apply to Googlebot-Mobile only.
  • “Disallow:”: specifies which parts of the site bots are not allowed to crawl.
  • “Allow: /”: allows bots to crawl all web pages except those listed under a Disallow rule.
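If you want to check how these directives are interpreted before deploying them, one option (my suggestion, not something this article requires) is Python's built-in urllib.robotparser module, which parses a rule set and answers "may this bot fetch this URL?":

```python
from urllib.robotparser import RobotFileParser

# A small rule set using the directives described above.
rules = [
    "User-agent: *",
    "Disallow: /private/",
    "Allow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# Any bot may crawl the homepage...
print(parser.can_fetch("Googlebot", "https://example.com/"))           # True
# ...but nothing under /private/ may be crawled.
print(parser.can_fetch("Googlebot", "https://example.com/private/x"))  # False
```

Note that rules are matched by URL prefix, and the first matching rule wins.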

If you are still confused, the following tutorials should make things easier to understand.

Tutorial 1: How to let all search engine bots crawl all web content without restriction

User-agent: *
Disallow:

(The empty Disallow: line means nothing is blocked.)

Tutorial 2: How to block all search engine bots from crawling all web content

User-agent: *
Disallow: /

Tutorial 3: How to block all bots from several directories

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /wp-admin/

Tutorial 4: How to block only one type of bot (for example, only the Yandex bot)

User-agent: YandexBot
Disallow: /
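The behavior of Tutorials 2 through 4 can be sanity-checked the same way. The sketch below uses Python's standard urllib.robotparser (again, just one convenient checker) against the rule files exactly as written above:

```python
from urllib.robotparser import RobotFileParser

def allowed(rules, agent, url):
    """Parse a robots.txt rule list and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch(agent, url)

# Tutorial 2: every bot is blocked from everything.
block_all = ["User-agent: *", "Disallow: /"]
print(allowed(block_all, "Googlebot", "https://example.com/page"))    # False

# Tutorial 3: only the listed directories are off-limits.
block_dirs = [
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /tmp/",
    "Disallow: /wp-admin/",
]
print(allowed(block_dirs, "Googlebot", "https://example.com/tmp/a"))  # False
print(allowed(block_dirs, "Googlebot", "https://example.com/blog/"))  # True

# Tutorial 4: only YandexBot is blocked; other bots are unaffected.
block_yandex = ["User-agent: YandexBot", "Disallow: /"]
print(allowed(block_yandex, "YandexBot", "https://example.com/"))     # False
print(allowed(block_yandex, "Googlebot", "https://example.com/"))     # True
```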

How to Set Robots.txt to Make It More SEO-Friendly

By default, the robots.txt settings on Blogger and WordPress allow all search engine robots to crawl every page, directory, and file on a website. Keep in mind that the more freedom we give search engine robots to crawl a site, the worse the impact can be on its positions in the SERPs. This is because not every page on a website or blog counts as a high-quality page in the eyes of search engines, and the more low-quality pages get indexed, the worse the site looks to them. Configuring robots.txt is therefore one part of on-page SEO optimization.

To make this easier to follow, I have divided this tutorial into two parts: setting robots.txt on self-hosted WordPress, and on Blogger.

How to Set Robots.txt on Self-Hosted WordPress

First, log in to cPanel -> File Manager -> public_html -> find the robots.txt file -> right-click -> Edit (UTF-8). If you don’t find a robots.txt file, create a new file in public_html and name it robots.txt.

After opening the robots.txt file, enter the rules below:

Sitemap: http://www.dosenhosting.com/sitemap.xml

User-agent: *
# disallow all files in these directories
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /cgi-bin/
Disallow: /wp-content/
Disallow: /archives/
Disallow: /*?*
Disallow: *?replytocom
Disallow: /author
Disallow: /comments/feed/
Disallow: */trackback/
Disallow: /wp-*

User-agent: Mediapartners-Google
Allow: /

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /

Don’t forget to replace www.dosenhosting.com in the Sitemap line with your own domain.
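As a quick sanity check of this configuration, the sketch below (a suggestion of mine, not part of the original setup) feeds the plain directory rules to Python's urllib.robotparser. Note that this parser does simple prefix matching and does not understand the * wildcard lines, so only the non-wildcard rules are exercised:

```python
from urllib.robotparser import RobotFileParser

# The plain (non-wildcard) rules from the WordPress configuration above.
# urllib.robotparser treats '*' in paths literally, so wildcard lines
# such as "Disallow: /*?*" are omitted from this check.
rules = [
    "User-agent: *",
    "Disallow: /wp-admin/",
    "Disallow: /wp-includes/",
    "Disallow: /cgi-bin/",
    "User-agent: Googlebot-Image",
    "Allow: /wp-content/uploads/",
]

parser = RobotFileParser()
parser.parse(rules)

# Generic bots are kept out of the core WordPress directories...
print(parser.can_fetch("Bingbot", "https://www.dosenhosting.com/wp-admin/"))  # False
# ...while Googlebot-Image falls under its own, more specific group
# and may fetch images from the uploads directory.
print(parser.can_fetch("Googlebot-Image",
                       "https://www.dosenhosting.com/wp-content/uploads/a.jpg"))  # True
```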

How to Set Robots.txt for Blogger

Log in to your Blogger account, then choose Settings -> Search preferences -> enable the custom robots.txt option, and enter the rules below:

User-agent: Mediapartners-Google
User-agent: Googlebot
Disallow: /search
Disallow: /?m=1
Disallow: /?m=0
Disallow: /*?m=1
Disallow: /*?m=0

User-agent: *
Disallow: /search

Sitemap: http://dosenhosting.blogspot.com/feeds/posts/default?orderby=UPDATED

Don’t forget to replace the sitemap URL with your own blogspot domain.
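This Blogger configuration can also be checked with Python's urllib.robotparser (my suggestion; any robots.txt checker will do). Blogger serves mobile duplicates of pages under the ?m=1 query parameter, which is what the Disallow rules target. As before, the parser matches paths literally, so only the plain-prefix rules are exercised here:

```python
from urllib.robotparser import RobotFileParser

# The Blogger rule set above, minus the literal-'*' wildcard lines,
# which urllib.robotparser does not interpret as wildcards.
rules = [
    "User-agent: Mediapartners-Google",
    "User-agent: Googlebot",
    "Disallow: /search",
    "Disallow: /?m=1",
    "Disallow: /?m=0",
    "User-agent: *",
    "Disallow: /search",
]

parser = RobotFileParser()
parser.parse(rules)

# Label/search result pages are blocked for Googlebot...
print(parser.can_fetch("Googlebot", "http://example.blogspot.com/search/label/SEO"))  # False
# ...and so is the mobile duplicate of the homepage.
print(parser.can_fetch("Googlebot", "http://example.blogspot.com/?m=1"))              # False
# Ordinary posts remain crawlable for every bot.
print(parser.can_fetch("Bingbot", "http://example.blogspot.com/2020/01/post.html"))   # True
```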

Keep in mind that there are many robots.txt recommendations for Blogger on the internet, but I personally prefer the configuration above because, in my opinion, it helps prevent duplicate content (such as the ?m=1 mobile versions of pages) from being indexed.

