How to Prevent Search Engines from Indexing Your Pages and Files
You can block search engines from indexing your web pages or files using the robots.txt file. This tutorial shows how.
Search engines discover web pages by following links on other pages, then crawl and index what they find. In general, it is a good thing for your website to be crawled and indexed by search engine spiders, because search engines are still one of the main traffic sources for most websites.
By default, search engine spiders will start crawling and indexing your website once it is published and linked to from other websites. In some cases, you may have web pages or files on your website that you prefer not to appear in search results. For example, maybe you want to share the posts on your site only with your personal friends and connections and not with the general public. This is where the robots.txt file helps.
The robots.txt file tells search engine spiders and other crawlers which parts of your website they are allowed to visit and crawl. It consists of a set of instructions, and well-behaved search engines follow them. With a robots.txt file, you can keep your whole website, or only certain web pages and files, from being crawled and indexed by search engines.
How to Create a robots.txt File
1. Open your text editor, e.g. Notepad, and create a new text file.
2. Put your preferred robots instructions into this file (see the examples below).
3. Save the file as robots.txt.
4. Upload the robots.txt file to the root directory of your website, so that it is reachable at a URL such as https://www.example.com/robots.txt.
To prevent all search engines from indexing your whole website, use the following code:
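A minimal sketch of such a file, assuming the standard wildcard user agent (`*`, meaning every robot) and a rule that covers the whole site from the root path down:

```
# Applies to every robot that honors robots.txt
User-agent: *
# Disallow everything under the site root
Disallow: /
```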
To prevent a specific robot/crawler from accessing your website:
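A minimal sketch of this rule, where RobotName is only a placeholder and not the name of a real crawler:

```
# RobotName is a placeholder for the crawler's user agent
User-agent: RobotName
# Block this robot from the whole site
Disallow: /
```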
Replace RobotName with the user agent of the robot you want to block.
To prevent all search engines from accessing certain directories on your website:
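As an illustration, the sketch below blocks two directories. The directory names `/private/` and `/tmp/` are hypothetical; substitute the paths that exist on your own site.

```
# Applies to every robot that honors robots.txt
User-agent: *
# Hypothetical directory names, one Disallow line per directory
Disallow: /private/
Disallow: /tmp/
```

The trailing slash limits each rule to the directory and everything inside it.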
To prevent robots from crawling a certain page or file on your website:
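A sketch under the assumption that the site has a page called `/private-page.html` and a file at `/files/report.pdf`; both names are hypothetical and only serve as examples.

```
# Applies to every robot that honors robots.txt
User-agent: *
# Hypothetical page and file paths, written relative to the site root
Disallow: /private-page.html
Disallow: /files/report.pdf
```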
The robots.txt file is publicly accessible. Anyone who enters the right URL can read its contents, including the paths you list in it. Therefore, you shouldn't use robots.txt to hide or store sensitive information. If you are going to upload private files or publish private pages that you don't want to share publicly, store them inside a password-protected area of your website.
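Please keep in mind that some web robots, such as scrapers and malware bots, ignore robots.txt instructions. Also note that a disallowed URL can still show up in search results, without a description, if other websites link to it, because robots.txt restricts crawling and does not guarantee removal from the index. Finally, if you don't have a properly configured robots.txt file, every publicly reachable part of your website can be crawled and indexed by search engines.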
You can learn more about the robots.txt file at robotstxt.org. You may also want to check Google's own documentation on robots.txt files.