What is robots.txt?
Robots.txt is a text file in the root of your web server that guides Google and other search engines in crawling your website. The file specifies which parts of the website Google may or may not crawl by allowing (Allow) or blocking (Disallow) access to specific files and folders.
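As a minimal sketch of the syntax (the domain and paths here are placeholders, not recommendations for your site):

```
# Rules for all crawlers, including Googlebot
User-agent: *
# Block everything under /private/
Disallow: /private/
# ...but allow this one file inside it
Allow: /private/annual-report.pdf

# Point crawlers to the sitemap
Sitemap: https://www.example.com/sitemap.xml
```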
Your website should have a robots.txt file, as there will almost always be content that you don't want Google to burden your web server by crawling. For example (see the sketch after this list):
- internal search pages
- administrator pages that should not be publicly accessible
- sections under development
- pages with sensitive information
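A hedged sketch of what such rules might look like, assuming the internal search lives under /search/, the admin area under /admin/, and a development section under /beta/ (replace these with your site's actual paths):

```
User-agent: *
# Internal search result pages
Disallow: /search/
# Administrator pages
Disallow: /admin/
# Section still under development
Disallow: /beta/
```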
Don't block Google from the JavaScript, CSS, and image files used on your website, as Google needs these files to render the page the way a visitor sees it. Also, don't block Google from the content you want Google to index.
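A typical mistake is a broad rule that also catches rendering resources. One way to repair it, assuming the files live under a hypothetical /assets/ folder, is to re-allow them explicitly:

```
User-agent: *
# Too broad: this also blocks CSS, JavaScript, and images
Disallow: /assets/
# Re-allow the resources Google needs to render the page
Allow: /assets/*.css
Allow: /assets/*.js
Allow: /assets/*.png
```

Google applies the most specific matching rule, so the longer Allow rules win over the broader Disallow.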
Please note that a block in robots.txt does not guarantee that a page will stay out of search results. Google may still index the page if it finds it by other means, for example via a link from another website. To keep a page out of the index, add a noindex tag to the page instead; note that Google must be able to crawl the page to see the tag, so the page must not also be blocked in robots.txt. Read more on support.google.com.
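The tag belongs in the page's <head>; a minimal example:

```html
<!-- Ask search engines not to index this page -->
<meta name="robots" content="noindex">
```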