For some reason, Google is indexing the robots.txt file for some of our sites and showing it in search results. See screenshots below.
Our robots.txt file is not linked from anywhere on the site and contains just the following:
This only happens for some sites. Why is this happening and how do we stop it?
Screenshot 1: Google Search console…
Why is Google indexing our robots.txt file and showing it in search results?
In my AdSense account, under "Revenue optimization", I have crawl errors. When I click "Fix crawl errors", I see this:
Blocked Urls Error
http://www.rechargeoverload.in/search/label Robot Denied
Robots.txt is blocking my labels
I want to understand how the robots.txt file can be used by an attacker. I know it can reveal a list of paths and directories. Is that all, or can an attacker find more information?
I have a website (built with MediaWiki 1.33.0 CMS) which contains a robots.txt file.
In that file there is one line containing the literal domain of that site:
I usually prefer to replace literal domain references with a variable value call (VVC) that is resolved at execution time, in a way that depends on the specific case, to the domain itself.
An example of a VVC would be a Bash variable substitution.
Many CMSs have a global directives file which usually contains the base address of the website:
In MediaWiki 1.33.0 this file is LocalSettings.php, which contains the base address on line 32:
$wgServer = "https://example.com";
How could I call this value with a variable value call in robots.txt?
This will help me avoid confusion and malfunction if the domain of the website is changed; I wouldn’t have to change the value manually there as well.
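As far as I know, robots.txt is plain static text with no variable substitution of its own, so one possible workaround is to generate the file from a template whenever the configuration changes. Below is a minimal Python sketch of that idea, not a MediaWiki feature. The names read_wg_server, render_robots and ROBOTS_TEMPLATE are hypothetical, and the template body (including the Sitemap line) is only an assumed example, since the actual line from the robots.txt in question is not shown.

```python
import re
from string import Template

# Matches a line such as:  $wgServer = "https://example.com";
WG_SERVER_RE = re.compile(
    r"^\s*\$wgServer\s*=\s*[\"']([^\"']+)[\"']\s*;", re.MULTILINE
)

# Assumed template; replace the body with the real robots.txt content.
ROBOTS_TEMPLATE = Template(
    "User-agent: *\n"
    "Disallow:\n"
    "Sitemap: ${server}/sitemap-index.xml\n"
)


def read_wg_server(local_settings_text: str) -> str:
    """Return the value assigned to $wgServer in LocalSettings.php text."""
    match = WG_SERVER_RE.search(local_settings_text)
    if match is None:
        raise ValueError("no $wgServer assignment found")
    return match.group(1)


def render_robots(local_settings_text: str) -> str:
    """Fill the template with the base address taken from the settings text."""
    server = read_wg_server(local_settings_text).rstrip("/")
    return ROBOTS_TEMPLATE.substitute(server=server)


if __name__ == "__main__":
    settings = '<?php\n$wgServer = "https://example.com";\n'
    print(render_robots(settings), end="")
```

In practice one would read LocalSettings.php from disk, write the result to robots.txt in the web root, and rerun the script after a domain change.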
One can order robots.txt this way:
User-agent: DESIRED_INPUT
Sitemap: https://example.com/sitemap-index.xml
Disallow: /

Or this way:
User-agent: DESIRED_INPUT
Disallow: /
Sitemap: https://example.com/sitemap-index.xml
I assume both are okay, because virtually all crawlers probably compile the file in the correct order.
Is it a best practice to put Sitemap: to prevent an extremely unlikely bug of a crawler’s bad compilation of crawling before ignoring
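One way to sanity-check the two orderings above is to feed both to a parser and compare the results. The sketch below uses Python's standard urllib.robotparser, which is only one parser and not necessarily how any particular search engine behaves. ExampleBot is a hypothetical stand-in for DESIRED_INPUT, and summarize is a helper name made up for this example.

```python
from urllib.robotparser import RobotFileParser

SITEMAP_FIRST = """\
User-agent: ExampleBot
Sitemap: https://example.com/sitemap-index.xml
Disallow: /
"""

SITEMAP_LAST = """\
User-agent: ExampleBot
Disallow: /
Sitemap: https://example.com/sitemap-index.xml
"""


def summarize(robots_txt: str, agent: str = "ExampleBot"):
    """Return (may the agent fetch a page?, sitemaps found) for the given text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(agent, "https://example.com/any/page")
    return allowed, parser.site_maps()  # site_maps() needs Python 3.8+


if __name__ == "__main__":
    print(summarize(SITEMAP_FIRST))
    print(summarize(SITEMAP_LAST))
```

With this parser, both orderings yield the same Disallow rule and the same sitemap list, because it collects Sitemap lines independently of the user-agent groups.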
Google Search Console and Mobile-Friendly Test both give me the following two warnings for my WordPress based website:
- Content wider than screen
- Clickable elements too close together
The screenshot that these sites provide of my website completely looks broken as if no CSS was applied.
My case was different. Below is what my robots.txt file looks like, and I still get the same warning messages nonetheless. I am an SEO Framework user, so I created my own static version of the robots.txt.
User-agent: *
Allow: /
Sitemap: https://*****
Along with the two warnings, I also almost always get “Page Loading Issue” in the test results. Could this be a server speed issue? I am located in Japan at the moment, and my website is targeted mainly at Japanese users, but I am using a SiteGround server and not a Japanese server. I am well aware that this slows my website down in general, but does it also affect the results of the above-mentioned Google tests?
I am managing an eCommerce brand that has thousands of products. Some of these products have multiple SKUs (colour variants). These SKUs use a URL query string parameter to differentiate between the colour variants. Since they are the same product and vary only by colour, they are all canonicalised to the non-colour version for SEO.
Example setup of products:
Hugo Boss T-Shirt product page (/product/hugo-boss-red-t-shirt) with the below…
How to Use Robots.txt to block GoogleBot, but not AdsBot
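A minimal sketch of the kind of rule set the title above describes, assuming the two crawlers are addressed by the user-agent tokens Googlebot and AdsBot-Google. The check uses Python's standard urllib.robotparser, so it shows how that parser reads the rules; real crawlers may apply their own matching logic. The helper name allowed_agents and the example.com URL are made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# One group per crawler: block Googlebot everywhere, allow AdsBot-Google.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: AdsBot-Google
Allow: /
"""


def allowed_agents(robots_txt: str, agents, url: str) -> dict:
    """Map each user-agent token to whether this parser lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    result = allowed_agents(
        ROBOTS_TXT,
        ["Googlebot", "AdsBot-Google"],
        "https://example.com/some/page",
    )
    for agent, allowed in result.items():
        print(agent, "allowed" if allowed else "blocked")
```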
Here's my robots.txt file. Is it at least benign, and not harming my site?
Those disallows are MEANT to keep search engines from indexing those subdirectories of my domain, so they don't index the spurious side stuff that I'm just storing in them.
Is it correct syntax?
If I would like to check whether a website has robots.txt, how would I check it?
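By convention the file lives at /robots.txt on the site's root, so the simplest check is to request that URL and look at the HTTP status. Here is a rough Python sketch of that check; the function names robots_url, fetch_status and has_robots_txt are hypothetical, and treating only status 200 as "exists" is a simplifying assumption (some sites return a 200 error page instead of a 404).

```python
from urllib.error import HTTPError, URLError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen


def robots_url(site: str) -> str:
    """Build the robots.txt URL for a site given as a domain or any page URL."""
    parts = urlsplit(site if "://" in site else "https://" + site)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_status(url: str, timeout: float = 10.0):
    """Return the HTTP status code for url, or None if the host is unreachable."""
    request = Request(url, headers={"User-Agent": "robots-check-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # e.g. 404 when there is no robots.txt
    except URLError:
        return None


def has_robots_txt(site: str, fetch=fetch_status) -> bool:
    """True if the site's /robots.txt answers with HTTP 200."""
    return fetch(robots_url(site)) == 200


if __name__ == "__main__":
    print(robots_url("example.com/some/page"))
    print(has_robots_txt("example.com"))  # performs a real network request
```

The fetch parameter exists only so the check can be tried without network access, for example has_robots_txt("example.com", fetch=lambda url: 404).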
I've noticed that a lot of sites I've recently evaluated do not have their XML and HTML sitemaps referenced in the robots.txt file.
Including sitemaps in the robots.txt is known to be good for SEO and helps GoogleBot with crawlability.
How important is it to you that your robots.txt file includes sitemap references?