My client has to inform its customers about some new regulations, and Googlebot should NOT crawl that information. It is not possible to place it on a separate page and disallow Google from crawling that page. So the idea is to place a button/link on the page that will AJAX-load the corresponding information only when the user clicks it. My assumption is that Google is unable to click the link and therefore cannot crawl that specific AJAX content.
Am I right? And if so, is there any official documentation that proves my point?
My access log is full of requests from Googlebot for non-existent PDFs relating to ‘viagra’, ‘cialis’ and similar drugs (user-agent: Googlebot/2.1 (+http://www.google.com/bot.html), IP range: 66.249.64.*).
Problem: We have a site hosted in Germany under a “.de” domain. Recently, we put Cloudflare in front of it as a CDN. Shortly afterwards, our Googlebot response times increased from 60-100ms to 400-500ms. We noticed that the Googlebot requests originate from the US. However, as we have learned, this is the usual case.
There are solutions to overcome this, e.g. caching. However, our goal is to understand the underlying change that led to the increase. We have reviewed a lot of our config over the last few days and still haven’t understood the problem. Cloudflare uses anycast IPs, so a website cannot necessarily be geolocated.
Our current hypothesis: The Googlebot takes into account that a site is hosted on a different continent and subtracts a certain amount of response time.
Question aim: Get hints on what to look for to understand the underlying cause.
This would all be well and good, if not for the fact that it has never done so before, this URI never existed (I have owned the domain for 10+ years), and it looks suspiciously like a casual scan for possible security issues.
89.76.249.66.in-addr.arpa domain name pointer crawl-66-249-76-89.googlebot.com. is also indeed a Googlebot address.
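A PTR record on its own can be set by whoever controls the reverse zone, so the usual check is a reverse lookup followed by a forward lookup that must return the original IP. The following is only a minimal sketch of that check: the function names `ptr_name` and `is_verified_googlebot` and the injectable `reverse_lookup`/`forward_lookup` parameters are hypothetical helpers, and the accepted hostname suffixes are an assumption that should be compared against Google's current crawler documentation.

```python
import ipaddress
import socket

# Assumption: hostnames of genuine Googlebot crawlers end in one of these suffixes.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def ptr_name(ip: str) -> str:
    """Return the in-addr.arpa (or ip6.arpa) name that a PTR query for `ip` uses."""
    return ipaddress.ip_address(ip).reverse_pointer


def is_verified_googlebot(ip: str, reverse_lookup=None, forward_lookup=None) -> bool:
    """Reverse-resolve `ip`, check the hostname suffix, then forward-confirm it.

    The two lookup callables are injectable so the logic can be exercised
    without network access; by default the system resolver is used.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
        if not host.endswith(GOOGLE_CRAWLER_SUFFIXES):
            return False
        # The forward lookup must map the hostname back to the same IP.
        return ip in forward_lookup(host)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return False
```

Calling `is_verified_googlebot("66.249.76.89")` with the default resolvers performs real DNS queries, so the result depends on the network the script runs in.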
I get that I should redirect (301) from the old URL to the new one, so that when Google re-crawls, it will see the change. But what should be in my sitemap? The old URL, the new one, or both?
I tend to think that it would be best to keep only the new URL in my sitemap. But what if Google crawls the new URL before it sees the redirect from the old one? Wouldn’t the new URL start off as a new page (from the perspective of Google’s index) with zero ranking points? How does Googlebot handle that? What is the recommended practice?
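To make the "only the new URL" option concrete, here is a minimal sketch that builds a sitemap from a list of known URLs and a map of 301 redirects, listing each redirect target in place of its old URL. The function name `build_sitemap`, the `redirects` dictionary and the example.com URLs are hypothetical; the sketch only follows a single redirect hop and does not settle whether this is the recommended practice.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, redirects):
    """Return sitemap XML listing each URL once, with old URLs replaced by
    their 301 target.

    urls      -- iterable of absolute URLs currently known for the site
    redirects -- dict mapping old URL -> new URL (single hop only)
    """
    final_urls = []
    for url in urls:
        target = redirects.get(url, url)
        if target not in final_urls:  # avoid listing the same target twice
            final_urls.append(target)

    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in final_urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap(
        ["https://example.com/old-page", "https://example.com/about"],
        {"https://example.com/old-page": "https://example.com/new-page"},
    ))
```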
I have a site that usually creates a few thousand pages a day, which don’t change after they have been created. Recently my dedicated server crashed because Googlebot was crawling the site too often. According to Search Console, on many days Googlebot crawls the site tens of thousands of times, which indicates that it keeps re-crawling pages it has already crawled. I am aware that I can limit the Googlebot crawl rate, but is it possible to force Googlebot to crawl a page ONCE and ONCE only?
I am managing an eCommerce brand that has thousands of products. Some of these products have multiple SKUs (colour variants). These SKUs use a URL query string parameter to differentiate between the colour variants. Since they are the same product and vary only by colour, they are all canonicalised to the non-colour version for SEO.
Example setup of products:
Hugo Boss T-Shirt product page (/product/hugo-boss-red-t-shirt) with the below…
How to Use Robots.txt to block GoogleBot, but not AdsBot
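One way to express this is a robots.txt with a disallow group for Googlebot and a separate, explicit group for AdsBot-Google. The sketch below embeds such a file and checks it locally with Python's standard `urllib.robotparser`. The helper name `allowed` and the example.com URL are hypothetical, and Python's matching rules are not guaranteed to be identical to Google's, so treat this as a rough sanity check rather than proof of how the crawlers will behave.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: block Googlebot everywhere, allow AdsBot-Google everywhere.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: AdsBot-Google
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "AdsBot-Google"):
        print(agent, allowed(agent, "https://example.com/some-page"))
```

Naming AdsBot-Google in its own group keeps the intent explicit instead of relying on how each crawler treats a `User-agent: *` group.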