Increase of Googlebot response time after CDN implementation

Problem: We have a site hosted in Germany on a “.de” domain. We recently put Cloudflare in front of it as a CDN. Shortly afterwards, our Googlebot response times increased from 60-100ms to 400-500ms. We noticed that the Googlebot requests originate from the US. However, as we have since learned, this is the usual case.

There are solutions to work around this, e.g. caching. However, our goal is to understand the underlying change that led to the increase. We have reviewed much of our configuration over the last few days and have not yet found the cause. Cloudflare uses anycast IPs, so a website behind it cannot necessarily be geolocated.

Our current hypothesis: Googlebot takes into account that a site is hosted on a different continent and subtracts a certain amount from the response time.

Aim of the question: to get hints on what to look for in order to understand the underlying cause.

Why does Googlebot attempt to crawl /admin/install.php?

On one site I own, I recently started seeing Googlebot checking for non-existent URIs:

66.249.76.89 - - [23/Feb/2020:10:18:48 +0100] "GET /robots.txt HTTP/1.1" 404 118 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
66.249.76.87 - - [23/Feb/2020:10:18:49 +0100] "GET /admin/install.php HTTP/1.1" 404 181 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"

This would all be well and good, if not for the fact that it has never done so before, this URI has never existed (I have owned the domain for 10+ years), and it looks suspiciously like a casual scan for possible security issues.

The reverse DNS lookup (89.76.249.66.in-addr.arpa domain name pointer crawl-66-249-76-89.googlebot.com.) confirms that this is indeed a Googlebot address.
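A reverse lookup alone can be spoofed by whoever controls the PTR record, so the check is usually completed by resolving the returned hostname forward again and comparing it with the original IP. Below is a minimal Python sketch of that forward-confirmed reverse DNS check. The function name is_verified_googlebot, the injectable lookup parameters, and the list of accepted suffixes are assumptions made for illustration, and the default lookups only handle IPv4.

```python
import socket

# Assumed list of hostname suffixes treated as Google-owned for this sketch.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for an IP that claims to be Googlebot.

    The two lookups are injectable (hypothetical parameters) so the logic can
    be exercised without network access.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])

    # Step 1: PTR lookup, e.g. crawl-66-249-76-89.googlebot.com.
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False

    # Step 2: the hostname must sit under one of the accepted domains.
    if not host.endswith(GOOGLE_SUFFIXES):
        return False

    # Step 3: the hostname must resolve back to the same IP.
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling is_verified_googlebot("66.249.76.89") with the default lookups performs real DNS queries. Note that a positive result only says the request came from Google's crawler infrastructure; it does not explain why that URI was requested.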

Can anyone shed more light on this?

Changed URL for a page that was indexed by Googlebot. Will redirect 301 from the old URL to the new one. But what to do with my Sitemap?

I’m planning to change the URL of one of my site’s pages.

Example:

From: https://www.example.com/old-post-slug

To: https://www.example.com/new-post-slug

The fact is that Google has already indexed the old url: https://www.example.com/old-post-slug

And from this documentation, we see that to avoid losing page ranking we should respond with a 301 (Moved Permanently) from the old URL pointing to the new URL.

https://support.google.com/webmasters/answer/6033049?hl=en

QUESTION

I get that I should redirect (301) from the old URL to the new one. So when Google re-crawls, it will see that change. But what should be on my Sitemap? The old URL or the new one? Or both?

I tend to think that it would be best to keep only the new URL in my Sitemap. But what if Google crawls the new URL before it sees the redirect from the old one? Wouldn’t the new page URL start off as a new page (from the perspective of Google’s index) with zero ranking points? How does Googlebot handle that? What is the recommended practice?
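To make the option I am leaning towards concrete, here is a minimal Python sketch that rewrites a Sitemap so that every redirected URL is replaced by its 301 target and listed only once. It illustrates the "new URL only" option and is not a statement of what Google recommends. The function name apply_redirects_to_sitemap and the redirects mapping are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def apply_redirects_to_sitemap(sitemap_xml, redirects):
    """Return sitemap XML in which each redirected <loc> points at its target.

    `redirects` is a hypothetical dict mapping old URLs to their 301 targets.
    If the target is already listed, the duplicate <url> entry is dropped.
    """
    ET.register_namespace("", SITEMAP_NS)  # keep the default namespace on output
    root = ET.fromstring(sitemap_xml)
    seen = set()
    for url in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None or loc.text is None:
            continue
        current = loc.text.strip()
        target = redirects.get(current, current)
        if target in seen:
            root.remove(url)  # new URL already present, drop the duplicate
            continue
        seen.add(target)
        loc.text = target
    return ET.tostring(root, encoding="unicode")
```

With redirects = {"https://www.example.com/old-post-slug": "https://www.example.com/new-post-slug"}, the resulting Sitemap contains the new URL exactly once and no longer lists the old one.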

Stop googlebot crawling URL more than once?

I have a site that usually creates a few thousand pages a day, which don’t change after they have been created. Recently my dedicated server crashed because Googlebot was crawling the site too often. According to the Search Console, on many days Googlebot crawls the site tens of thousands of times, which suggests it keeps crawling pages it has already crawled. I am aware I can limit the Googlebot crawl rate, but is it possible to force Googlebot to crawl a page ONCE and ONCE only?
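For context, one thing I could do on my side is answer repeat requests for unchanged pages with 304 Not Modified. That would not stop the repeat crawls, only make each one cheaper for the server. Below is a minimal Python sketch of that decision, assuming the page's creation time is stored as a timezone-aware UTC datetime. The function name conditional_response is hypothetical and not tied to any particular web framework.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since=None):
    """Return (status, headers) for a page that last changed at `last_modified`.

    `last_modified` is assumed to be a timezone-aware UTC datetime;
    `if_modified_since` is the raw request header value, if any.
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall back to a full response
        # Only compare when the parsed date carries a timezone.
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return 304, headers  # unchanged: no body needs to be sent

    return 200, headers
```

The first request for a page gets a 200 with a Last-Modified header. A later request that sends that value back in If-Modified-Since gets a 304 and the page body can be skipped.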

How to Use Robots.txt to block GoogleBot, but not AdsBot

Hi,

I am managing an eCommerce brand that has thousands of products. Some of these products have multiple SKUs (variants in terms of colour). These SKUs use a URL query string parameter to differentiate between the colour variants. Since they are the same product and vary only by colour, they are all canonicalised to the non-colour version for SEO.

Example setup of products:

Hugo Boss T-Shirt product page (/product/hugo-boss-red-t-shirt) with the below…
