Yandex not crawling compressed sitemap index

I have submitted a gzip-compressed sitemap index file (one that links to other sitemaps containing the actual URLs search engines are instructed to crawl).
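
To make the structure concrete, the index looks roughly like this, with hypothetical file names standing in for my real ones (each child sitemap is a .xml.gz file, and the index itself is also served gzip-compressed):

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Sketch of a sitemap index; the <loc> values are placeholders. -->
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>https://example.com/sitemaps/urls-001.xml.gz</loc>
        <lastmod>2020-01-15</lastmod>
      </sitemap>
      <sitemap>
        <loc>https://example.com/sitemaps/urls-002.xml.gz</loc>
        <lastmod>2020-01-15</lastmod>
      </sitemap>
      <!-- ...more <sitemap> entries, 202 links in total... -->
    </sitemapindex>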

The Yandex sitemap validation tool tells me it is valid, with 202 links and no errors.

However, in Yandex Webmaster it shows up with a small grey icon in the status column; when clicked, it says ‘Not indexed’.

Yandex is not indexing the URLs provided in the file, which are all new, even though it states it has consulted the sitemap.

Any ideas what may be wrong?

Google Search Console cannot read my XML: Sitemap appears to be an HTML page

I’m working on a web application written with Angular (v8) and deployed on Apache 2, which uses a proxy to forward requests (frontend, API, back office).

My problem is that I’m trying to submit the sitemap ({website}/sitemap.xml) to Google, but Google Search Console keeps saying it’s not valid: Google can read the URL, but it appears to be an HTML page.

I tried to validate the XML on several websites and didn’t find any errors.

I mentioned Apache because maybe, when Google tries to fetch the URL, Apache serves another page before the XML is reached, but I cannot prove that. I have tried in many ways, and the first thing I see when opening the URL is the sitemap and nothing else.

In my angular.json I added the file to the assets as follows:

"assets": ["src/favicon.ico", "src/assets", "src/sitemap.xml"],

What can it be?

Thank you

Sitemap: Should I dynamically update sitemap for dynamic content or create a page containing all the dynamic links

Say I have the following route: http://<my-domain>/{category}/subjects/{id}/

The parts in braces are dynamic. I’m struggling with which of the following approaches is better, or whether there is a better way to let Google crawl all these dynamic links:

Approach 1: manually do the job by adding or removing records in the sitemap and updating <lastmod>.

Approach 2: create a page that includes all those links and reference that page in sitemap.xml

The page in the second approach could be a plain HTML file generated by the server app, or a simple Web Forms (.aspx) page that renders those links dynamically without having to create an HTML file.
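
To make the two options concrete: with Approach 1 the sitemap itself carries one entry per dynamic record, roughly like this (domain and IDs are placeholders), while with Approach 2 the sitemap would only list the single page that contains all the links.

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- Approach 1: one <url> per {category}/subjects/{id} record,
           added or removed as records change. -->
      <url>
        <loc>http://example.com/history/subjects/101/</loc>
        <lastmod>2020-03-01</lastmod>
      </url>
      <url>
        <loc>http://example.com/science/subjects/102/</loc>
        <lastmod>2020-03-05</lastmod>
      </url>
    </urlset>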

Sitemap for one website in multiple domains

I have 2 domains (A and B). The first one (A) is the website.

In the sitemap.xml on domain “A”, I reference a sub-sitemap hosted on domain “B” that contains “A” URLs.

I have followed the documentation (https://support.google.com/webmasters/answer/75712?hl=en), so both domains are verified in Google Search Console, but Google does not index the sub-sitemap (the one on domain “B” with “A” URLs). The other sub-sitemaps on domain “A” are OK.
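
To illustrate the setup (hostnames here are placeholders for my real domains), the index on domain “A” looks roughly like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Sitemap index served from domain A (placeholder hostnames). -->
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- Sub-sitemap hosted on domain A: indexed without problems. -->
      <sitemap>
        <loc>https://www.a-domain.example/sitemap-pages.xml</loc>
      </sitemap>
      <!-- Sub-sitemap hosted on domain B, containing domain A URLs: not indexed. -->
      <sitemap>
        <loc>https://www.b-domain.example/sitemap-products.xml</loc>
      </sitemap>
    </sitemapindex>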

FYI: the sitemap is valid, because if I submit it manually in Search Console (on the “B” domain property), the URLs show up in Google search results. And for multiple reasons, I can’t submit it manually every time.

Do you have an idea? Thanks.

Changed URL for a page that was indexed by Googlebot. Will redirect 301 from the old URL to the new one. But what to do with my Sitemap?

I’m planning to change the URL of one of my site’s pages.

Example:

From: https://www.example.com/old-post-slug

To: https://www.example.com/new-post-slug

The fact is that Google has already indexed the old URL: https://www.example.com/old-post-slug

And from these docs, we see that to avoid losing page ranking we should respond with a 301 (Moved Permanently) from the old URL, pointing to the new URL:

https://support.google.com/webmasters/answer/6033049?hl=en
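
For context, the redirect itself would be something along these lines (a sketch assuming an Apache setup, using the example slugs above):

    # Sketch: permanent (301) redirect from the old slug to the new URL.
    Redirect 301 /old-post-slug https://www.example.com/new-post-slug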

QUESTION

I get that I should redirect (301) from the old URL to the new one. So when Google re-crawls, it will see that change. But what should be on my Sitemap? The old URL or the new one? Or both?

I tend to think it would be best to keep only the new URL in my sitemap. But what if Google crawls the new URL before it sees the redirect from the old one? Wouldn’t the new URL start off as a new page (from Google’s index perspective) with zero ranking points? How does Googlebot handle that? What is the recommended practice?

Google Search console cannot fetch my sitemap. How to force Google to index a site?

I am working on a project that just rebranded.

Using Google Search Console I am getting some weird errors. Despite my sitemap_index.xml and robot.txt working properly, Google cannot seem to fetch the sitemap for some reason.

my sitemap is here: https://example.com/sitemap_index.xml

and my robot.txt: https://example.com/robot.txt
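
For reference, this is the kind of content I’d expect that file to have, with the sitemap declared; a generic sketch, not necessarily my exact file:

    # Generic robots.txt sketch: allow crawling and declare the sitemap.
    User-agent: *
    Disallow:

    Sitemap: https://example.com/sitemap_index.xml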

When I try to get my site indexed on Google, I get errors in Search Console (screenshots not reproduced here).

If I click on Open sitemaps it opens just fine.

This is what Google says in URL inspection (screenshot not reproduced here).

I tried reindexing multiple times but nothing changed.

The site has been live for over a month now and is still not indexed, despite having backlinks pointing to it from LinkedIn and elsewhere.

Where could this be coming from? I asked Google support with no luck, and asked my DNS provider to double-check everything, but it seems fine. I’ve also hired a DevOps engineer to check my server configuration, but apparently everything is fine.

Add URL to sitemap to be available in the future

I automatically generate sitemap.xml when content is published on my website; however, some of the content will only be published starting from a specific date and time.

Is there any tag I can add to the sitemap, or any other way to cater for this, so that the Google/Bing/… bots would know to only index the content once its publish date has passed?
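
For reference, the per-URL tags I know of from the sitemaps.org protocol are <loc>, <lastmod>, <changefreq> and <priority>; a typical entry (placeholder URL and values) looks like this:

    <url>
      <loc>https://example.com/upcoming-article</loc>
      <lastmod>2020-06-01</lastmod>
      <changefreq>weekly</changefreq>
      <priority>0.5</priority>
    </url>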

I know I could use a task scheduler to update the sitemap file when the content’s publish date is reached, but I was trying to avoid that solution.

Custom Post Type – Category Rewrite – Remove Rewrite from Sitemap

I’ve got a “case_studies” post type with a “case_studies_categories” taxonomy, and a rewrite rule to include the category in the URL.

Everything works, but for some reason the rewrite URL shows up in the sitemap (as the first URL), for example:

/case-studies/%case_studies_categories%/ 

and the rest is fine:

/case-studies/%case_studies_categories%/
/case-studies/category-name/post-name/
/case-studies/category-name/post-name/
/case-studies/category-name/post-name/

How do I remove it (/case-studies/%case_studies_categories%/) from the sitemap?

    add_action( 'init', 'case_studies_init' );

    function case_studies_init() {
        $labels = array(
            'name'               => _x( 'Case Studies', 'Case Studies' ),
            'singular_name'      => _x( 'Case Study', 'Case Study' ),
            'add_new'            => _x( 'Add Case Study', 'Case Study' ),
            'add_new_item'       => __( 'Add Case Study' ),
            'edit_item'          => __( 'Edit Case Study' ),
            'new_item'           => __( 'New Case Study' ),
            'all_items'          => __( 'All Case Study' ),
            'view_item'          => __( 'View Case Study' ),
            'search_items'       => __( 'Search Case Study' ),
            'not_found'          => __( 'No Case Studies Found' ),
            'not_found_in_trash' => __( 'No Case Studies in Trash' ),
            'parent_item_colon'  => '',
            'menu_name'          => 'Case Studies',
        );

        $args = array(
            'labels'            => $labels,
            'description'       => 'Holds case studies post data',
            'public'            => true,
            'menu_position'     => 7,
            'hierarchical'      => true,
            'menu_icon'         => 'dashicons-admin-comments',
            'rewrite'           => array( 'slug' => 'case-studies/%case_studies_categories%', 'with_front' => false ),
            'supports'          => array( 'title', 'revisions', 'thumbnail' ),
            'has_archive'       => true,
            'show_ui'           => true,
            'show_in_nav_menus' => true,
            'show_in_menu'      => true,
            'show_in_admin_bar' => true,
            'taxonomies'        => array( 'case_study_categories' ),
        );

        register_post_type( 'case_studies', $args );
        // flush_rewrite_rules( false );
    }

    // register a custom category taxonomy type
    // so that the categories are not connected to the 'post' type taxonomies
    add_action( 'init', 'register_case_study_tax' );

    function register_case_study_tax() {
        $labels = array(
            'name'              => _x( 'Case Study Categories', 'case-studies' ),
            'singular_name'     => _x( 'Case Study Category', 'testimonials' ),
            'search_items'      => __( 'Search Case Study Categories' ),
            'all_items'         => __( 'All Case Study Categories' ),
            'parent_item'       => __( 'Parent Case Study Category' ),
            'parent_item_colon' => __( 'Parent Case Study Category:' ),
            'edit_item'         => __( 'Edit Case Study Category' ),
            'update_item'       => __( 'Update Case Study Category' ),
            'add_new_item'      => __( 'Add Case Study Category' ),
            'new_item_name'     => __( 'New Case Study Category' ),
            'menu_name'         => __( 'Case Study Categories' ),
        );

        $args = array(
            'labels'            => $labels,
            'taxonomy'          => 'case_study_categories',
            'object_type'       => 'case_studies',
            'hierarchical'      => true,
            'show_ui'           => true,
            'show_admin_column' => true,
            'query_var'         => false,
        );

        register_taxonomy( 'case_studies_categories', 'case_studies', $args );
    }

    /** filter URL link for post type url **/
    add_filter( 'post_type_link', 'case_studies_permalink_structure', 10, 4 );

    function case_studies_permalink_structure( $post_link, $post, $leavename, $sample ) {
        if ( false !== strpos( $post_link, '%case_studies_categories%' ) ) {
            $event_type_term = get_the_terms( $post->ID, 'case_studies_categories' );
            if ( $event_type_term ) {
                $post_link = str_replace( '%case_studies_categories%', array_pop( $event_type_term )->slug, $post_link );
            }
        }
        return $post_link;
    }

I’m sure I’ve done something stupid, please assist if possible.

Thanks,

In a sitemap, should I update the lastmod tag of a URL based on the text content or the HTML content?

Imagine I have a blogging / e-commerce website with 1000 posts / products, and I’ve built a dynamically generated sitemap for it. Basically it’s a list with a bunch of <url> and <lastmod> tags.
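
Concretely, each entry looks something like this (placeholder URL and date):

    <url>
      <loc>https://example.com/products/some-product</loc>
      <lastmod>2020-04-20</lastmod>
    </url>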

I’m pretty sure that crawlers expect me to update the <lastmod> date for whatever product or blog post I edit when I change the text content (or the images): adding something new, updating information, etc. Basically, anything that users will SEE differently when they visit my page. That makes sense.

But my question is:

I have a dynamic single-page website, so I don’t keep static pages; I generate and render them (server-side) at run time. So what if I decide that all of my blog posts should now render inside a <main> or <article> tag instead of a <div>? Or what if I add some structured data for price and review properties on my products, or structured data for breadcrumbs?

You see what I mean? The content that the user sees hasn’t changed, but I’ve updated some tags that the CRAWLER will interpret differently. The text/image content is the same, but the HTML has changed. And this could even have an impact on my ranking, since I’m throwing in new tags that might get me better SEO.

But now what should I do? The changes will render all 1000 posts / products in a different way with the new tags (from the crawler’s perspective). Should I update the <lastmod> tag for ALL 1000 URLs in my sitemap? Users will still see the same text/image content and won’t notice any difference.

If I do update all 1000 <lastmod> tags, won’t the crawler think it’s “weird” that all of my URLs were updated on the same day, since they’ll all have the same <lastmod> value? Does that make sense?

Please, any help is appreciated. Thanks