Can a webpage's content differ if ‘http’ is changed to ‘https’, or if ‘www.’ is added after ‘http://’ (or ‘https://’)?


When I use the Python package newspaper3k and run the code

    import newspaper

    # memoize_articles=False returns every article URL, not only those unseen in a previous run
    paper = newspaper.build('http://abcnews.com', memoize_articles=False)
    for url in paper.article_urls():
        print(url)

I get a list of article URLs that I can download. Both of the following URLs appear in that list:

  • http://abcnews.go.com/Health/coronavirus-transferred-animals-humans-scientists-answer/story?id=73055380
  • https://abcnews.go.com/Health/coronavirus-transferred-animals-humans-scientists-answer/story?id=73055380

As can be seen, the only difference between the two URLs is the s in https.

My question is: can the content of a webpage differ simply because an s is added to http? If I scrape a news source (in this case http://abcnews.com), do I need to download both articles to be sure I don't miss any articles, or are they guaranteed to have the same content so that I can download only one of them?
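Since the server decides what each URL returns, one way to check this for a particular site is to fetch both variants and compare them. The sketch below assumes a caller-supplied `fetch` function (hypothetical, not part of newspaper3k) that maps a URL to page text; the helper names are also my own. It collapses whitespace before hashing, and it is probably more meaningful to compare extracted article text than raw HTML, which can contain per-request tokens or ads. A match on a few spot checks suggests, but does not prove, that both schemes serve the same content.

```python
import hashlib
import re
from typing import Callable
from urllib.parse import urlsplit, urlunsplit

def scheme_variants(url: str) -> tuple[str, str]:
    """Return the http:// and https:// versions of the same URL."""
    parts = urlsplit(url)
    return (urlunsplit(parts._replace(scheme="http")),
            urlunsplit(parts._replace(scheme="https")))

def content_fingerprint(text: str) -> str:
    """Hash the text with runs of whitespace collapsed to a single space."""
    normalized = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def variants_match(url: str, fetch: Callable[[str], str]) -> bool:
    """Fetch both scheme variants and report whether their content matches.

    `fetch` is a hypothetical caller-supplied function: URL -> page text.
    """
    http_url, https_url = scheme_variants(url)
    return content_fingerprint(fetch(http_url)) == content_fingerprint(fetch(https_url))
```

With newspaper3k, `fetch` could be a small function that builds `newspaper.Article(url)`, calls `.download()` and `.parse()`, and returns `.text`.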

I have also noticed that some URLs are duplicated with www. added after the http:// (or https://). I have the same question here: can this small change cause the webpage content to differ, and is it something I should take into account, or can I simply ignore one of the two URLs?
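If the variants do turn out to serve the same content for a given site, the list from `paper.article_urls()` could be collapsed before downloading. This is a minimal sketch under that assumption: it treats http/https and a leading `www.` as interchangeable and keeps one URL per key, preferring https. The function names are hypothetical, and because the key uses `hostname`, it also lowercases the host and drops any explicit port.

```python
from typing import Iterable
from urllib.parse import urlsplit

def dedup_key(url: str) -> tuple[str, str, str]:
    """Key that ignores the scheme and a leading 'www.' on the host.

    ASSUMPTION: the site serves identical content for http/https and
    www/non-www variants; this is common but not guaranteed.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is lowercased and has no port
    if host.startswith("www."):
        host = host[len("www."):]
    return host, parts.path or "/", parts.query

def dedupe_urls(urls: Iterable[str]) -> list[str]:
    """Keep one URL per key, preferring an https:// variant when one is seen."""
    chosen: dict[tuple[str, str, str], str] = {}
    for url in urls:
        key = dedup_key(url)
        current = chosen.get(key)
        upgrade = (current is not None
                   and urlsplit(current).scheme != "https"
                   and urlsplit(url).scheme == "https")
        if current is None or upgrade:
            chosen[key] = url
    return list(chosen.values())  # keys stay in first-seen order
```

Applied to the loop above, this would be something like `dedupe_urls(paper.article_urls())`.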