Numerous HEAD requests from Java user agents to resources that require authentication in a web application. Should I block them?

I have recently started using Cloudflare’s firewall in front of a web application. This app has a limited user base of selected applicants and they must log in to view anything. There is no public registration form and nothing within the portal can be accessed without an account.

Since moving the DNS to Cloudflare I can see we are receiving numerous daily HEAD requests to paths that are only accessible within the portal.

These requests come from one of two groups of IP addresses in the United States (we are not a US-based company; our own hosting is in the AWS Ireland region, and we’re pretty sure at least 99% of our users have never been US-based):

Java User Agents

  • User agent is Java/1.8.0_171 or some other minor update version.
  • The ASN is listed as DigitalOcean.
  • The IP addresses all seem to have had similar behaviour reported previously, almost all against WordPress sites. Note that we’re not using WordPress here.

Empty User Agent

  • No user agent string.
  • The ASN is listed as Amazon Web Services.
  • The IP addresses have very little reported activity and do not seem at all connected to the Java requests.
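To make the grouping above reproducible, here is a minimal Python sketch of how exported firewall log entries could be bucketed into the two groups. The field names (`method`, `user_agent`, `asn_org`) and the sample rows are hypothetical placeholders, not Cloudflare's actual export schema.

```python
from collections import Counter


def classify(entry):
    """Label one log entry (a dict with hypothetical keys) by request type."""
    if entry.get("method") != "HEAD":
        return "other"
    # Treat a missing, None, or whitespace-only user agent as empty.
    ua = (entry.get("user_agent") or "").strip()
    if not ua:
        return "empty-ua"
    if ua.startswith("Java/"):
        return "java-ua"
    return "other"


def summarise(entries):
    """Count entries per (label, ASN organisation) pair."""
    return Counter((classify(e), e.get("asn_org", "unknown")) for e in entries)


# Made-up sample rows, for illustration only.
sample = [
    {"method": "HEAD", "user_agent": "Java/1.8.0_171", "asn_org": "DigitalOcean"},
    {"method": "HEAD", "user_agent": "", "asn_org": "Amazon Web Services"},
    {"method": "GET", "user_agent": "Mozilla/5.0", "asn_org": "Example ISP"},
]

for (label, asn), count in sorted(summarise(sample).items()):
    print(label, asn, count)
```

Counting by label and ASN together should show whether the Java and empty-user-agent requests really are two separate sources.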

Other Notes

  • The resources being requested are dynamic URLs containing what are essentially order numbers. We generate new orders every day, and they are visible to everyone using the portal.
  • I was unable to find any of the URLs indexed by Google. They don’t seem to be publicly available anywhere. There is only one publicly accessible page of the site, which is indexed.
  • We have potentially identified one user who seems to have viewed all the pages that are showing up in the firewall logs (we know this because he shows up in our custom analytics for the web app itself). We have a working relationship with our users and we’re almost certain he’s not based in the US.
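The cross-check described in the last point could be sketched roughly as follows. This assumes the firewall log has been reduced to a list of requested paths and the app analytics to a mapping of user to viewed paths; the user names and the `/orders/...` path format are made up for illustration.

```python
def coverage_by_user(probed_paths, views_by_user):
    """For each user, return the fraction of probed paths that user has viewed."""
    probed = set(probed_paths)
    result = {}
    for user, viewed in views_by_user.items():
        hits = probed & set(viewed)
        result[user] = len(hits) / len(probed) if probed else 0.0
    return result


def full_matches(probed_paths, views_by_user):
    """Return the users who have viewed every probed path, sorted by name."""
    coverage = coverage_by_user(probed_paths, views_by_user)
    return sorted(user for user, ratio in coverage.items() if ratio == 1.0)


# Hypothetical data: paths seen in the firewall log and per-user page views.
probed = ["/orders/1001", "/orders/1002", "/orders/1003"]
views = {
    "alice": ["/orders/1001", "/orders/1002", "/orders/1003", "/orders/1004"],
    "bob": ["/orders/1001"],
}

print(coverage_by_user(probed, views))
print(full_matches(probed, views))
```

A single user covering every probed path is suggestive but not conclusive, since orders are visible to everyone in the portal. Comparing the timestamps of the views against the HEAD requests would be a natural next step.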

I am aware that a HEAD request is not malicious in itself and that browsers sometimes make HEAD requests. Does the Java user agent, or the lack of a user agent in some cases, make this activity suspicious? I already block empty user agents and Java user agents through the firewall, although I think Cloudflare blocks Java by default as part of its Browser Integrity Check.

Questions

  1. Is there any reason why these might be legitimate requests that I shouldn’t block? The fact that it’s a HEAD request from a Java user agent suggests not, right?

  2. One idea we had is that one of the users is sharing links to these internal URLs via some outside channel, perhaps to outsource work. Is it possible that some kind of scraper has picked up these links and is now requesting them repeatedly? As I say, I was unable to find them publicly indexed.

  3. Is it possible the user we think is connected has some sort of malware on their machine which is picking up their browser activity and then making those requests?

  4. Could the user have some completely innocent software that would make Java-based HEAD requests like this, based on their web browsing activity?

Any advice as to how I should continue this investigation? Or other thoughts about what these requests are?