How many “compressible” strings are there?

Let’s say that a string of length $ N$ is “compressible” iff its Kolmogorov complexity is less than $ N$ . To keep it simple, we can assume binary strings for this.

It is easy to see that almost all binary strings of length $ N$ are incompressible by using the pigeonhole principle.

So my question is, how many strings of length $ N$ are compressible?

In particular, let’s assume that $ K(S)$ is the Kolmogorov complexity of binary string $ S$ , which is of length $ N$ . Then I have the following three questions:

  1. Of the $ 2^N$ binary strings $ S$ of length $ N$ , how many have $ K(S) < N$ ?
  2. Of the $ 2^N$ binary strings $ S$ of length $ N$ , how many have $ K(S) \leq N/2$ ?
  3. Of the $ 2^N$ binary strings $ S$ of length $ N$ , how many have $ K(S) \leq \log N$ ?

All of the above are for sufficiently large $ N$ .
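The counting argument can be made quantitative: there are at most $ 2^k$ binary programs of length exactly $ k$ , and each program describes at most one string, so the number of strings with complexity below a threshold is bounded by a geometric sum:

$$ \#\{S : K(S) < N\} \;\le\; \sum_{k=0}^{N-1} 2^k \;=\; 2^N - 1, $$

and likewise for the other two thresholds:

$$ \#\{S : K(S) \le N/2\} \;\le\; 2^{N/2+1} - 1, \qquad \#\{S : K(S) \le \log N\} \;\le\; 2^{\log N + 1} - 1 = 2N - 1. $$

These are only upper bounds (not every short program outputs a string of length $ N$ ), but they already show that fewer than a $ 2^{-N/2+1}$ fraction of the $ 2^N$ strings can satisfy question 2, and only $ O(N)$ strings can satisfy question 3.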

Is there an alternative to using hashing to identify malware?

I’m reading a SANS paper on IOCs (indicators of compromise) in malware forensics, and I came across this interesting obstacle:

polymorphic and metamorphic codes (Paxson, 2011) result in multiple hash identities for the same class of malware

Now, I understand that IOCs and frameworks such as OpenIOC exist precisely to account for this flaw in using hashes for identification. But I’m trying to dig a little deeper into the way we use hashing, and perhaps devise a solution — unless one already exists, in which case that would be the answer to this question.

Is there an alternative to using hashing to identify malware?

My idea is a hash that expresses the degree of difference between two inputs; call it a “measured hash”: for binaries that share content, some portion of the hash (say the first, middle, or last part of a length-x digest) would show the same values. Maybe, by definition, what I’m describing is no longer a hash, but it would still be a function that takes a binary and outputs a fixed-length representation of it for identification purposes. Then, if only one small element of the binary differs, we’d be looking at a hash that is very similar to the hash of the original.

Using a SHA-1 hash as an example:

CA422BBF6E52040FF0580F7C209F399897020A7A

is the result of hashing this sentence:

I’m stealing all your files using this binary but then I’ll recompile another binary after adding or subtracting a few blocks of code

Now, if I change the last three words of the sentence, I get: F5BB055C7F7E76275C6F0528D2ACD6F288CE7496

Which is no surprise to anyone who knows hashing 101. My proposal is a mechanism that produces something like CA422BBF6E52040FF0580F7C209F399897020A7A for the before and CA422BBF6E52040FF0580F7C209F399897029B10 for the after, because, after all, only three words were deleted and replaced by a single word.

What I’m NOT looking for in an answer is a list of artifacts or frameworks that are already being used to identify malware. What I would like to know is whether such a tool already exists, or whether my idea is preposterous and would be of no value to forensic investigators looking to share the intelligence of their research.
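For what it’s worth, this idea exists under the name “fuzzy hashing” or context-triggered piecewise hashing (CTPH), implemented by tools such as ssdeep, with similarity digests like sdhash and TLSH pursuing the same goal. A minimal toy sketch of the piecewise idea follows — the chunking scheme and nibble-per-chunk digest are illustrative assumptions, not ssdeep’s actual algorithm:

```python
import hashlib

def piecewise_digest(data: bytes, chunk_size: int = 64) -> str:
    """Toy similarity-preserving digest: hash fixed-size chunks
    independently and keep one hex nibble per chunk, so a local edit
    changes only the nibbles for the chunks it touches.
    (Illustrative only -- real tools like ssdeep use content-defined
    chunk boundaries so insertions don't shift every later chunk.)"""
    nibbles = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        nibbles.append(hashlib.sha1(chunk).hexdigest()[0])
    return "".join(nibbles)

def similarity(a: str, b: str) -> float:
    """Fraction of digest positions that agree (toy distance measure)."""
    if not a or not b:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / max(len(a), len(b))
```

Two binaries that differ in one chunk then yield digests agreeing in every other position, which is exactly the “measured” behavior described above.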

Is there a publicly available source or list of blacklisted emails, or emails associated with malicious activity?

I want to build a simple tool that just checks if an email or email domain is risky or not.

Is there a publicly available list of risky emails that I can do a simple cross-reference against?

For example, check if the email person@dangerous-hackers.com or the domain dangerous-hackers.com is on one of these lists.
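The cross-reference itself can be a simple set lookup; a hedged sketch follows (the one-domain-per-line file format is an assumption — real feeds such as the Spamhaus DBL or abuse.ch exports would need parsing to match):

```python
def load_blocklist(lines):
    """Parse a plain-text blocklist: one domain per line, with '#'
    comments and blank lines ignored. (Assumed format -- adjust to
    whatever feed you actually use.)"""
    domains = set()
    for line in lines:
        line = line.strip().lower()
        if line and not line.startswith("#"):
            domains.add(line)
    return domains

def is_risky(email, blocklist):
    """Flag an address whose domain, or any parent domain, is listed."""
    domain = email.rsplit("@", 1)[-1].lower()
    parts = domain.split(".")
    # Check mail.dangerous-hackers.com, dangerous-hackers.com, com, ...
    return any(".".join(parts[i:]) in blocklist for i in range(len(parts)))
```

Checking parent domains catches subdomain tricks like person@mail.dangerous-hackers.com when only dangerous-hackers.com is listed.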

Any ideas?

Is there a rule for an armed fighter against an unarmed fighter?

Context:

A player is engaged in a fight with an unarmed peasant in a tavern. The player has a hand weapon.

There are rules for a fight between two unarmed fighters, but I don’t know what happens when one fighter has a weapon.

For now, I give the fighter with the weapon an advantage token*, but only because I couldn’t find rules covering this case and want to keep it fair.

* A token that adds +10% to the WS roll.

Is there a realistically implementable algorithm for testing the termination of a given Petri net?

I am trying to implement this Petri net simulator. Among its specifications, it has to return a map of the markings reachable from the current state. I don’t want it to throw an OutOfMemoryError (or similar) if it’s given a Petri net with infinitely many reachable markings. Can this be handled more elegantly?

In my case the net’s input and output arcs can have any natural-number weight, and it may also contain reset and inhibitor arcs.

Is there anything like a standard GUID to identify a PC?

I am asked about my opinion in a case as follows:

Someone visited a (totally legal, in fact US government) website A and identified themselves. At a very different point in time they – allegedly – visited a (doubtlessly very) illegal website B.

US law enforcement claims there is no doubt that the access to B was by the same person/from the same PC as the access to A. If the identification were based on the client’s IPv4 address (outside the US!), say, I’d argue that these are typically reassigned to new clients every few hours or days (not to mention shared/NATed use by multiple entities, including WiFi guests), and hence are at most very weak evidence. In addition, it currently seems that the non-US ISP was not asked to reveal the identity of the customer associated with the IP in question at the point in time in question; rather, the claim of identity rests on the comparison with said access to A.

Meanwhile, it seems that the identification is not claimed to be by IPv4 address, but rather by something referred to as a “GUID” identifying the PC. I am not aware of any standard or widespread GUID in any internet protocol that would allow cross-site identification between sites that do not even wish to collaborate on such an issue.

Note that the term GUID was specifically mentioned, i.e., we are not talking about browser fingerprinting or cookies.

Q: Is there anything “GUID-like” that can act as described to identify a PC/device across multiple unrelated(!) sites? In TCP? In http? In TLS? “Anywhere else” in the process?

Is there an official treasure generation method to limit magic item rolls based on dungeon level or some other factor?

I’m running an AD&D campaign for a party of usually three PCs, who were first level until our most recent session. (As for what they are now, we’ll get to that…) I have the 1e DMG (door cover) and Unearthed Arcana, and a Monster Manual that might be older than that, judging by its condition. The players are using the 2e PHB; these are all inherited books, and the previous owner only ever DM’d in 1e and PC’d in 2e.

My issue is with treasure generation: I’ve been using the standard dungeon generation tables from the DMG, and it works well except for the outcome of treasure rolls. Specifically, magic items don’t seem to be segregated by dungeon level. That first-level party happened upon a Mirror of Mental Prowess, which had some fairly powerful effects but nothing game-breaking, and was worth five thousand experience. Divided among the party, this alone was enough to bring the priest and rogue to second level. Combined with the remainder of the treasure, those two reached level three, and the ranger reached level two.

Now, while building a dungeon for a later adventure, I got another magic item roll, resulting in… a Ring of Three Wishes. I simply vetoed that and re-rolled, getting something more reasonable this time, but now I’m wondering whether my original roll was actually correct.

So, the simple version of the question:
Is there a method in AD&D to limit magic item rolls for treasure based on dungeon level or some other factor, or does this need to be created manually by the DM?


Note that this is not the same question as “What can I do when I accidentally gave out an overpowered item?” This relates purely to the RAW methods for generating magical treasures.