I’m reading a SANS paper on IOCs (indicators of compromise) in malware forensics, and I came across this interesting obstacle:
polymorphic and metamorphic codes (Paxson, 2011) result in multiple hash identities for the same class of malware
I understand that IOCs, and frameworks such as OpenIOC, exist to account for this flaw in using hashes for identification. But I’m trying to dig a little deeper into the way we use hashing, and perhaps create a solution, unless a solution already exists, in which case that would be the answer to this question.
Is there an alternative to using hashing to identify malware?
My idea is to create a hash that expresses the level of difference between two binaries. Call it a “measured hash”: some portion of the hash of length x (the first, middle, or last part) would hold the same values for binaries that share the same content. Maybe, by definition, what I’m describing is no longer a hash, but it would still be a program or function that takes a binary and outputs a fixed-length representation of that binary for identification purposes. Then, if only one small element of the binary is different, we’d be looking at a hash that is very similar to the hash of the original.
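To make the idea concrete, here is a minimal sketch of one way such a function could behave. It is not the exact scheme described above (it does not preserve a fixed prefix); it swaps in a SimHash-style fingerprint, where similar inputs tend to produce digests that differ in only a few bits. The names `measured_hash` and `bit_distance`, the 4-byte window, and the 64-bit output size are all assumptions chosen for illustration, and the replacement word in the variant is made up.

```python
import hashlib


def measured_hash(data: bytes, ngram: int = 4, bits: int = 64) -> int:
    """SimHash-style fingerprint (illustrative, hypothetical name).

    Every overlapping `ngram`-byte window votes on each output bit, so a
    small local change in the input only shifts a few votes.
    """
    counters = [0] * bits
    for i in range(max(len(data) - ngram + 1, 1)):
        chunk = data[i:i + ngram]
        digest = hashlib.blake2b(chunk, digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for b in range(bits):
            counters[b] += 1 if (h >> b) & 1 else -1
    # A bit is set when the majority of windows voted for it.
    return sum(1 << b for b in range(bits) if counters[b] > 0)


def bit_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Assumed example inputs: the variant replaces the last three words
# with a single, made-up word.
original = ("I'm stealing all your files using this binary but then I'll "
            "recompile another binary after adding or subtracting a few "
            "blocks of code").encode("utf-8")
variant = original.replace(b"blocks of code", b"functions")

h1, h2 = measured_hash(original), measured_hash(variant)
print(f"{h1:016X}")
print(f"{h2:016X}")
print("bits that differ:", bit_distance(h1, h2))
```

With a fingerprint like this, the comparison is a bit count rather than an equality check, so two samples would be treated as related when the distance falls under some threshold that would have to be tuned on real binaries.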
Using SHA-1 as an example:
Is the result of this sentence:
I’m stealing all your files using this binary but then I’ll recompile another binary after adding or subtracting a few blocks of code
Now if I change the last three words of this sentence I get:
Which is no surprise to anyone who knows hashing 101. My proposal is to use a mechanism that gets me something like this for the before:
CA422BBF6E52040FF0580F7C209F399897020A7A
and this for the after:
CA422BBF6E52040FF0580F7C209F399897029B10
because, after all, only three words were deleted and replaced by a single word.
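As a rough illustration of how two such values could be compared, the sketch below measures the shared prefix and the number of differing bits for the two illustrative strings above, then does the same for real SHA-1 digests of two nearly identical sentences. The helper names `shared_prefix_len` and `bit_difference` are hypothetical, and the replacement word in the second sentence is made up for the example.

```python
import hashlib


def shared_prefix_len(a: str, b: str) -> int:
    """Count the leading hex digits two digests have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def bit_difference(a: str, b: str) -> int:
    """Count the bits that differ between two hex digests of equal length."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


# The two illustrative "measured hash" values from the text above.
before = "CA422BBF6E52040FF0580F7C209F399897020A7A"
after = "CA422BBF6E52040FF0580F7C209F399897029B10"
print(shared_prefix_len(before, after))  # 36 of 40 hex digits shared
print(bit_difference(before, after))     # 7 of 160 bits differ

# Contrast: real SHA-1 digests of two nearly identical sentences.
# The word "functions" is an assumed stand-in for the changed ending.
s1 = ("I'm stealing all your files using this binary but then I'll "
      "recompile another binary after adding or subtracting a few "
      "blocks of code")
s2 = s1.replace("blocks of code", "functions")
d1 = hashlib.sha1(s1.encode("utf-8")).hexdigest().upper()
d2 = hashlib.sha1(s2.encode("utf-8")).hexdigest().upper()
print(d1)
print(d2)
print(shared_prefix_len(d1, d2), bit_difference(d1, d2))
```

For a cryptographic hash, roughly half of the output bits are expected to flip for any change to the input, which is exactly the property the proposed mechanism would need to give up.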
What I’m NOT looking for in an answer is a list of artifacts or frameworks that are already being used to identify malware. What I would like to know is whether such a tool already exists, or whether my idea is preposterous and wouldn’t be of value to forensic investigators looking to share the intelligence from their research.