I am familiar with the concept of hash functions and fingerprints, but I am new to implementing them and to the specific characteristics of all the hash functions out there.

## What I need

Say I have a table of employee data {FirstName, LastName, Birthday, …} which is **dynamic**. New employees join the company, old employees leave. I want to hash the employee data. I need to make sure that the data of a new employee is never hashed to the same value as that of any other employee who has **ever** been in the database.

In this sense the data set is append-only. While deleting the data of a retiring employee is mandatory, I can keep the hash that was once linked to that employee. (This is most likely of no use, though, because the hash of a past employee's data will not evaluate to itself when hashed again 🙁)

Hashing does not need to be cryptographic. Yet hashes must not be easily traced back to employee data. By this I mean there should be no efficient method to calculate employee data from hash values. Since information is lost in the process of hashing, I would assume that this requirement is easy to meet.

## The Goal

I want to reduce the size of the hash (meaning bit size) as much as possible.

I would assume that, without collision resistance, I need to pick a fairly large hash (a space of 2^32 values or bigger) to ensure a tolerable risk of collision. Avoiding this is the main motivation behind the question.
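To get a feeling for the scale, here is a rough sketch of the usual birthday approximation, 1 - exp(-n(n-1)/(2N)), for n records in a space of N = 2^bits values. It assumes an ideal, uniformly distributed hash, which a real non-cryptographic hash may not be. The function name `collision_probability` and the population of 1,000,000 records are my own illustrative choices.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items share a hash value,
    assuming an ideal uniform hash with hash_bits bits (birthday bound)."""
    space = 2 ** hash_bits
    # 1 - exp(-n(n-1) / (2N)); expm1 keeps precision when the result is tiny.
    return -math.expm1(-n_items * (n_items - 1) / (2 * space))


if __name__ == "__main__":
    # Illustrative population size, not a measured figure.
    for bits in (32, 64, 128):
        print(bits, collision_probability(1_000_000, bits))
```

Under these assumptions a 32-bit hash is almost certain to collide somewhere among a million records, while a 64-bit hash brings the risk down to the order of 10^-8. Neither is a guarantee.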

I must guarantee that a new employee's data is never hashed to the same value that any prior employee's data was hashed to. I can make assumptions like "Given infinite time, there will never be more than 1,000,000 people in total who are in the company or have retired from it." So the total number of hashes ever needed is bounded.
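That bound at least fixes a floor for the hash size: by the pigeonhole principle, no function can give N records distinct values in fewer than ceil(log2 N) bits. A small sketch of that arithmetic (the name `min_hash_bits` is my own, and this only computes the floor, it is not a hashing scheme):

```python
def min_hash_bits(max_records: int) -> int:
    """Smallest bit width whose value space holds max_records distinct values.

    Uses integer arithmetic instead of math.log2 to avoid rounding issues
    near exact powers of two.
    """
    if max_records < 1:
        raise ValueError("max_records must be positive")
    return (max_records - 1).bit_length()


if __name__ == "__main__":
    print(min_hash_bits(1_000_000))
```

For 1,000,000 records this gives 20 bits, since 2^20 = 1,048,576. Anything smaller must collide, so the question is how close to this floor a hash can get.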

Is this solvable? If not, what would be the best hashing procedure for keeping the risk of collision as low as possible (Rabin's fingerprinting?)