I’ve been tasked with hashing arbitrary types in C++, with the caveat that `A == B` implies `hash(A) == hash(B)`, even if equality of `A` and `B` is determined by a custom equality function `==`.

For simplicity, we can assume that `==` is an equivalence relation.
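The requirement can at least be spot-checked on sample values. This is a minimal sketch: `respects_equality` is a hypothetical helper, not part of any library, and passing it proves nothing beyond the samples it is given.

```cpp
#include <cstddef>
#include <vector>

// Returns true if `hash` agrees with Key's operator== on every pair drawn
// from `samples`: whenever a == b, hash(a) == hash(b) must hold.
// This only tests the given samples; it cannot prove the property in general.
template <class Key, class HashFn>
bool respects_equality(const std::vector<Key>& samples, HashFn hash) {
    for (std::size_t i = 0; i < samples.size(); ++i) {
        for (std::size_t j = i + 1; j < samples.size(); ++j) {
            if (samples[i] == samples[j] && hash(samples[i]) != hash(samples[j])) {
                return false;  // two equal keys received different hashes
            }
        }
    }
    return true;
}
```

Because it compares every pair, it is quadratic in the number of samples, which is fine for tests but not for production use.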

For example, the expected behavior of the `hash` function on `std::vector`s is as follows.

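Given

```cpp
#include <vector>
using namespace std;

// The element type is illustrative; any type with operator== works the same way.
vector<int> A = vector<int>();
vector<int> B = vector<int>();
```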

`A == B` is true because `==` is overloaded for `std::vector` to compare the underlying data element by element. Correspondingly, `hash(A) == hash(B)` should also be true.
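For `std::vector` itself, the property follows from hashing the contents rather than the object. This is a sketch assuming the element type already has a `std::hash` specialization consistent with its `==`; `hash_vector` is a hypothetical name, and the mixing step follows the style of `boost::hash_combine`.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hashes a vector from its size and elements only, so two vectors that compare
// equal under std::vector's operator== (same size, equal elements in order)
// always get the same hash. Assumes std::hash<T> is consistent with T's ==.
template <class T>
std::size_t hash_vector(const std::vector<T>& v) {
    std::size_t seed = v.size();
    for (const T& x : v) {
        // Order-sensitive mixing step; 0x9e3779b9 is a conventional constant.
        seed ^= std::hash<T>{}(x) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
    return seed;
}
```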

I can’t simply hash the addresses of `A` and `B` as integers, because `A == B` but, in general, `hash(&A) != hash(&B)`.

I’ve thought of one solution, but I wonder if it’s optimal; it seems terribly inefficient. The idea is to build the `hash` function incrementally as new values are hashed:

```cpp
#include <random>
#include <utility>
#include <vector>

template <class Key>
class Hasher {
public:
    explicit Hasher(int max_hash) : max_hash(max_hash), rng(std::random_device{}()) {}

    int hash(const Key& key) {
        // An exact-key fast path (an unordered_map lookup) is omitted here:
        // it would need a hash of Key's exact representation, which an
        // arbitrary Key does not provide. Every call does the linear scan.
        for (const auto& entry : hashedKeys) {
            // If an equal key has already been hashed, reuse its hash value.
            if (key == entry.first) {
                return entry.second;
            }
        }
        // No stored key equals this one: hash it randomly and save it.
        std::uniform_int_distribution<int> dist(0, max_hash - 1);
        int hash_value = dist(rng);
        hashedKeys.emplace_back(key, hash_value);
        return hash_value;
    }

private:
    std::vector<std::pair<Key, int>> hashedKeys;  // (representative key, hash value)
    int max_hash;
    std::mt19937 rng;
};
```

I could do some extra bookkeeping to make inequivalent keys less likely to be mapped to the same hash by the random assignment, but that’s largely beside the point.
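One possible form of that bookkeeping, as a sketch only: hand out hash values in order of first appearance instead of at random, so the first `max_hash` equivalence classes all get distinct values. `SequentialHasher` is a hypothetical name; lookups are still a linear scan over one stored representative per class.

```cpp
#include <cstddef>
#include <vector>

// Assigns hash values 0, 1, 2, ... to equivalence classes in the order they
// are first seen, wrapping around modulo max_hash. Distinct classes get
// distinct values until more than max_hash classes have been seen.
template <class Key>
class SequentialHasher {
public:
    explicit SequentialHasher(std::size_t max_hash) : max_hash_(max_hash) {}

    int hash(const Key& key) {
        // reps_[i] is the first key seen from equivalence class i.
        for (std::size_t i = 0; i < reps_.size(); ++i) {
            if (reps_[i] == key) {
                return static_cast<int>(i % max_hash_);
            }
        }
        reps_.push_back(key);
        return static_cast<int>((reps_.size() - 1) % max_hash_);
    }

private:
    std::vector<Key> reps_;
    std::size_t max_hash_;
};
```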

Ignoring collision resolution, hashing a new value is `O(hashedKeys.size())`, while hashing a previously hashed value is `O(1)`. The approach also needs `O(n)` additional space to store the computed hash values, where `n` is the number of keys hashed so far, whereas most hash functions need `O(1)`.

When the cache is large and new keys are constantly being inserted, the `O(n)` search is very inefficient, so I’d prefer another approach if one exists, or a proof that no improvement is possible.

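Take the class `ParityInteger`:

```cpp
class ParityInteger {
public:
    int number;
    ParityInteger(int n) : number(n) {}

    // Two ParityIntegers are "equal" when they have the same parity.
    // number % 2 is -1 for negative odd numbers, so compare "is odd" flags.
    bool operator==(const ParityInteger& other) const {
        return (number % 2 != 0) == (other.number % 2 != 0);
    }
};
```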

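The ideal `hash` for such a class is:

```cpp
int hash(ParityInteger n) {
    // 0 for even numbers, 1 for odd ones (including negative odd numbers).
    return n.number % 2 != 0 ? 1 : 0;
}
```

which basically maps each `ParityInteger` to a representative of its equivalence class.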
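When a type does have a computable canonical form, the same idea works without any bookkeeping: map each value to its representative and hash that. This is a sketch for `ParityInteger` with hypothetical helpers `canonical` and `parity_hash`; it relies on knowing the canonical form, which a generic hasher given only `==` does not have.

```cpp
#include <cstddef>
#include <functional>

class ParityInteger {
public:
    int number;
    ParityInteger(int n) : number(n) {}
    bool operator==(const ParityInteger& other) const {
        return (number % 2 != 0) == (other.number % 2 != 0);
    }
};

// Canonical member of each equivalence class: 0 for even, 1 for odd.
ParityInteger canonical(const ParityInteger& p) {
    return ParityInteger(p.number % 2 != 0 ? 1 : 0);
}

// Equal ParityIntegers share a canonical representative, so they share a hash.
std::size_t parity_hash(const ParityInteger& p) {
    return std::hash<int>{}(canonical(p).number);
}
```

The `!= 0` test matters because in C++ `-3 % 2` is `-1`, not `1`.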

Besides my method in the class `Hasher`, is there a better way to automatically find a function that assigns equivalent members of an arbitrary type to the same integer, without it being trivial (e.g. a constant function)?

Given a computable equivalence relation `==` for some type, is there an algorithm to compute a nontrivial function `hash` such that `A == B` implies `hash(A) == hash(B)`, i.e. `==` is a congruence relation with respect to `hash`?
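One way to state the requirement precisely, in notation introduced here rather than taken from the original: let $X$ be the set of values of the type, $\equiv$ the relation computed by `==`, $h$ the hash, and $\pi : X \to X/{\equiv}$ the map sending each value to its equivalence class. Then

$$
\forall a, b \in X:\; a \equiv b \implies h(a) = h(b)
\quad\Longleftrightarrow\quad
h = \tilde{h} \circ \pi \ \text{ for some } \tilde{h} : X/{\equiv} \to \mathbb{Z}.
$$

A constant $h$ always qualifies; the question is whether a non-constant $h$ can be computed when only $\equiv$ is available.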