Checking equality of integers: O(1) in C but O(log n) in Python 3?

Consider these equivalent functions in C and Python 3. Most devs would immediately claim both are O(1).

def is_equal(a: int, b: int) -> bool:   return a == b 
int is_equal(int a, int b) {   return a == b } 

But consider what is happening under the surface. Integers are just binary strings and, to determine equality, both languages will compare the strings bit-by-bit. In either case this scan is O(b) where b is the number of bits. Since integers have a constant size in bits in C, this is simply O(1).

In Python 3 however, integers do not have fixed size and the scan remains O(b) for the number of bits in the input, or O(log a) where a is the value of the input in base 10.

So if you’re analyzing code in Python, any time you compare two integers, you are embarking on a surprisingly complex journey of O(log n) with respect to the base 10 value of either number.

For me this raises several questions:

  1. Is this correct? I haven’t seen anyone else claim that Python compares ints in log time.
  2. In the context of conducting an interview, should you notice or care if a candidate calls this O(1)?
  3. Should you notice or care about this distinction in the real world?

What is the expected time complexity of checking equality of two arbitrary strings?

The simple (naive?) answer would be O(n) where n is the length of the shorter string. Because in the worst case you must compare every pair of characters.

So far so good. I think we can all agree that checking equality of two equal length strings requires O(n) runtime.

However many (most?) languages (I’m using Python 3.7) store the lengths of strings to allow for constant time lookups. So in the case of two unequal length strings, you can simply verify len(string_1) != len(string_2) in constant time. You can verify that Python 3 does indeed make this optimization.

Now, if we’re checking the equality of two truly arbitrary strings (of arbitrary length) then it is much more likely (infinitely, I believe) that the strings will be of unequal length than of equal length. Which (statistically) ensures we can nearly always compare them in constant time.

So we can compare two arbitrary strings at O(1) amortized, with a very rare worst-case of O(n). Should we consider strings comparisons then to be O(1) in the same way we consider hash table lookups to be O(1)?

Stack Overflow, and my copy of Cracking the Coding interview cite this operation as O(n).

Decidability of equality, and soundness of expressions involving elementary arithmetic and exponentials

Let’s have expressions that are composed of elements of $ \mathbb N$ and a limited set of binary operations {$ +,\times,-,/$ } and functions {$ \exp, \ln$ }. The expressions are always well-formed and form finite trees, with numbers as leaf nodes and operators as internal nodes, binary operations having two child sub-expressions and the functions one. A value of such an expression is interpreted to mean some number in $ \mathbb R$ .

There are two limitations on the structure of the expressions: the divisor (the right-hand sub-expression) of $ /$ can’t be 0 and the argument of $ \ln$ must be positive.

I have two questions about these kind of expressions:

  • Is it possible to ensure “soundness” of such an expression, in the sense that the two limitations can be checked in limited time?

  • is an equality check between two such expressions decidable?

These questions seem to be connected in the sense that if you’re able to check equality of the relevant sub-expression to zero, you can decide whether a division parent expression is sound, and it doesn’t seem hard to check whether a sub-expression of $ \ln$ is positive or negative if it’s known not to be zero.

I know that equality in $ \mathbb R$ is generally not decidable, whereas equality in algebraic numbers is. However, I wonder how inclusion of {$ \exp, \ln$ } changes the result.

(A side note: I posted an earlier version of a question with similar intent here, but it turned out to have too little thought behind it, and had unnecessary (= not related to the meat of the question) complications with negative logarithms.)

Is it possible to generate an equality function based on the data to be compared? [closed]

Two Booleans are equal if the’re the same value, two numbers similarly. Two sets are equal if they have the same elements. In case of checking two sets for equality we can use the following scheme/racket function:

(define (same-set? l1 l2)   (and (subset? l1 l2) (subset? l2 l1))) 

How would such a function be generated automatically?

The basic properties of an equivalence relation are:

Substitution property: For any quantities a and b and any expression F(x), if a = b, then F(a) = F(b) (if both sides make sense, i.e. are well-formed). Some specific examples of this are:

For any real numbers a, b, and c, if a = b, then a + c = b + c (here F(x) is x + c);

For any real numbers a, b, and c, if a = b, then a − c = b − c (here F(x) is x − c);

For any real numbers a, b, and c, if a = b, then ac = bc (here F(x) is xc);

For any real numbers a, b, and c, if a = b and c is not zero, then a/c = b/c (here F(x) is x/c).

Reflexive property: For any quantity a, a = a. Symmetric property: For any quantities a and b, if a = b, then b = a. Transitive property: For any quantities a, b, and c, if a = b and b = c, then a = c.

Is it possible to generate a function that obeys the above properties? Would that be enough? Could knowing the type of data help?

If you have any ideas on how to improve this question or tag it please comment.

Decidability of equality of expressions involving exponentiation

Let’s have expressions that are finite-sized trees, with elements of $ \mathbb N$ as leaf nodes and operators and the operations {$ +,\times,-,/$ , ^} with their usual semantics as the internal nodes, with the special note that we allow arbitrary expressions as the right-hand side of the exponentiation operation. Are the equality between such nodes (or, equivalently, comparison to zero) decidable? Is the closure under these operations a subset of algebraic numbers or not?

This question is similar to this: Decidability of Equality of Radical Expressions but with the difference that here the exponentiation operator is symmetric in the type of the base and the exponent. That means that we could have exponents such as $ 3^\sqrt 2$ . It isn’t clear to me, whether allowing exponentiation with irrationals retains the algebraic closure.

This question is also similar to Computability of equality to zero for a simple language but the answers to that question focused on the transcendental properties of combinations of $ \pi$ and $ e$ , which I consider out of scope here.

Equality for floats

Suppose I have two floats (e.g. {x,y} = RandomReal[{0,1},2]). Might there be a situation where all three comparisons evaluate to False (i.e. x > y, x < y and x == y)? I am thinking that the values might not be that stable and the equality might fail. Should something like Around be used instead? I am not fully understanding what is going on behind the hood with == operator and how stable the floats are and how stably they can be carried over.

How to check equality between any two 2-dimensional arrays (matrices) with time complexity less than O(n^2)?

Suppose, there are three matrices – A[3][2] : {{1,2},{3,4},{5,6}} ; B[3][2] : {{1,2},{3,4},{5,6}} ; C[3][2] : {{2,1},{3,4},{5,6}} . Here A=B , A!=C and B!=C .

So, task is to check equality between all the possible combinations of 2-dimensional matrices from a given set of matrices with time complexity less than O(n^2) .Here the possible combinations are (A,B) , (A,C) and (B,C).

( As part of the solution , can we represent 2-dimensional matrices with a certain value/string/array which will reduce the time complexity of equality comparison ? )

$\beta$ reduction equational equality

In Categories, Types and Structures, authors talk about exponential objects in section 2.3.1.

Let $ C$ be a Cartesian category, and $ a,b \in Ob_C$ . The exponent of $ a$ and $ b$ is an object $ b^a$ together with a morphism $ eval_{alb}: b^a \times a\rightarrow b$ (evaluation map), and for every object $ c$ an operation $ \Lambda_c : C[c\times a,b]\rightarrow C[c, b^a]$ such that for all morphisms $ f: c\times a\rightarrow b$ , $ h:c\rightarrow b^a$ , the following equation holds:
$ \beta. eval_{a,b} \circ (\Lambda (f)\times id_a )=f$
$ \eta. \Lambda_c(eval_{a,b}\circ(h\times id_a))=h$

Can you please show me the steps to derive the last two equations?

How do you find a hash function that respects a custom equality function?

I’ve been tasked with hashing arbitrary types in C++, with the caveat that A == B implies hash(A) == hash(B) even if equality of A and B is determined by a custom equality function ==.

For simplicity we can assume that == is an equivalence relation.

For example, the expected behavior of the hash function on std::vectors is as follows:


using namespace std; vector A = vector(); vector B = vector();  

A == B will be true because == is overloaded for std::vector to mean equality of the underlying data. Correspondingly, hash(A) == hash(B) should also be true.

I can’t simply hash the addresses of A,Bas integers because A == B but hash(&A) != hash(&B) in general.

I’ve thought of one solution, but I wonder if its optimal. It seems terribly inefficient. The solution is to build the hash function as new values are hashed:

 using namespace std;  <template class Key> class Hasher{      public:         unordered_map<pair<Key,Integer>> hashedKeys;        int max_hash         Hasher(int max_hash){           this->max_hash = max_hash;        }         int hash(Key key){            // If key has already been hashed, used that hash_value           if ( hashedKeys.count(key) == 1){                return hashedKeys[key];           }            // For pairs of saved (Key key, int hash_value)           for(unordered_map<Key,Integer>::iterator it=hashedKeys.begin(); it!=hashedKeys.end(); it++;){               // If an equal key has been inserted, just use its hash_value              if(key == *it){                 hashedKeys.insert(key, *it.second);                 return *it.second; //use hash value of equal Key              }           }            // If no other Keys equal this one, randomly hash it, and save           int hash_value = rand() % max_hash;           hashedKeys.insert(key, hash_value);           return hash_value;         } } 

I could do some extra bookkeeping to ensure that inequivalent Keys are less likely to be mapped to the same hash by the random assignment, but that’s largely besides the point.

Ignoring collision resolution, hashing a new value is O(hashedKeys.size()), while hashing a previous hashed value is O(1) We also require O(n) additional space to store the computed hash values, where most hash functions require O(1).

In a situation where a cache is large and new keys are constantly being inserted, the O(n) search is incredibly inefficient, so I’d prefer another approach if possible, or a proof that improvement is impossible.

Take the class ParityInteger:

class ParityInteger{    public:       int number;        ParityInteger(int n){          number = n;       }        bool operator==(const ParityInteger& other){          return (number % 2) == (other.number % 2);       } }  

The ideal hash for such a class is:

int hash(ParityInteger n){    return n%2; } 

which basically assigns a ParityInteger to a representative of its equivalence class.

Besides my method in the class Hasher, is there any better way to automatically find a function which assigns equivalent members of an arbitrary type to the same integer, without being trivial?

Given a computable equivalence relation == for some type, is there an algorithm to compute a nontrivial function hash such that == is a congruence relation wrt hash.