What is the expected time complexity of checking equality of two arbitrary strings?


The simple (naive?) answer would be O(n) where n is the length of the shorter string. Because in the worst case you must compare every pair of characters.

So far so good. I think we can all agree that checking equality of two equal length strings requires O(n) runtime.

However many (most?) languages (I’m using Python 3.7) store the lengths of strings to allow for constant time lookups. So in the case of two unequal length strings, you can simply verify len(string_1) != len(string_2) in constant time. You can verify that Python 3 does indeed make this optimization.

Now, if we’re checking the equality of two truly arbitrary strings (of arbitrary length) then it is much more likely (infinitely, I believe) that the strings will be of unequal length than of equal length. Which (statistically) ensures we can nearly always compare them in constant time.

So we can compare two arbitrary strings at O(1) amortized, with a very rare worst-case of O(n). Should we consider strings comparisons then to be O(1) in the same way we consider hash table lookups to be O(1)?

Stack Overflow, and my copy of Cracking the Coding interview cite this operation as O(n).