What is the expected time complexity of checking equality of two arbitrary strings?

The simple (naive?) answer would be O(n), where n is the length of the shorter string, because in the worst case you must compare every pair of characters.

So far so good. I think we can all agree that checking equality of two equal length strings requires O(n) runtime.

However, many (most?) languages (I’m using Python 3.7) store the length of each string to allow constant-time length lookups. So in the case of two strings of unequal length, you can simply verify len(string_1) != len(string_2) in constant time. You can verify that Python 3 does indeed make this optimization.
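A rough way to check this yourself is a minimal timing sketch like the one below (exact numbers will vary by machine and Python build): if the unequal-length comparison takes the same tiny amount of time no matter how long the strings are, a length check is presumably happening before any character comparison.

import timeit

# Equal-length strings differing only in the last character:
# equality must scan (nearly) the whole string.
a = "x" * 1_000_000
b = "x" * 999_999 + "y"

# Unequal-length strings: a stored length allows an immediate "not equal".
c = "x" * 1_000_000
d = "x" * 999_999

print("equal length, differ at end:", timeit.timeit(lambda: a == b, number=1000))
print("unequal length:             ", timeit.timeit(lambda: c == d, number=1000))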

Now, if we’re checking the equality of two truly arbitrary strings (of arbitrary length), then it is much more likely (infinitely more, I believe) that the strings will be of unequal length than of equal length, which (statistically) means we can nearly always compare them in constant time.

So we can compare two arbitrary strings in O(1) amortized time, with a very rare worst case of O(n). Should we then consider string comparisons to be O(1), in the same way we consider hash table lookups to be O(1)?

Stack Overflow and my copy of Cracking the Coding Interview both cite this operation as O(n).

Cryptoki PKCS11 C_Decrypt returns shorter key (decrypted message) than expected

First off, let me say I’m fully aware this question can only really be answered by the token vendor, but I have already contacted them, and with the whole COVID situation my hopes of hearing from them soon are not high (not that I need to; this is just to satisfy my own curiosity). What I’m hoping for as an answer is maybe from somebody with a similar experience.

I have several tokens (smart cards with crypto capabilities based on an MCU from ST Micro, the ST19WLxx to be more precise) where I have stored certificates, mostly for authentication and digital signature purposes. But they can also be used for decryption, so I decided to give that a try. The original idea was to generate a symmetric key for disk encryption purposes. Then I would use the public key on the card to encrypt it and the private key to decrypt it, to unlock access to a partition on my disk.

This is all standard practice and should be pretty straightforward, but for completeness I will walk you through the steps (on Linux, although I also tried on Windows with the same end result):

1) I generate a random symmetric key, 245 bytes long, to account for the fact that I will be using RSA-PKCS padding (the only one supported by the card) and that the RSA keys are 2048 bits long (a 256-byte modulus minus the minimum 11 bytes of PKCS#1 v1.5 padding overhead):

$ dd if=/dev/urandom of=./symmetric_key bs=1 count=245

2) I extract the public key from the card, once I got its ID:

$ pkcs11-tool -r --login --id $KEY_ID --type pubkey --module token_driver.so -o pub_key.der

3) I convert the public key format to pem:

$ openssl rsa -pubin -in pub_key.der -inform DER -outform PEM -out pub_key.pem

4) I encrypt the symmetric key from step one using my public key:

$ openssl rsautl -inkey ./pub_key.pem -pubin -encrypt -pkcs -in ./symmetric_key -out ./encrypted_key.pkcs1

5) And finally, I decrypt with the private key on my card:

$ pkcs11-tool --id $KEY_ID --decrypt --login -m RSA-PKCS --module token_driver.so --input-file encrypted_key.pkcs1 --output-file decrypted_key

With that, I should recover the original symmetric_key again in decrypted_key. Unfortunately, this is not what happens. Instead, my decrypted_key is only 102 bytes long.

If I examine the output I can see I’m getting only the last 102 bytes of the original key; the remaining 143 are lost.

This is an example of symmetric_key (sample output from step 1):

00000000  77 1a e4 f3 71 c1 23 c8  0a 47 17 87 d3 c6 ad 31  |w...q.#..G.....1|
00000010  2b 43 94 f9 1f 41 a0 c7  4f 80 5c 00 51 bb 6b b6  |+C...A..O.\.Q.k.|
00000020  a4 4c 87 5b 5c 5c 28 ef  d3 b7 d1 85 a2 3a c0 87  |.L.[\\(......:..|
00000030  f1 25 38 b7 b9 28 d7 5f  e4 a1 da 4d 0a 71 f2 85  |.%8..(._...M.q..|
00000040  89 0e bb a4 2b 58 3e 18  90 c6 be 75 22 78 27 d7  |....+X>....u"x'.|
00000050  36 4a 95 74 aa fe e3 c1  d1 f6 02 a0 26 18 28 e2  |6J.t........&.(.|
00000060  14 9c 46 58 ea d1 b6 b6  1f d6 86 f6 9f f7 29 c7  |..FX..........).|
00000070  0e bd 50 8e dd ce 34 65  3f 7a 32 e3 3a 28 4c 3a  |..P...4e?z2.:(L:|
00000080  8d 47 36 9c ab af d0 db  bf d0 db f1 ca 32 be 97  |.G6..........2..|
00000090  62 4e c4 6a 79 b3 1a 3a  2b 2c 11 69 84 9b d5 65  |bN.jy..:+,.i...e|
000000a0  d6 75 b5 00 05 42 c5 8f  cd 82 6a 09 9a 50 07 2b  |.u...B....j..P.+|
000000b0  04 86 0d 15 92 e3 8b cf  fb 97 1c 9e f7 6f 22 51  |.............o"Q|
000000c0  e1 45 00 64 45 3d 4b 38  a6 7f f0 aa 7e 12 bb 26  |.E.dE=K8....~..&|
000000d0  85 91 a4 5c 9e dd 59 6a  f6 85 c2 2b 38 4d 2b c2  |...\..Yj...+8M+.|
000000e0  f1 2f 71 d0 21 46 1b d2  fd 57 03 66 2f b1 c1 0f  |./q.!F...W.f/...|
000000f0  51 53 9d 22 4e                                    |QS."N|
000000f5

And the corresponding output from decrypting on step 5:

00000000  97 62 4e c4 6a 79 b3 1a  3a 2b 2c 11 69 84 9b d5  |.bN.jy..:+,.i...|
00000010  65 d6 75 b5 00 05 42 c5  8f cd 82 6a 09 9a 50 07  |e.u...B....j..P.|
00000020  2b 04 86 0d 15 92 e3 8b  cf fb 97 1c 9e f7 6f 22  |+.............o"|
00000030  51 e1 45 00 64 45 3d 4b  38 a6 7f f0 aa 7e 12 bb  |Q.E.dE=K8....~..|
00000040  26 85 91 a4 5c 9e dd 59  6a f6 85 c2 2b 38 4d 2b  |&...\..Yj...+8M+|
00000050  c2 f1 2f 71 d0 21 46 1b  d2 fd 57 03 66 2f b1 c1  |../q.!F...W.f/..|
00000060  0f 51 53 9d 22 4e                                 |.QS."N|
00000066
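(A quick way to confirm programmatically that the decrypted output is exactly the tail of the original key — a small Python sketch, assuming the file names from the steps above:)

# Compare decrypted_key against the tail of symmetric_key
# (file names as produced by steps 1 and 5 above).
with open("symmetric_key", "rb") as f:
    original = f.read()
with open("decrypted_key", "rb") as f:
    decrypted = f.read()

print("original: ", len(original), "bytes")
print("decrypted:", len(decrypted), "bytes")
print("decrypted is the tail of original:", original.endswith(decrypted))
print("bytes missing from the front:", len(original) - len(decrypted))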

The first thing I thought was: “huh, a software/driver issue”. But I have access to the driver code, and after staring at it and messing with it for quite a long while I am almost completely sure there is nothing wrong with it.

The major clue that makes me think this is a firmware issue (I don’t have access to the code inside the card’s MCU) comes from a very careful examination of the APDU frames that the card exchanges with the host: there are no errors anywhere, and I always get the magic 0x9000 “everything is fine” status from the card. The frame in which I receive the decrypted data is short (it’s actually 20 or so bytes longer than 102, but there are headers and a secure channel involved, so part of the message is encrypted) and it comes announced with the correct number of bytes (SW=0x6179).

I did many more things, like testing on Windows, trying keys and text messages of different lengths (decryption works fine up to messages of 102 bytes; anything longer gets truncated), using different cards with the same hardware and firmware version, using different cards with different hardware and firmware versions (not that dissimilar after all, because I got the same problem), and collecting all the debug info from the driver to see if I was getting any hidden errors…

Considering RSA-OAEP is not supported by this card (or at least not documented) and the problems associated with RSA-PKCS, I guess it’s best to let this old dog sleep and not try to teach it new tricks.

But as I said, I’m curious: have you ever encountered something like this? Is there something else I can do to be sure this is a firmware issue? I guess in part I refuse to believe something so fundamental has been lurking undetected for so long (this hardware has been in use for many years by a significant number of people). Maybe there is something wrong with my setup or understanding of the problem after all.

Expected search times with linear vs quadratic probing

Why exactly does quadratic probing lead to a shorter average search time than linear probing?

I fully get that linear probing leads to a higher concentration of used slots in the hash table (i.e. more “clustering” of used consecutive indices). However, it’s not immediately obvious (to me at least) why that translates to higher expected search times than quadratic probing, since in both linear and quadratic probing the first value of the probing sequence determines the rest of the sequence.

I suppose this has more to do with the probability of collisions between different probing sequences. Perhaps different auxiliary hash values are less likely to lead to collisions early in the probing sequence with quadratic probing than with linear probing, but I haven’t seen this result derived or formalized.
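To make the clustering intuition concrete, here is a small Python sketch (table size, probe count and starting slots chosen arbitrarily) that prints the probe sequences for a few keys hashing to nearby slots under both schemes; with linear probing the sequences quickly merge into one run of consecutive slots, while the quadratic sequences spread apart after the first step.

# Probe sequences for linear vs. quadratic probing in a table of size m.
# Quadratic probing here is the simple h + i^2 (mod m) variant.
m = 17

def linear_probes(h, steps=6):
    return [(h + i) % m for i in range(steps)]

def quadratic_probes(h, steps=6):
    return [(h + i * i) % m for i in range(steps)]

for h in (3, 4, 5):                      # three keys hashing to nearby slots
    print(f"h = {h}")
    print("  linear:   ", linear_probes(h))
    print("  quadratic:", quadratic_probes(h))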

Worst-case expected running time for Randomized Permutation Algorithm

I have an algorithm which, when given a positive integer N, generates a permutation of the first N integers (from 1 to N) using a method called randInt(x,y). The method randInt(x,y) will generate a random integer between the numbers x and y, provided they are positive integers and y >= x.

The algorithm is given by the following pseudo-code:

if (N <= 0) {
   return null
} else {
   A := new int[] w/ size N and all cells initialized to 0
   A[0] := randInt(1,N)
   for (i := 1 to length(A)-1) do {
      boolean rInA := True
      while (rInA) {
         rInA := False
         int r := randInt(1,N)
         for (j := 0 to (i-1)) do {
            if (r = A[j]) {
               rInA := True
            }
         }
      }
      A[i] := r
   }
}
return A
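(For reference, here is a direct Python translation of the pseudocode, using random.randint in place of randInt, which may be convenient for experimenting with running times:)

import random

def rand_permutation(n):
    # Direct translation of the pseudocode above.
    if n <= 0:
        return None
    a = [0] * n
    a[0] = random.randint(1, n)            # randInt(1, N)
    for i in range(1, n):
        r_in_a = True
        while r_in_a:                      # retry until r is not already in A[0..i-1]
            r_in_a = False
            r = random.randint(1, n)
            for j in range(i):
                if r == a[j]:
                    r_in_a = True
        a[i] = r
    return a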

My understanding of the algorithm is as follows:

The outermost for-loop will run N-1 times, and for each of those iterations a random number is generated and then compared to all the cells of A filled in previous iterations. If any of those cells contain that randomly generated number then that number cannot be used and a new number is randomly generated (in the next iteration of the nested while-loop). This new randomly generated number is then, like before, compared to all the previously filled cells of A to check for duplication. This continues until randInt(x,y) generates a random number that is not already in the first i cells of A.

This leads me to believe that the worst-case expected running time of the algorithm is something like: $\sum_{i=1}^{N-1}(\alpha i)$

Now, the $\alpha$ here represents the effect the while-loop has on the running time and is the point of uncertainty for me. I know that in the first iteration of the outermost for-loop it’s unlikely that randInt will generate the one integer that A already contains (probability 1/N, I believe), so the innermost for-loop is likely to execute only once. However, by the last iteration (of the outermost for-loop) the probability that randInt generates one of the N-1 integers already in A is $\frac{N-1}{N}$, so because of the while-loop it’s likely that the innermost for-loop for that iteration (of the outermost for-loop) will execute more like N times.

How can I use the probability introduced into the algorithm by randInt to calculate the algorithm’s run-time?

Strategy to find the minimum expected cost on a series graph with edge probability p_i and search cost c_i

In a series graph, each edge $e_i$ exists with probability $p_i$, and examining the existence of edge $e_i$ costs $c_i$. I want to test the connectivity between source $s$ and destination $d$ with the minimum expected cost.

I have figured out that the expected cost can be calculated as follows if the edge detection sequence is $e_1, e_2, \cdots, e_n$:

$$E(cost) = c_1 + p_1 * (c_2 + p_2 * (c_3 + p_3 * (\cdots (c_{n-1} + p_{n-1} * c_n)\cdots)))$$
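For small instances this expression can be evaluated and brute-forced directly. Here is a short Python sketch (the p and c values are made up) that computes the expected cost for a given test order and finds the cheapest order by trying all permutations:

from itertools import permutations

def expected_cost(order, p, c):
    # E(cost) for testing edges in the given order, per the formula above.
    total = 0.0
    survive = 1.0                 # probability all edges tested so far exist
    for i in order:
        total += survive * c[i]
        survive *= p[i]
    return total

p = [0.9, 0.5, 0.7]               # made-up edge probabilities
c = [4.0, 1.0, 2.0]               # made-up test costs
best = min(permutations(range(len(p))), key=lambda o: expected_cost(o, p, c))
print(best, expected_cost(best, p, c))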

So is there a strategy or algorithm to find the minimum expected cost and the corresponding edge detection sequence?


Amount of expected loop iterations when searching an array by random index

Let’s say we have an array A of size n. It has 1 as its first index and n as its last index. It contains a value x, with x occurring k times in A, where 1 <= k <= n.

If we have a search algorithm like so:

while true:
   i := random(1, n)
   if A[i] == x
     break

random(a,b) picks a number uniformly from a to b
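Here is a runnable Python version of the loop that also counts the array accesses, which can be used to estimate the expectation empirically (n, k and the placement of x are arbitrary choices for illustration):

import random

def search_reads(a, x):
    # The random-probe search above; returns how many times the array was read.
    n = len(a)
    reads = 0
    while True:
        i = random.randint(1, n)          # random(1, n), 1-indexed as in the post
        reads += 1
        if a[i - 1] == x:
            return reads

n, k = 100, 5                             # array size and number of occurrences of x
a = [0] * n
for pos in random.sample(range(n), k):
    a[pos] = 1                            # use 1 as the value x
trials = 10_000
avg = sum(search_reads(a, 1) for _ in range(trials)) / trials
print(f"average reads over {trials} trials: {avg:.2f}")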

From this we know that the chance of finding x and terminating the program is k/n on each iteration. However, what I would like to know is the expected number of iterations, or more specifically the expected number of times the array is accessed, in this program for an array A as described above.

Iterative-substitution method yields different solution for T(n)=3T(n/8)+n than expected by using master theorem

I’d like to work out the running time of the recurrence $T(n)=3T(n/8)+n$ using the iterative-substitution method. Using the master theorem, I can verify that the running time is $O(n)$. Using the substitution method, however, I arrive at a different answer than expected.

$$T(n)=3T(n/8)+n \\ =3(3T(n/8^2)+n)+n=3^2T(n/8^2)+3n+n \\ =3^2(3T(n/8^3)+n)+3n+n=3^3T(n/8^3)+3^2n+3n+n$$

We can see the pattern: $3^iT(n/8^i)+n*\frac{3^i -1}{2}$

The recursion finishes when $i=\log_8 n$.

Substituting $i$ into the discovered pattern: $3^{\log_8 n}T(1)+n*\frac{3^{\log_8 n} -1}{2} = n^{\log_8 3}*c+0.5n*n^{\log_8 3} - 0.5n = n^{\log_8 3}*c+ 0.5n^{\log_8 3+1}-0.5n \in O(n^{1.52})$.

What am I doing wrong? Why is my answer not $O(n)$?

Numenera – Higher than expected Health on NPCs

I’m looking to understand the Health inflation commonly printed in Cypher System material (often in published adventures, or in the little sidebars describing setting NPCs).

In Numenera, Health (HP) is generally determined by the standard target number:

Numenera – Discovery, p 222 (Also the same in 1st Edition)

Health: A creature’s target number is usually also its health, which is the amount of damage it can sustain before it is dead or incapacitated. For easy reference, the entries always list a creature’s health, even when it’s the normal amount for a creature of its level.

That is 3 × the difficulty level, just for reference.

The designers allude to a caveat that sometimes monsters will simply break from the usual defined Health, often with a much higher number. I recall somewhere in 1st Edition Numenera a reference to doing this to provide more challenging combats for higher-tier characters.

Briefly skimming Discovery / Destiny, I’ve grabbed some examples:

  • Discovery p 367 – Teratoma – Level: 3 HP: 12
  • Discovery p 381 – Octopus – Level: 3 HP: 15
  • Discovery p 369 – Teratoma (M) – Level: 4 HP: 15
  • Destiny p 371 – Assassin – Level: 4 HP: 20
  • Discovery p 375 – Weymel – Level: 5 HP: 20
  • Discovery p 385 – Latos – Level: 5 HP: 25
  • Destiny p 389 – Halcus – Level: 5 HP: 20
  • Destiny p 389 – Drayva – Level: 5 HP: 20
  • Destiny p 362 – Khagun Semper – Level: 5 HP: 26
  • Destiny p 373 – Soludi – Level: 6 HP: 24
  • Destiny p 398 – Heri – Level: 6 HP: 27
  • Destiny p 398 – Scrose – Level: 7 HP: 30

There are many, many more examples spread throughout Cypher System, OG Numenera, Discovery, Destiny, The Strange, and Predation. And they are not one-offs; it is used liberally, and HP inflation is extremely common. As you can see from just this small list, creatures range from boss encounters to lowly random animals, across all level ranges, with no rhyme or reason I can perceive.

My question is: why? Is there any systematic process for doing this? Is the standard HP suggested in the creature section just too low? I’m looking for any notes from the designers, or even personal GM experience, to help gauge the appropriate amount of HP one should assign to combatants.

Programming Test for a job in Game Dev – expected levels of documentation etc

If this is the wrong place to be asking this – please let me know and I’ll happily ask it somewhere else!

I am completing a C++ proficiency test for a “Junior Engine Programmer” role at a game studio in the UK. The test involves creating a pathfinding demo and rendering it to the screen. I won’t go too far into the details of the test, but the brief doesn’t mention any documentation, unit testing, etc.

I’ve been told by one lecturer that I should definitely include both of those, despite not being asked, and by another that I should use my time more wisely and focus on a great implementation. What is the done thing here? The closest the brief comes is asking me to make clear where I’ve used other libraries.

Is there anything else I should consider submitting with the implementation as well? I’m thinking of technical specifications such as class diagrams, or anything else really.

Thanks!