Checking equality of integers: O(1) in C but O(log n) in Python 3?

Consider these equivalent functions in C and Python 3. Most devs would immediately claim both are O(1).

    def is_equal(a: int, b: int) -> bool:
        return a == b

    int is_equal(int a, int b) {
        return a == b;
    }

But consider what is happening under the surface. Integers are just binary strings and, to determine equality, both languages will compare the strings bit-by-bit. In either case this scan is O(b) where b is the number of bits. Since integers have a constant size in bits in C, this is simply O(1).

In Python 3, however, integers do not have a fixed size, so the scan remains O(b) in the number of bits of the input, or equivalently O(log a), where a is the value of the input.

So if you’re analyzing Python code, any time you compare two integers you are embarking on a surprisingly complex journey of O(log n), where n is the value of either number.
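
If this is right, a quick timing experiment should make the growth visible. Below is a minimal sketch; the bit lengths and repetition count are arbitrary, and the exact numbers will vary by machine and CPython version.

    import timeit

    # Rough timing sketch: compare equal integers of increasing bit length.
    for bits in (1_000, 10_000, 100_000, 1_000_000):
        a = (1 << bits) - 1   # an integer with `bits` one-bits
        b = (1 << bits) - 1   # a distinct object with the same value
        t = timeit.timeit(lambda: a == b, number=10_000)
        print(f"{bits:>9} bits: {t:.4f} s for 10,000 comparisons")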

For me this raises several questions:

  1. Is this correct? I haven’t seen anyone else claim that Python compares ints in log time.
  2. In the context of conducting an interview, should you notice or care if a candidate calls this O(1)?
  3. Should you notice or care about this distinction in the real world?

What is the expected time complexity of checking equality of two arbitrary strings?

The simple (naive?) answer would be O(n), where n is the length of the shorter string, because in the worst case you must compare every pair of characters.

So far so good. I think we can all agree that checking equality of two equal-length strings requires O(n) runtime.

However, many (most?) languages (I’m using Python 3.7) store the length of each string to allow constant-time lookups. So in the case of two strings of unequal length, you can simply verify len(string_1) != len(string_2) in constant time. You can verify that Python 3 does indeed make this optimization.
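
A rough timing sketch of that length check (the string sizes and repetition count are arbitrary, and the exact numbers will vary by machine and Python version):

    import timeit

    # The unequal-length comparison should be near-instant, since the length
    # check fails immediately; the equal-length one must scan every character.
    n = 10_000_000
    unequal_a, unequal_b = "x" * n, "x" * (n + 1)   # differ only in length
    equal_a, equal_b = "x" * n, "x" * n             # equal values, distinct objects

    print(timeit.timeit(lambda: unequal_a == unequal_b, number=100))  # fast
    print(timeit.timeit(lambda: equal_a == equal_b, number=100))      # much slower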

Now, if we’re checking the equality of two truly arbitrary strings (of arbitrary length), then it is far more likely (infinitely more likely, I believe) that the strings will be of unequal length than of equal length, which (statistically) ensures we can nearly always compare them in constant time.

So we can compare two arbitrary strings in O(1) amortized time, with a very rare worst case of O(n). Should we then consider string comparisons to be O(1) in the same way we consider hash table lookups to be O(1)?

Stack Overflow and my copy of Cracking the Coding Interview both cite this operation as O(n).

Checking whether a Turing machine passes through at least k > 2 states before accepting a word

$L=\{\langle M,k\rangle \mid \exists\, w\in L(M) \text{ such that } M \text{ passes at least } k>2 \text{ distinct states before accepting } w\}$

I am trying to think of a reduction to prove that this language is neither RE nor coRE. How should I approach this problem? Is there a hint or an intuition?

I usually check whether Rice’s theorem can be used, but the question here is not about the language itself.

Checking the configuration history of a Turing machine using a PDA

I am trying to understand the technique of using configuration history in proofs.

To prove that: $\{\langle M\rangle \mid M \text{ is a TM and } L(M)=\Sigma^*\}\notin RE$

given $\langle M,w\rangle$, we built a Turing machine that accepts all words except the accepting configuration history of M on w (and then used a simple reduction).

To prove that: $\{\langle P\rangle \mid P \text{ is a PDA and } L(P)=\Sigma^*\}\notin RE$

we gave the same proof, except that we built a PDA that accepts all words except the accepting configuration history of M on w.

Does a PDA’s ability to determine whether its input is an accepting configuration history of M on w actually mean that I can simulate M’s run on w with a PDA? Or is testing whether an input is an accepting configuration history different from a simulation?

Checking system integrity after clicking scam email link on Linux

Earlier today, my mother opened an email thinking it was from my sister-in-law, then clicked on the shortened link.

The link loaded a page of fairly nonsensical text. The source of the page had no explicit JavaScript code, but the text was formatted with a non-standard identifier. NoScript told me there were scripts on the page (but they weren’t trusted, so they should have been blocked).

The system is Fedora 32, upgraded to this release a few days ago. The browser is Firefox with NoScript installed.

What should I do to confirm the integrity of the system? I’m concerned about the integrity of the Linux system, of course, but I’d also be concerned about any possible transfer of viruses or malware to Windows users my mother emails.

I’ve done this once before after a similar incident. I’m planning to create a live USB to scan the system for problems (which I did before) using one of the forensic Linux distros designed for this, but I’m pretty sure I did something else last time and can’t remember what.

What is necessary to ensure the system’s integrity?

I’ve read Clicked link in faked email and https://security.stackexchange.com/a/17854. https://security.stackexchange.com/a/73660 is fairly scary, though I’m somewhat sceptical of its claims.

I also read various online guides, but had difficulty finding anything specific to Linux and reasonably current. (I think I must be searching the wrong terms because I seem to remember finding this kind of information fairly easily before.)

Checking if two statements can be reached in one control flow

Assume I have a graph representing the control flow and the call graph of a given program. I also have a first and a second statement. I now want to figure out if it is possible to execute both statements (in order) within the same program execution.

Control Flow Graph: I have a graph with all the statements of the program and edges connecting the statements according to the intra-function control flow (i.e., within a function).

Call Graph: I also have edges connecting each function call to the start of the called function’s control flow.

The literature I found on control flow covers only intra-function flow analysis, and the only correct approach I can come up with is a depth-first (or breadth-first) search starting from the first statement. This, however, hardly feels right, as it is quite cumbersome, and I would expect a better solution.
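
For reference, here is a minimal sketch of the search I mean, assuming the combined control-flow and call graph is stored as an adjacency dict; the node names and the example graph are purely illustrative.

    from collections import deque

    def reachable_after(graph, source, target):
        """Breadth-first search from the successors of `source`: is there a
        path of length >= 1 from `source` to `target`?"""
        seen = set()
        queue = deque(graph.get(source, ()))
        while queue:
            node = queue.popleft()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            queue.extend(graph.get(node, ()))
        return False

    # Hypothetical example: f calls g, and the second statement lives in g.
    graph = {
        "f:stmt1": ["f:call_g"],
        "f:call_g": ["g:entry", "f:stmt2"],   # call edge plus fall-through edge
        "g:entry": ["g:stmt1"],
    }
    print(reachable_after(graph, "f:stmt1", "g:stmt1"))   # True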

Ray-Sphere hit checking

I’m making a raytracer following this article.
There is code for checking ray-sphere intersection (page 11).
Could you explain the following:

  1. Why does dot((p - c), (p - c)) = R*R? What does dot(a, a), i.e. the dot product of a vector with itself, do? If they’re the same vector, then the angle between them is always 0, cos(theta) is always 1, and dot(a, a) gives the squared length of vector a?
  2. I don’t understand how he arrived at a quadratic equation at all (see my paraphrase of the code below).
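
For reference, here is my paraphrase of the page-11 code in Python/NumPy; the variable names and the example sphere are mine, not the article’s exact code.

    import numpy as np

    # Sphere: all points p with dot(p - C, p - C) = R*R, i.e. |p - C|^2 = R^2
    # (dot(v, v) is the squared length of v, since cos(0) = 1).
    # Ray: p(t) = A + t*B, with origin A and direction B.
    # Substituting p(t) into the sphere equation and expanding gives a
    # quadratic in t: dot(B,B)*t^2 + 2*dot(B, A-C)*t + dot(A-C, A-C) - R^2 = 0.
    def hit_sphere(center, radius, origin, direction):
        oc = origin - center
        a = np.dot(direction, direction)
        b = 2.0 * np.dot(oc, direction)
        c = np.dot(oc, oc) - radius * radius
        discriminant = b * b - 4.0 * a * c
        return discriminant >= 0        # a real root exists, so the ray hits

    # Hypothetical example: a ray shot down -z toward a sphere centred at z = -2.
    print(hit_sphere(np.array([0.0, 0.0, -2.0]), 0.5,
                     np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, -1.0])))  # True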

Automaton-based model checking on finite traces

I want to check whether a formula in finite LTL is valid on a finite, linear trace.

For infinite traces I would create a Kripke structure of the trace and a Büchi automaton for the negated formula, and check whether the intersection is empty. Would this also work with a finite trace and a formula in FLTL? I already tried adding another atomic proposition “alive” to the Kripke structure and the automaton (as here: https://spot.lrde.epita.fr/tut12.html). But how could I do it without this additional atomic proposition?
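
To check my understanding of the finite-trace semantics, I wrote a brute-force evaluator; formulas are nested tuples and traces are lists of sets of atomic propositions. This is only a semantic cross-check I made up, not the Spot-based automaton construction from the tutorial.

    # Direct evaluation of an FLTL formula on a finite trace (strong "next").
    def holds(formula, trace, i=0):
        op = formula[0]
        if op == "ap":                   # atomic proposition
            return formula[1] in trace[i]
        if op == "not":
            return not holds(formula[1], trace, i)
        if op == "and":
            return holds(formula[1], trace, i) and holds(formula[2], trace, i)
        if op == "X":                    # strong next: false at the last position
            return i + 1 < len(trace) and holds(formula[1], trace, i + 1)
        if op == "F":
            return any(holds(formula[1], trace, j) for j in range(i, len(trace)))
        if op == "G":
            return all(holds(formula[1], trace, j) for j in range(i, len(trace)))
        if op == "U":                    # f1 U f2
            return any(holds(formula[2], trace, j) and
                       all(holds(formula[1], trace, k) for k in range(i, j))
                       for j in range(i, len(trace)))
        raise ValueError(f"unknown operator {op}")

    trace = [{"p"}, {"p"}, {"p", "q"}]                       # hypothetical trace
    print(holds(("U", ("ap", "p"), ("ap", "q")), trace))     # True: p until q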

Offloading TLS client-cert checking to OpenSSL (or similar) if server does not support it


TL;DR

I want to have “some thing” to handle client certs on behalf of a server that is unable to do it, for secure user authentication in addition to regular TLS encryption.

Context

In this question How can I double check security against a TLS on a public IP? the answers made it clear that regular TLS does not typically do client authentication, although it seems it would be possible if the server requests it.

Let’s suppose I have a server that is able to communicate via “plain text” or “on a TLS channel” (I can re-start the server with or without TLS), but if TLS is enabled, the server does not support checking client-certificates for auth.

The original question was for a docker registry, but I generalize the question to any server supporting TLS but not client-side certs.

What I’m thinking

I am thinking of offloading the “TLS part” to security-specific software (much like what SSH port-forwarding tunnels do) and decoupling the server from the TLS handling.

Probably there would be 2 processes involved:

  • The server listens on a firewalled localhost port or a Linux socket in “plain text”, but as it is firewalled it can never be reached from the outside.
  • Some kind of “security middleware” (probably OpenSSL, but I’m not sure; I think it’d be called a TLS terminator, but I’m not really sure of that either) to do this:
    • Handle the public-IP listening
    • Handle the server-side certs to secure the channel via TLS
    • Handle the client-side certs to check authenticity (probably against a set of public keys I’ll have previously uploaded in the server)
    • If and only if the client belongs to a white-list of users, then forward the decrypted channel to the regular plain-text server (a rough sketch follows below).
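
To make the idea concrete, here is a rough sketch of such a terminator written against Python’s ssl/asyncio modules rather than the OpenSSL command line; the file names, ports, and addresses are placeholders I made up, and a real deployment would more likely use something like stunnel, HAProxy, or nginx with client-certificate verification.

    import asyncio, ssl

    BACKEND_HOST, BACKEND_PORT = "127.0.0.1", 8080    # firewalled plain-text server
    LISTEN_HOST, LISTEN_PORT = "0.0.0.0", 8443        # public TLS endpoint

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain("server.crt", "server.key")   # server-side cert for the channel
    ctx.load_verify_locations("clients-ca.crt")       # CA that issued the allowed client certs
    ctx.verify_mode = ssl.CERT_REQUIRED               # reject clients without a valid cert

    async def pipe(reader, writer):
        try:
            while data := await reader.read(65536):
                writer.write(data)
                await writer.drain()
        finally:
            writer.close()

    async def handle(client_reader, client_writer):
        # The TLS handshake (including client-cert verification) has already
        # succeeded here; a finer-grained user white-list could additionally be
        # enforced via client_writer.get_extra_info("peercert").
        backend_reader, backend_writer = await asyncio.open_connection(
            BACKEND_HOST, BACKEND_PORT)
        await asyncio.gather(pipe(client_reader, backend_writer),
                             pipe(backend_reader, client_writer))

    async def main():
        server = await asyncio.start_server(handle, LISTEN_HOST, LISTEN_PORT, ssl=ctx)
        async with server:
            await server.serve_forever()

    asyncio.run(main())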

Questions

  1. Would this TLS offloading be a normal setup?

  2. If so, is OpenSSL a good handler for this offloading?

  3. If yes, what documentation could be a good starting point for this kind of setup, where I can read on and learn?

Thnx.