How hard would it be to state P vs. NP in a proof assistant?

GJ Woeginger maintains a list of 116 invalid proofs of the P vs. NP problem. Scott Aaronson published "Eight Signs A Claimed P≠NP Proof Is Wrong" to reduce hype each time someone attempts to settle P vs. NP. Some researchers even refuse to proof-read papers settling the "P versus NP" question.

I have 3 related questions:

  1. Why are people not using proof assistants that could verify whether a proof of P vs. NP is correct?
  2. How hard would it be, and how much effort would it take, to state P vs. NP in a proof assistant in the first place?
  3. Is there currently any software that would be at least in principle capable of verifying a P vs. NP proof?
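To give a sense of what question 2 involves, here is a hedged sketch of how the *statement* (not a proof!) might look in Lean 4. Everything here is hypothetical: `Machine`, `DecidesInTime`, and `DecidesPairInTime` are axiomatized placeholders for a real model of computation (e.g., Turing machines with a step-counting semantics), and formalizing that model is where most of the actual effort would go. Partial formalizations of Turing machines do exist, for instance in Lean's mathlib.

```lean
-- Words over a fixed alphabet; languages as predicates on words.
abbrev Word : Type := List Bool
abbrev Lang : Type := Word → Prop

-- Placeholder model of computation (the hard part in practice):
-- `DecidesInTime M L f` is meant to say machine `M` decides `L`
-- within `f n` steps on inputs of length `n`.
axiom Machine : Type
axiom DecidesInTime : Machine → Lang → (Nat → Nat) → Prop
-- The same judgment for two-input machines (input plus certificate).
axiom DecidesPairInTime : Machine → (Word → Word → Prop) → (Nat → Nat) → Prop

def PolyBounded (f : Nat → Nat) : Prop :=
  ∃ c k : Nat, ∀ n, f n ≤ c * (n ^ k + 1)

def InP (L : Lang) : Prop :=
  ∃ M f, PolyBounded f ∧ DecidesInTime M L f

-- NP via polynomially bounded verifiers.
def InNP (L : Lang) : Prop :=
  ∃ (V : Word → Word → Prop) (M : Machine) (f p : Nat → Nat),
    PolyBounded f ∧ PolyBounded p ∧ DecidesPairInTime M V f ∧
    ∀ w, L w ↔ ∃ c : Word, c.length ≤ p w.length ∧ V w c

-- The question itself is then one line:
def PeqNP : Prop := ∀ L, InNP L → InP L
```

The takeaway: once a model of computation is formalized, stating the problem is short; verifying an attempted proof against that statement is exactly what assistants like Lean, Coq, or Isabelle are built for.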

Proof of provenance

Say Alice produces a bit of intellectual property, let’s call it X. Alice is all for the freedom of information, so publishes X, without charge, electronically (e.g., on her website), asserting her copyright.

Bob comes along and finds X. He is so captivated by its content that he is moved to jealousy: he republishes X verbatim, again freely, but asserts his own copyright, without any mention of Alice. Let’s call this X’ (i.e., X’ is Bob’s plagiarism of X).

Charlie is interested in the topics discussed in X (and X’) and, during a routine search, comes across X’. Let’s say Charlie is a "good actor" who performs due diligence but, for whatever reason, fails to uncover X. Satisfied, Charlie cites Bob and X’ in her own published IP, Y.

Y becomes well known and Alice is made aware of it. Through this, she is shocked to discover X’. How can she prove herself to be the rightful copyright holder?


Let’s assume, for the sake of this question, there are no metadata or artefacts (logs, drafts, version control, etc.) associated with the creation of X’, that could be used to corroborate Alice’s story. Moreover, any of the following could also be true:

  • Bob has backdated his copyright claim, so that it appears to predate Alice’s.
  • Bob has perfectly fabricated evidence of the creation of X’ by himself.
  • Alice independently verified the publication of X, but the verification service is not all-encompassing (i.e., it can’t guarantee that X’ wasn’t created first, even if X was verified at a particular date).
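There is a standard prospective mitigation from cryptography, worth mentioning even though it cannot settle an already-existing dispute: Alice can publish a commitment to X (e.g., its hash) through a widely witnessed or timestamped channel before, or alongside, releasing X. A minimal sketch in Python, using only the standard library (the document contents here are of course made up):

```python
import hashlib

def commitment(document: bytes) -> str:
    """A binding commitment to a document: its SHA-256 digest."""
    return hashlib.sha256(document).hexdigest()

# Alice publishes only the digest (e.g., via a timestamping service or
# any widely witnessed medium) when she creates X.
x = b"Alice's work X"
c = commitment(x)

# Later, revealing X lets anyone re-derive the digest and check it against
# the timestamped commitment. Bob cannot retroactively produce an *earlier*
# witnessed commitment to X' without having had the content at that time.
assert commitment(x) == c
assert commitment(b"Bob's altered copy") != c
```

As the last bullet above notes, this only helps if the timestamping channel is trusted and was actually used before the dispute arose; it proves possession at a time, not authorship.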

Proof that $L=\{a^ncb^n \mid n \in \mathbb{N}\}$ is not regular

Proof that $L=\{a^ncb^n \mid n \in \mathbb{N}\}$ is not regular.

Here is my try, I would really appreciate if someone could tell me if this is a correct proof.

Let’s assume L is regular. Then L must meet the requirements of the pumping lemma, so let $p$ be the pumping length.

Let $w=a^pcb^p$. Clearly $|w| \geq p$ and $w \in L$. Therefore it should be possible to split $w$ into three pieces $xyz$ such that $|y|>0$, $|xy|\leq p$, and $xy^iz \in L$ for all $i \in \mathbb{N}$. Because $|xy|\leq p$, $y$ can only contain the symbol $a$ (if $y$ contained a symbol other than $a$, it would follow that $|xy|>p$, which is not allowed). Therefore $y$ must be of the form $y=a^{p-k}$ with $0\leq k<p$, so we can write $w=a^ka^{p-k}cb^p$ with $x=a^k$. If we set $i=2$ we get $xy^2z=a^ka^{2(p-k)}cb^p=a^{2p-k}cb^p$, and because $2p-k>p$ (since $k<p$), the number of $a$’s no longer matches the number of $b$’s, so the pumped word is not in $L$. This is a contradiction. Therefore L is not regular.

$ q.e.d$
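As a sanity check on the argument (not a proof, since it only exercises one concrete pumping length), the pumping step can be simulated in Python; the values of `p` and the loop bound below are arbitrary choices:

```python
def in_L(s: str) -> bool:
    """Membership test for L = { a^n c b^n : n in N }."""
    if s.count('c') != 1:
        return False
    left, right = s.split('c')
    return (set(left) <= {'a'} and set(right) <= {'b'}
            and len(left) == len(right))

p = 7                                  # stand-in for the pumping length
w = 'a' * p + 'c' + 'b' * p            # the word chosen in the proof
assert in_L(w)

# Any split with |xy| <= p forces y = a^m for some m > 0;
# pumping with i = 2 inserts m extra a's:
for m in range(1, p + 1):
    pumped = 'a' * (p + m) + 'c' + 'b' * p
    assert not in_L(pumped)            # a-count and b-count no longer match
```

Every pumped variant falls outside L, exactly as the contradiction in the proof requires.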

Proof of the undecidability of compiler code optimization

While reading Compilers by Alfred Aho, I came across this statement:

The problem of generating the optimal target code from a source program is undecidable in general.

The Wikipedia entry on optimizing compilers reiterates the same without a proof.

Here’s my question: Is there a proof (formal or informal) of why this statement is true? If so, please provide it.
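One informal version of the standard argument (a sketch, not a formal proof, and the program below is hypothetical): if a compiler could always emit *provably optimal* target code, it could be used to decide the halting problem.

```python
# Hypothetical source program: the optimal target code for `f`
# depends on whether `mystery()` ever returns.
def f(mystery):
    mystery()      # if mystery() never returns, everything below is dead code
    return 42      # optimal code keeps this iff mystery() halts

# A perfect optimizer must decide, for an arbitrary `mystery`, whether the
# `return 42` is reachable -- i.e., whether mystery() halts. Since halting
# is undecidable, no such optimizer can exist in general.
```

This is why texts like Aho et al. state the claim for "optimal" code generation in general, while practical optimizers settle for conservative approximations.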

Difficulty with a few steps in the proof of “Amortized cost of $\text{Find-Set}$ operation is $\Theta(\alpha(n))$”, assuming union by rank and path compression

I was reading the section on data structures for disjoint sets in the text CLRS, and I faced difficulty in understanding a few steps in the proof of the lemma given in the question title. Here we assume we follow the union-by-rank and path-compression heuristics. Before we move to our target lemma, a few definitions and a lemma are required as prerequisites.


The prerequisites:

$$level(x)=\max\{k : rank[p[x]]\geq A_k(rank[x])\}$$
$$iter(x)=\max\{i : rank[p[x]]\geq A_{level(x)}^{(i)}(rank[x])\}$$
$$\phi_q(x) = \begin{cases} \alpha(n)\cdot rank[x] &\quad\text{if $x$ is a root or $rank[x]=0$}\\ (\alpha(n)-level(x))\cdot rank[x]-iter(x) &\quad\text{if $x$ is not a root and $rank[x]\geq 1$} \end{cases}$$

Lemma 21.9: Let $x$ be a node that is not a root, and suppose that the $q$th operation is either a $\text{Link}$ or $\text{Find-Set}$. Then after the $q$th operation, $\phi_q(x) \leq \phi_{q-1}(x)$. Moreover, if $rank[x] \geq 1$ and either $level(x)$ or $iter(x)$ changes due to the $q$th operation, then $\phi_q(x) \leq \phi_{q-1}(x) - 1$. That is, $x$’s potential cannot increase, and if it has positive rank and either $level(x)$ or $iter(x)$ changes, then $x$’s potential drops by at least $1$.


Now, in the proof below, I mark the steps where I face problems.

Lemma 21.12: The amortized cost of each $ \text{Find-Set}$ operation is $ \Theta(\alpha(n))$ .

Proof: Suppose that the $q$th operation is a $\text{Find-Set}$ and that the find path contains $s$ nodes. The actual cost of the $\text{Find-Set}$ operation is $O(s)$. We shall show that no node’s potential increases due to the $\text{Find-Set}$ and that at least $\max\{0,s-(\alpha(n)+2)\}$ nodes on the find path have their potential decrease by at least $1$.

To see that no node’s potential increases, we first appeal to Lemma 21.9 for all nodes other than the root. If $x$ is the root, then its potential is $\alpha(n)\cdot rank[x]$, which does not change.

Now we show that at least $\max\{0,s-(\alpha(n)+2)\}$ nodes have their potential decrease by at least $1$. Let $x$ be a node on the find path such that $rank[x] > 0$ and $x$ is followed somewhere on the find path by another node $y$ that is not a root, where $level(y) = level(x)$ just before the $\text{Find-Set}$ operation. (Node $y$ need not immediately follow $x$ on the find path.) $\require{color}\colorbox{yellow}{All but at most $\alpha(n)+2$ nodes on the find path satisfy these constraints on $x$.}$ $\colorbox{yellow}{Those that do not satisfy them are the first node on the find path (if it has rank $0$),}$ $\colorbox{yellow}{the last node on the path (i.e., the root), and the last node $w$ on the path for which}$ $\colorbox{yellow}{$level(w) = k$, for each $k = 0,1,2,\ldots,\alpha(n)-1$.}$

Let us fix such a node $x$, and we shall show that $x$’s potential decreases by at least $1$. Let $k = level(x) = level(y)$. Just prior to the path compression caused by the $\text{Find-Set}$, we have

$rank[p[x]] \geq A_k^{(iter(x))}(rank[x])$ (by definition of $iter(x)$),

$rank[p[y]] \geq A_k(rank[y])$ (by definition of $level(y)$),

$rank[y] \geq rank[p[x]]$ (by Corollary 21.5 and because $y$ follows $x$ on the find path).

Putting these inequalities together and letting $ i$ be the value of $ iter(x)$ before path compression, we have

$rank[p[y]] \geq A_k(rank[y]) \geq A_k(rank[p[x]])$ (because $A_k(j)$ is strictly increasing) $\geq A_k(A_k^{(iter(x))}(rank[x])) = A_k^{(i+1)}(rank[x])$.

Because path compression will make $x$ and $y$ have the same parent, we know that after path compression, $rank[p[x]] = rank[p[y]]$ and that the path compression does not decrease $rank[p[y]]$. Since $rank[x]$ does not change, after path compression we have that $\require{color}\colorbox{pink}{$rank[p[x]]\geq A_k^{(i+1)}(rank[x])$. Thus, path compression will cause either $iter(x)$ to}$ $\colorbox{pink}{increase (to at least $i+1$) or $level(x)$ to increase (which occurs if $iter(x)$ increases}$ $\colorbox{pink}{to at least $rank[x]+1$). In either case, by Lemma 21.9, we have $\phi_q(x) \leq \phi_{q-1}(x) - 1$.}$ $\colorbox{pink}{Hence, $x$’s potential decreases by at least $1$.}$

The amortized cost of the $\text{Find-Set}$ operation is the actual cost plus the change in potential. The actual cost is $O(s)$, and we have shown that the total potential decreases by at least $\max\{0,s-(\alpha(n)+2)\}$. The amortized cost, therefore, is at most $O(s) - (s - (\alpha(n) + 2)) = O(s) - s + O(\alpha(n)) = O(\alpha(n))$, since we can scale up the units of potential to dominate the constant hidden in $O(s)$. ■


In the proof above, I could not follow the mathematics behind the statements highlighted in yellow and pink. Can anyone help me out?
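For concreteness, the two heuristics this analysis assumes can be sketched in Python (a minimal version, not CLRS's pseudocode verbatim):

```python
class DisjointSet:
    """Disjoint-set forest with union by rank and path compression."""

    def __init__(self, n: int):
        self.parent = list(range(n))   # each node starts as its own root
        self.rank = [0] * n            # rank = upper bound on tree height

    def find(self, x: int) -> int:
        # Path compression: point every node on the find path at the root,
        # which is exactly what drives the potential drops in Lemma 21.12.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x: int, y: int) -> None:
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        # Union by rank: attach the lower-rank root under the higher one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
```

Note that `rank[p[x]] > rank[x]` for every non-root `x` (Corollary 21.5's territory), which is what makes the `level`/`iter` definitions above well-defined.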

Proof: is the language $L_1 = \{\langle M\rangle \mid \emptyset \subseteq L(M)\}$ (un)decidable?

I want to show that $L_1 = \{\langle M\rangle \mid \emptyset \subseteq L(M)\}$ is decidable or undecidable, without using Rice’s theorem (just in case it cannot be applied).

Every language contains $\emptyset$ as a subset. So my guess is that the language is decidable.

Therefore, let us assume that $L_1$ is decidable. Let’s say that $N$ is the TM which decides $L_1$.

N = "with input $ <M>$ :"

How can I prove that $N$ is a decider for $L_1$?
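For what it's worth, since the property $\emptyset \subseteq L(M)$ is trivially true of every machine, the body of $N$ can simply accept. A sketch in Python standing in for the TM description (modulo the usual convention about strings that are not valid encodings):

```python
def N(M_encoding: str) -> bool:
    # The empty set is a subset of every set, in particular of L(M),
    # so N may accept without examining <M> at all.
    return True
```

`N` halts on every input (it does nothing), and it accepts exactly the strings in $L_1$, because every $\langle M\rangle$ is in $L_1$; that observation is essentially the whole correctness argument.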

What’s wrong with this “proof” that $\mathbb{R}$ is enumerable?

The fake proof:

  • We know that $ \mathbb{R}$ is uncountable, hence we cannot enumerate over it.
  • But what we do know is that $ \mathbb{Q}$ , the set of rationals, is countable, and even denumerable.
  • We also know that we can construct $ \mathbb{R}$ through what are called Dedekind cuts.
  • We choose to let the partition itself denote a new number and go forth to define mathematical operations on it as to be compatible with the rest of the numbers (mainly $ \mathbb{Q}$ and our new number $ x$ )

Sidenote: I think so far this is standard, and contains nothing false. The actual argument starts below this line.

  • Let us denote the set containing $x$ as $S_1 := \mathbb{Q}\cup\{x\}$. For convenience, the subscript of $S_1$ is how many new such numbers we have added through the cuts.

  • Since $\mathbb{Q}$ is countable, we can enumerate over every single rational $q\in\mathbb{Q}$ to produce an $r\in\mathbb{R}$. Do this process $n$ times and you end up with $S_n = \mathbb{Q}\cup\{x_1\}\cup\{x_2\}\cup\dots\cup\{x_n\}$.

  • But $S_n$ is also enumerable, since it has only finitely many more elements than $\mathbb{Q}$.

  • Hence, after enumerating over the entirety of $\mathbb{Q}$, start enumerating over the entirety of $S_n\setminus\mathbb{Q}$.

  • Now we will end up with even newer numbers to put in our set, which we will now call $ S_{n,k}$ where $ n$ represents the enumeration over $ \mathbb{Q}$ and $ k$ represents the enumeration over $ S_n\setminus\mathbb{Q}$ . Do this ad infinitum and you will eventually describe $ \mathbb{R}$ .

I know I went wrong somewhere, I just don’t know where.

Proof for time complexity of Insertion (k-proximate) Sort equals O(nk)

The following is the definition for Proximate Sorting given in my paper:

An array of distinct integers is k-proximate if every integer of the array is at most k places away from its place in the array after being sorted, i.e., if the ith integer of the unsorted input array is the jth largest integer contained in the array, then |i − j| ≤ k. In this problem, we will show how to sort a k-proximate array faster than Θ(n log n).

The following is the proof for time complexity of k-proximate insertion sort:

To prove O(nk), we show that each of the n insertion sort rounds swaps an item left by at most O(k) positions.

In the original ordering, entries that are ≥ 2k slots apart must already be ordered correctly: indeed, if A[s] > A[t] but t − s ≥ 2k, there is no way to reverse the order of these two items while moving each at most k slots. This means that for each entry A[i] in the original order, fewer than 2k of the items A[0], . . . , A[i−1] are less than A[i]. Thus, on round i of insertion sort when A[i] is swapped into place, fewer than 2k swaps are required, so round i requires O(k) time.

My problem with the proof is this line : "This means that for each entry A[i] in the original order, fewer than 2k of the items A[0], . . . , A[i−1] are less than A[i]". The preceding statements just proved that t-s < 2k, otherwise the elements are sorted (as elements with distance greater than 2k cannot be swapped.) Isn’t the correct statement : "This means that for each entry A[i] in the original order, fewer than 2k of the items A[0], . . . , A[i−1] are greater than A[i]"?
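To make the claimed bound concrete, here is plain insertion sort instrumented to count swaps; on a k-proximate input each round should perform fewer than 2k swaps (a sanity check on one hypothetical input, not a proof):

```python
def insertion_sort(A):
    """Sort A in place; return the number of swaps made in each round."""
    swaps_per_round = []
    for i in range(1, len(A)):
        j, swaps = i, 0
        # Swap A[i] leftward until it reaches its sorted position.
        while j > 0 and A[j - 1] > A[j]:
            A[j - 1], A[j] = A[j], A[j - 1]
            j -= 1
            swaps += 1
        swaps_per_round.append(swaps)
    return swaps_per_round

# [2, 1, 4, 3, 6, 5] is k-proximate for k = 1: every element sits
# at most one slot away from its sorted position.
A = [2, 1, 4, 3, 6, 5]
rounds = insertion_sort(A)
assert A == [1, 2, 3, 4, 5, 6]
assert all(s < 2 * 1 for s in rounds)   # fewer than 2k swaps per round
```

Each swap moves the inserted item past one earlier element that compares against it, which is why bounding the number of such elements (whichever direction the inequality should read) bounds the per-round cost by O(k).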

Proof that truncation of a regular language is regular

Let $ L_1$ be a regular language and $ L_2$ any language. I want to prove that the language of $ L_1$ truncated by $ L_2$ is a regular language, i.e.

$$\{w \mid wx\in L_1 \text{ and } x\in L_2\}$$

is regular.

To prove that something is regular it is enough to prove that it is described by a regular expression. We know that $ L_1$ is described by a regular expression, but regular expressions don’t handle "differences" in any obvious way.

We could imagine the DFA which accepts exactly the strings of $ L_1$ . But to truncate the strings of $ L_2$ we would need some way of recognizing the portions of the tail of the word which are in $ L_2$ and there may be no DFA which does this. Likewise for any recognizing DFA.

In general it’s hard to imagine having "any language" interact with a regular language to produce a regular language. The "any language" isn’t constrained by any of the tools that seem relevant to regular languages, like NFAs and regular expressions.
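One way to make peace with this: take the DFA for $L_1$ and re-mark as accepting every state from which *some* $x \in L_2$ leads to an accepting state. For arbitrary $L_2$ we may have no algorithm to compute that set of states, but the re-marked DFA *exists*, and existence is all regularity requires. A toy Python sketch of the re-marking, with the hypothetical choices $L_1 = a^*b$ and $L_2 = \{b\}$ (so the truncation should come out to $a^*$):

```python
# DFA for L1 = a*b: state 0 = start, 1 = accept, 2 = dead.
delta = {(0, 'a'): 0, (0, 'b'): 1,
         (1, 'a'): 2, (1, 'b'): 2,
         (2, 'a'): 2, (2, 'b'): 2}
accept = {1}
L2 = {'b'}           # a finite stand-in for "any language"

def run(q, s):
    """Run the DFA from state q on string s; return the final state."""
    for ch in s:
        q = delta[(q, ch)]
    return q

# Re-marking: q is accepting in the truncation iff some x in L2
# drives q into an accepting state. (For non-regular L2 this set
# still exists; we just may not be able to compute it.)
new_accept = {q for q in {0, 1, 2}
              if any(run(q, x) in accept for x in L2)}

def in_truncation(w):
    return run(0, w) in new_accept

# in_truncation recognizes exactly a* here: 'aaa' yes, 'ab' no.
```

The same automaton (states and transitions) is reused unchanged; only the accepting set moves, which is why the construction never needs a recognizer for $L_2$ itself.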