Non-deterministic performance of query on select, from 1s to 60s on table with 1 billion rows

I’m trying to investigate why the performance of this query is so non-deterministic: it can take anywhere from 1 second to 60 seconds and above. The nature of the query is to select a "time window" and get all rows from within that time window.

Here’s the query in question, running on a table of approximately 1 billion rows:

SELECT CAST(extract(EPOCH from ts)*1000000 as bigint) as ts
     , ticks
     , quantity
     , side
FROM order_book
WHERE ts >= TO_TIMESTAMP(1618882633073383/1000000.0)
    AND ts < TO_TIMESTAMP(1618969033073383/1000000.0)
    AND zx_prod_id = 0
ORDER BY ts ASC, del desc

The values within TO_TIMESTAMP will keep sliding forward as I walk the whole table. Here is the EXPLAIN ANALYZE output for the same query on two different time windows:

Slow Performance

Gather Merge  (cost=105996.20..177498.48 rows=586308 width=18) (actual time=45196.559..45280.769 rows=539265 loops=1)
  Workers Planned: 6
  Workers Launched: 6
  Buffers: shared hit=116386 read=42298
  ->  Sort  (cost=104996.11..105240.40 rows=97718 width=18) (actual time=45169.717..45176.775 rows=77038 loops=7)
        Sort Key: (((date_part('epoch'::text, _hyper_16_214_chunk.ts) * '1000000'::double precision))::bigint), _hyper_16_214_chunk.del DESC
        Sort Method: quicksort  Memory: 9327kB
        Worker 0:  Sort Method: quicksort  Memory: 8967kB
        Worker 1:  Sort Method: quicksort  Memory: 9121kB
        Worker 2:  Sort Method: quicksort  Memory: 9098kB
        Worker 3:  Sort Method: quicksort  Memory: 9075kB
        Worker 4:  Sort Method: quicksort  Memory: 9019kB
        Worker 5:  Sort Method: quicksort  Memory: 9031kB
        Buffers: shared hit=116386 read=42298
        ->  Result  (cost=0.57..96897.07 rows=97718 width=18) (actual time=7.475..45131.932 rows=77038 loops=7)
              Buffers: shared hit=116296 read=42298
              ->  Parallel Index Scan using _hyper_16_214_chunk_order_book_ts_idx on _hyper_16_214_chunk  (cost=0.57..95187.01 rows=97718 width=18) (actual time=7.455..45101.670 rows=77038 loops=7)
                    Index Cond: ((ts >= '2021-04-22 01:34:31.357179+00'::timestamp with time zone) AND (ts < '2021-04-22 02:34:31.357179+00'::timestamp with time zone))
                    Filter: (zx_prod_id = 0)
                    Rows Removed by Filter: 465513
                    Buffers: shared hit=116296 read=42298
Planning Time: 1.107 ms
JIT:
  Functions: 49
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 9.273 ms, Inlining 0.000 ms, Optimization 2.008 ms, Emission 36.235 ms, Total 47.517 ms
Execution Time: 45335.178 ms

Fast Performance

Gather Merge  (cost=105095.94..170457.62 rows=535956 width=18) (actual time=172.723..240.628 rows=546367 loops=1)
  Workers Planned: 6
  Workers Launched: 6
  Buffers: shared hit=158212
  ->  Sort  (cost=104095.84..104319.16 rows=89326 width=18) (actual time=146.702..152.849 rows=78052 loops=7)
        Sort Key: (((date_part('epoch'::text, _hyper_16_214_chunk.ts) * '1000000'::double precision))::bigint), _hyper_16_214_chunk.del DESC
        Sort Method: quicksort  Memory: 11366kB
        Worker 0:  Sort Method: quicksort  Memory: 8664kB
        Worker 1:  Sort Method: quicksort  Memory: 8986kB
        Worker 2:  Sort Method: quicksort  Memory: 9116kB
        Worker 3:  Sort Method: quicksort  Memory: 8858kB
        Worker 4:  Sort Method: quicksort  Memory: 9057kB
        Worker 5:  Sort Method: quicksort  Memory: 6611kB
        Buffers: shared hit=158212
        ->  Result  (cost=0.57..96750.21 rows=89326 width=18) (actual time=6.145..127.591 rows=78052 loops=7)
              Buffers: shared hit=158122
              ->  Parallel Index Scan using _hyper_16_214_chunk_order_book_ts_idx on _hyper_16_214_chunk  (cost=0.57..95187.01 rows=89326 width=18) (actual time=6.124..114.023 rows=78052 loops=7)
                    Index Cond: ((ts >= '2021-04-22 01:34:31.357179+00'::timestamp with time zone) AND (ts < '2021-04-22 02:34:31.357179+00'::timestamp with time zone))
                    Filter: (zx_prod_id = 4)
                    Rows Removed by Filter: 464498
                    Buffers: shared hit=158122
Planning Time: 0.419 ms
JIT:
  Functions: 49
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 10.405 ms, Inlining 0.000 ms, Optimization 2.185 ms, Emission 39.188 ms, Total 51.778 ms
Execution Time: 274.413 ms

I read this output as laying most of the blame on the parallel index scan.

At first, I tried to raise work_mem to 1 GB and shared_buffers to 24 GB, thinking that maybe it couldn’t fit all the stuff it needed in RAM, but that didn’t seem to help.

Next, I tried creating an index on (zx_prod_id, ts), thinking that the filter on the parallel index scan might be taking a while, but that didn’t seem to do anything either.
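For reference, this is roughly the DDL I used for that attempt (the index name is made up; the point is that the equality column comes first, so the range scan over ts stays contiguous within each zx_prod_id):

```sql
-- Hypothetical DDL: equality column first, then the range column.
CREATE INDEX order_book_prod_ts_idx ON order_book (zx_prod_id, ts);
```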

I’m no database expert, and so I’ve kind of exhausted the limits of my knowledge.

Thanks in advance for any suggestions!

I can verify solutions to my problem in polynomial time; how would a non-deterministic algorithm arrive at a solution if it always takes $2^n$ bits?

Decision Problem: Given integers $K$ and $M$ as inputs, is the sum $2^K + M$ a prime?

Verifier

m = int(input('Enter integer for M: '))
s = int(input('Enter the claimed sum of 2^k + M: '))

if AKS_is_prime(s):    # primality checked with, e.g., the AKS test
    # Powers of 2 can be verified in O(N) time:
    # make sure there are only 0-bits after the 1st 1-bit.
    # Use the difference to verify the problem:
    if is_power_of_2(s - m):
        print('Solution verified')

A power of 2 has approximately $2^n$ bits, where $n$ is the number of bits of its exponent. Consider $2^K$ where $K = 100000$: compare the number of digits in $K$ to the number of digits in its solution! In effect, the power of 2 spells out its exponent in unary, as a run of $2^n$ 0-bits.
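The size blow-up can be sketched in a few lines of plain Python (just for illustration):

```python
# K itself is short, but 2**K written out in binary is enormous.
K = 100000
digits_in_K = len(str(K))                # 6 digits
bits_in_result = (2 ** K).bit_length()   # 100001 bits
```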

Question

How would a non-deterministic machine solve this problem in polynomial time?

Nondeterministic polynomial time algorithm versus certificate/verifier for showing membership in NP

In this paper (https://arxiv.org/pdf/1706.06708.pdf) the authors prove that optimally solving the $n\times n\times n$ Rubik’s Cube is an NP-complete problem. In the process, they must show that the relevant decision problem belongs to NP (section 2.5 on page 6). To do this, they describe an algorithm that nondeterministically solves the Cube in polynomial time. It seems to me that this is more effort than necessary.

In particular, the relevant decision problem is as follows: the input is a permutation $t$ and a value $k$, and the goal is to decide whether $t$ can be solved in $k$ or fewer moves. So rather than constructing a nondeterministic polynomial-time solution algorithm, they could simply observe that a certificate for a "yes" instance is a list of at most $k$ moves, and that checking such a certificate takes polynomial time.

So my questions are as follows. Are the two definitions below actually equivalent?

  1. NP is the complexity class of decision problems that are solvable by a nondeterministic Turing machine in polynomial time.
  2. NP is the complexity class of decision problems for which a solution can be confirmed in polynomial time (deterministically).

And if they are equivalent, why would the authors of the linked paper choose the more difficult method (or am I wrong about this assumption)?
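For what it’s worth, definition 2 is easy to act on here. Below is a hypothetical sketch of such a certificate checker (permutations modeled as tuples; `compose`, `verify`, and the identity layout are my own illustrative names, not the paper’s):

```python
# A polynomial-time certificate checker in the spirit of definition 2.
# A permutation p is a tuple where p[i] is the position sticker i moves to.
def compose(p, q):
    # Apply p first, then q.
    return tuple(q[p[i]] for i in range(len(p)))

def verify(t, moves, k, identity):
    """Accept iff the certificate `moves` (a list of permutations)
    has length <= k and brings the scrambled state t back to solved."""
    if len(moves) > k:
        return False
    state = t
    for m in moves:
        state = compose(state, m)
    return state == identity
```

Each of the at most $k$ compositions is linear in the cube description, so the whole check is polynomial in the input size.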


Note that I am posting this question on multiple StackExchange websites as I’m not sure where it’s best fit. If it is inappropriate here, I’ll happily delete it. Similarly, I’ll link to a good solution on another site if it gets answered there.

When can a Non-Deterministic Finite Automaton with Epsilon transitions be considered to be in an accepting state?

A non-deterministic finite automaton is considered to be halted when either the whole input string has been consumed or when we reach a state where no available transition (if any) matches the current character being read.

If the machine halts in an accepting state and, at the same time, the whole input has been consumed, the input string is considered to be accepted.

Now, when we introduce $\epsilon$ transitions, the machine doesn’t necessarily halt when the whole input string has been consumed, for it is possible that there are still $\epsilon$ transitions available.

Suppose we have an NFA that is in an accepting state and the whole input has been consumed, but there are still $\epsilon$ transitions available in this state. Can we consider the input string to be accepted, or do we need to "follow the trail" of $\epsilon$ transitions until we reach a state where no other transition is available?
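A minimal sketch of the standard set-of-states simulation (the dict-based NFA encoding here is my own, hypothetical): acceptance only requires that some state in the $\epsilon$-closure of the current set is accepting, so there is no need to exhaust the remaining $\epsilon$ transitions.

```python
def epsilon_closure(states, eps):
    """All states reachable from `states` by epsilon transitions alone."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(start, delta, eps, accepting, word):
    # Track the full set of possible states, closing under epsilon each step.
    current = epsilon_closure({start}, eps)
    for ch in word:
        moved = {t for s in current for t in delta.get((s, ch), ())}
        current = epsilon_closure(moved, eps)
    # Any accepting state in the final closure suffices.
    return bool(current & accepting)
```

Note that the closure is computed anyway, so "following the trail" and "accepting immediately" give the same answer: the string is accepted as soon as any state of the closure is accepting.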

Problem with proving that $RP \subseteq NP$ : a non-deterministic TM for a language $L \in RP$

I’m having a small issue with Wikipedia’s proof that $RP \subseteq NP$:

An alternative characterization of RP that is sometimes easier to use is the set of problems recognizable by nondeterministic Turing machines where the machine accepts if and only if at least some constant fraction of the computation paths, independent of the input size, accept. NP on the other hand, needs only one accepting path, which could constitute an exponentially small fraction of the paths. This characterization makes the fact that RP is a subset of NP obvious.

My problem lies in the following statement: "…accepts if and only if at least some constant fraction of the computation paths, independent of the input size, accept"

My question is: for a language $L \in NP$, wouldn’t only one accepting computation path be needed for the non-deterministic TM to accept? Since, if $x \in L$, then $\Pr[T(x) \text{ accepts}] \geq 1/2$, with $T$ being the Turing machine that decides $L$.

Is this push-down automaton non-deterministic, as JFLAP states?

There is a tool called JFLAP, which, among other things, can analyze push-down automata, and find non-determinism.

In this example it is detecting non-determinism in state q0 (image: non-deterministic push-down automaton).

The first symbol in the transition represents the symbol read as input; the second symbol represents the symbol extracted from the stack; and the third symbol is the symbol pushed to the stack. λ represents the empty symbol, so this is an empty transition without checking the stack or pushing anything to it.

I am surprised, as that state seems to fulfill the conditions for determinism for push-down automata (if only because it contains a single transition!). I would expect the next state to be q1 under any circumstance.

In comparison, JFLAP doesn’t find any non-determinism here (image: deterministic push-down automaton).

Mind you, the transition is the same; the only difference is that this one pushes something onto the stack. Am I missing something, or is JFLAP wrong in the first instance?

Can a non-deterministic machine merge its branches?

Does an NDTM have the power to combine computational branches, i.e., can a result from branch A be used in the next step of the computation along branch B? Can branches use each other’s results, diagrammatically "merging"?

Example:

Branch $i$ arrives at the number $b$ after $n$ steps; branch $j$ arrives at the number $c$ after $2n$ steps. After both have arrived at their respective values (we have waited $2n$ steps), I want the computer to multiply $b \cdot c$ (the results from the different branches). Can I do this?

When our two-state PDA constructed from CFG is non-deterministic PDA?

We can always convert a GNF-CFG/CNF-CFG to a two-state PDA, but I’m wondering when that PDA is non-deterministic. I’m sure we cannot build a DPDA for a non-deterministic CFL, and I suspect that the same rule that separates DCFGs from non-deterministic CFGs also applies here: a non-deterministic δ (delta) implies a non-deterministic edge in our PDA. If my suspicion is right, then for every DCFL there exists at least one DPDA. Am I right?

R ≝ Production Rules of CFG
(x,y,"LBL") is a labeled edge between x and y with "LBL" as a label

∀r∊R: r = (A,aⱰ)  (A∊V ⋀ a∊T ∧ Ɒ∊V*)
    add (q,q,"a,A/Ɒ") to E
Add (q,q,"ε,z/Sz′") to E
Add (q,f,"ε,z′/z′") to E
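The construction above can be written out mechanically. This is a hypothetical sketch (the function name and the tuple encoding of edges are mine, just to make the rules concrete):

```python
# Sketch: every GNF rule A -> a D becomes a self-loop on q that reads a,
# pops A and pushes D, plus the two bootstrap edges from the construction.
def cfg_to_two_state_pda(rules, start='S'):
    """rules: iterable of (A, a, D) with A in V, a in T, D in V*."""
    E = set()
    for A, a, D in rules:
        E.add(('q', 'q', (a, A, D)))                  # "a, A/D"
    E.add(('q', 'q', ('eps', 'z', start + "z'")))     # "eps, z/Sz'"
    E.add(('q', 'f', ('eps', "z'", "z'")))            # "eps, z'/z'"
    return E
```

On this encoding, the PDA is non-deterministic exactly when two edges share the same (state, input symbol, popped stack symbol) triple, which happens when the grammar has two rules A -> aⱰ₁ and A -> aⱰ₂.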


If anything can be verified efficiently, must it be solvable efficiently on a Non-Deterministic machine?

Suppose I wanted to verify the solution to $2^3$, which is $8$.

The powers of $2$ have only one 1-bit, at the start of the binary string.

Verify Solution Efficiently

n = 8
N = 3

if bin(n).count('1') == 1:        # only ONE 1-bit at start of binary string
    if n.bit_length() - 1 == N:   # exactly N 0-bits follow it
        # n is a power of 2
        print('solution verified, 2^3 == 8')

A solution will always have approximately $2^N$ digits. It’s not possible, even for a non-deterministic machine, to arrive at a solution with $2^N$ digits faster than $2^N$ time.
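To make the contrast concrete, here is a sketch of how cheap the verification side is (the bit-trick below is a standard one I’m using for illustration): it touches the binary string of $n$ once, regardless of how huge $n$ is.

```python
def is_power_of_2(n):
    # A power of two has exactly one 1-bit, so clearing the lowest
    # set bit (n & (n - 1)) must leave zero.
    return n > 0 and (n & (n - 1)) == 0

# Verifying 2^3 == 8 in time linear in the length of n's binary string:
n, N = 8, 3
verified = is_power_of_2(n) and n.bit_length() - 1 == N
```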

Question

Can this problem be solved efficiently in non-deterministic poly-time? If not, why not, given that the solutions can be verified efficiently?

Does this imply Hamiltonian path cannot be decided in nondeterministic logspace?

Suppose I nondeterministically walk around in a graph with n vertices.

When looking for a Hamiltonian path, at some point I’ve walked n/2 vertices.

There are $\binom{n}{n/2}$ different combinations of vertices I could have walked (meaning the unordered set, not the ordered walk).

Each of those states must be distinct from one another.

If not, then, depending on the remaining n/2 vertices, I would decide the wrong answer.

Therefore, midway through my processing, at $n/2$, I need $\binom{n}{n/2}$ different states. That is too big for logspace.
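The counting step can be made concrete with a small illustrative computation: $\binom{n}{n/2}$ grows so fast that distinguishing all those visited sets needs about $n$ bits of state, far more than the $O(\log n)$ bits logspace allows.

```python
import math

# Number of distinct "visited sets" after n/2 steps, and the bits
# needed to tell them apart; logspace only allows O(log n) bits.
n = 40
states = math.comb(n, n // 2)     # C(40, 20) = 137846528820
bits_needed = math.log2(states)   # about 37 bits, roughly n
```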

Therefore you cannot decide a Hamiltonian path by nondeterministically walking around.

Does this imply Hamiltonian path cannot be decided in nondeterministic logspace – at least by “walking around”?