How to efficiently find all positive linear dependencies between some vectors

I’ve got these vectors

vecs = {{0,1,0,0,0,0,0,-1,0},    {1,-1,1,0,0,0,-1,1,-1},  {1,0,-1,1,0,-1,1,0,-1},
        {1,0,-1,1,0,0,-1,0,1},   {1,0,0,-1,0,1,0,0,-1},   {1,0,0,-1,1,-1,1,-1,1},
        {1,0,0,0,-1,0,0,1,0},    {-1,0,1,0,0,-1,1,0,-1},  {-1,0,1,0,0,0,-1,0,1},
        {-1,1,-1,1,-1,1,0,0,-1}, {-1,1,-1,1,0,-1,1,-1,1}, {-1,1,0,-1,0,1,0,-1,1},
        {-1,1,0,-1,1,-1,0,1,0},  {0,-1,0,0,1,0,0,0,-1},   {0,-1,0,1,-1,1,0,-1,1},
        {0,-1,0,1,0,-1,0,1,0},   {0,-1,1,-1,0,1,-1,1,0},  {0,0,-1,0,0,0,1,0,0}}

And I would like to find all linear dependencies with positive coefficients between them. I started with

ns = NullSpace[Transpose[vecs]]  

which gave me

{{2,2,-1,0,-1,0,0,0,0,0,0,0,0,0,0,0,0,3},  {2,-1,2,0,-1,0,0,0,0,0,0,0,0,0,0,0,3,0},
 {2,-1,-1,0,2,0,0,0,0,0,0,0,0,0,0,3,0,0},  {1,1,1,0,1,0,0,0,0,0,0,0,3,0,3,0,0,0},
 {2,-1,-1,0,-1,0,3,0,0,0,0,0,0,3,0,0,0,0}, {-1,2,2,0,-1,0,0,0,0,0,0,3,0,0,0,0,0,0},
 {-1,2,-1,0,2,0,0,0,0,0,3,0,0,0,0,0,0,0},  {-1,2,-1,0,-1,3,0,0,0,3,0,0,0,0,0,0,0,0},
 {-1,-1,2,0,2,0,0,0,3,0,0,0,0,0,0,0,0,0},  {-1,-1,-1,3,2,0,0,3,0,0,0,0,0,0,0,0,0,0}}

so there is one linear dependence with nonnegative coefficients (the fourth one). To check whether there are others, I made a system of inequalities with

ineqs = Simplify[Union[Map[# >= 0 &, Table[x[k], {k, Length[ns]}].ns]]] 

which returns

{x[1]>=0, x[2]>=0, x[3]>=0, x[4]>=0, x[5]>=0, x[6]>=0, x[7]>=0, x[8]>=0, x[9]>=0, x[10]>=0,
 2 x[1]+2 x[2]+2 x[3]+x[4]+2 x[5] >= x[6]+x[7]+x[8]+x[9]+x[10],
 2 x[1]+x[4]+2 (x[6]+x[7]+x[8])   >= x[2]+x[3]+x[5]+x[9]+x[10],
 2 x[2]+x[4]+2 (x[6]+x[9])        >= x[1]+x[3]+x[5]+x[7]+x[8]+x[10],
 2 x[3]+x[4]+2 (x[7]+x[9]+x[10])  >= x[1]+x[2]+x[5]+x[6]+x[8]}

but my notebook runs out of memory on both Solve[ineqs] and Reduce[ineqs].

What is the proper way?
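Before attacking the full system, it is worth sanity-checking the one visibly nonnegative relation. A quick independent check (sketched in Python rather than Mathematica, purely for illustration) that the fourth null-space vector really is a nonnegative dependence among the original 18 vectors:

```python
# The 18 vectors and the fourth null-space vector from the question,
# transcribed into Python for an independent check.
vecs = [
    [0, 1, 0, 0, 0, 0, 0, -1, 0],   [1, -1, 1, 0, 0, 0, -1, 1, -1],
    [1, 0, -1, 1, 0, -1, 1, 0, -1], [1, 0, -1, 1, 0, 0, -1, 0, 1],
    [1, 0, 0, -1, 0, 1, 0, 0, -1],  [1, 0, 0, -1, 1, -1, 1, -1, 1],
    [1, 0, 0, 0, -1, 0, 0, 1, 0],   [-1, 0, 1, 0, 0, -1, 1, 0, -1],
    [-1, 0, 1, 0, 0, 0, -1, 0, 1],  [-1, 1, -1, 1, -1, 1, 0, 0, -1],
    [-1, 1, -1, 1, 0, -1, 1, -1, 1], [-1, 1, 0, -1, 0, 1, 0, -1, 1],
    [-1, 1, 0, -1, 1, -1, 0, 1, 0], [0, -1, 0, 0, 1, 0, 0, 0, -1],
    [0, -1, 0, 1, -1, 1, 0, -1, 1], [0, -1, 0, 1, 0, -1, 0, 1, 0],
    [0, -1, 1, -1, 0, 1, -1, 1, 0], [0, 0, -1, 0, 0, 0, 1, 0, 0],
]
c = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 0, 3, 0, 0, 0]

# Sum of c[i] * vecs[i] should be the zero vector.
combo = [sum(ci * v[j] for ci, v in zip(c, vecs)) for j in range(9)]
assert combo == [0] * 9                  # a genuine linear dependence
assert all(ci >= 0 for ci in c)          # with nonnegative coefficients
```

For the question itself: what you are describing is a polyhedral cone, the set {x >= 0 : Transpose[vecs].x == 0}, and "all positive dependencies" means its extreme rays. Rather than Solve/Reduce on the inequality system, FindInstance on ineqs together with a normalization such as Total[x] == 1 (or a linear-programming formulation) should avoid the memory blow-up; dedicated cone software (e.g. Normaliz) can enumerate all extreme rays.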

Why can a certificate of an arbitrary YES instance of SAT be found efficiently if SAT belongs to P?

I’m studying P vs NP, and my professor said that a certificate of an arbitrary yes-instance of SAT can be found efficiently if SAT belongs to P. I’m not sure why this is true and would like to understand it.
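What the professor is appealing to is the self-reducibility of SAT: given any polynomial-time decider for SAT, you can recover a satisfying assignment by fixing the variables one at a time, asking the decider after each tentative assignment whether the rest of the formula stays satisfiable. A sketch in Python, with a brute-force decider standing in for the hypothetical polynomial-time one:

```python
from itertools import product

# CNF formula: a list of clauses, each a list of nonzero ints (DIMACS style);
# literal k means "variable k is true", -k means "variable k is false".

def brute_sat(cnf, n):
    """Stand-in for the hypothetical polynomial-time SAT decider.
    (Brute force here; the argument only needs *some* decider.)"""
    def satisfied(asg):
        return all(any((lit > 0) == asg[abs(lit) - 1] for lit in cl) for cl in cnf)
    return any(satisfied(asg) for asg in product([False, True], repeat=n))

def assign(cnf, var, val):
    """Substitute variable `var` := `val` and simplify the formula."""
    out = []
    for cl in cnf:
        if (var if val else -var) in cl:
            continue                      # clause already satisfied
        out.append([lit for lit in cl if abs(lit) != var])
    return out

def find_model(cnf, n, sat=brute_sat):
    """Self-reduction: about n decider calls recover a full assignment."""
    if not sat(cnf, n):
        return None
    model, cur = {}, cnf
    for v in range(1, n + 1):
        # If fixing v := True keeps the formula satisfiable, commit to it;
        # otherwise v := False must work, since `cur` is satisfiable.
        if sat(assign(cur, v, True), n):
            model[v], cur = True, assign(cur, v, True)
        else:
            model[v], cur = False, assign(cur, v, False)
    return model

model = find_model([[1, 2], [-1, 2], [-2, 3]], 3)
assert model is not None
assert all(any((l > 0) == model[abs(l)] for l in cl)
           for cl in [[1, 2], [-1, 2], [-2, 3]])
assert find_model([[1], [-1]], 1) is None
```

Each variable costs one decider call on a formula no larger than the original, so a p(n)-time decider yields roughly n * p(n) time overall, still polynomial.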

How can I efficiently construct a CFG from a language

I am new to CFGs and automata in general, and I came across an exercise where I needed to construct a CFG for the language {a^m b^n | n <= m + 3}.

So m can be arbitrarily larger than n, but n can be at most 3 greater than m, and they can also be equal. I have no idea how to construct a CFG for this.

What I came up with was:

S -> AB | _
A -> a | aa | aaa | C | _
C -> aC | a | _
B -> bB | b | _

But I think this is not even close…

Any help/tips/advice would be much appreciated!
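A nudge in the right direction: let one production pair each b with an a, and cap the surplus b's at three. One grammar that should work (using `_` for the empty string, as above) is

S -> aSb | aS | T
T -> _ | b | bb | bbb

where aSb generates matched a…b pairs, aS adds extra a's on the left, and T contributes at most three unmatched b's. A brute-force Python check (names here are my own) that this grammar generates exactly {a^m b^n | n <= m + 3} up to a given length:

```python
def generated(maxlen):
    """All terminal strings of length <= maxlen derivable from S."""
    prods = {'S': ['aSb', 'aS', 'T'], 'T': ['', 'b', 'bb', 'bbb']}
    seen, out, stack = set(), set(), ['S']
    while stack:
        form = stack.pop()
        nts = [i for i, ch in enumerate(form) if ch in prods]
        if not nts:
            if len(form) <= maxlen:
                out.add(form)
            continue
        if len(form) - len(nts) > maxlen:
            continue                       # too many terminals already
        i = nts[0]                         # expand the leftmost nonterminal
        for rhs in prods[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if new not in seen:
                seen.add(new)
                stack.append(new)
    return out

def target(maxlen):
    """The language {a^m b^n | n <= m + 3}, truncated to length maxlen."""
    return {'a' * m + 'b' * n
            for m in range(maxlen + 1) for n in range(maxlen + 1)
            if n <= m + 3 and m + n <= maxlen}

assert generated(8) == target(8)
```

The correctness argument: the grammar yields a^(i+k) b^(j+k) with j <= 3, so for any m, n with n <= m + 3 you can take k = min(m, n) and absorb the surplus on the appropriate side.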

How to parallelize a summation efficiently

Say I have an array a[1..n], and I want to output an array s[1..n] with s[i] = a[1]+…+a[i]. What is the best (or at least standard) way to do so in parallel?

The way I can think of doing it, given m processors, is to split the array a into m blocks of equal size M, compute b[j] = a[floor((j-1)/M)*M+1]+…+a[j-1]+a[j] (in parallel), then compute s[iM] for 1<=i<=m by summing values of b[jM] (serially), and then, lastly, compute s[jM+i] = s[jM]+b[jM+i] for all 0<i<M, 0<=j<m (in parallel). That feels a little wasteful, though the time complexity is right – is there a better way?
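The scheme described above is essentially the standard two-pass block scan. A sketch in Python (sequential code for clarity, with comments marking which phases would run in parallel; `accumulate` does each local inclusive scan):

```python
from itertools import accumulate

def blocked_scan(a, m):
    """Inclusive prefix sum via the block scheme described above.
    Phases 1 and 3 are independent per block, so each block could run on
    its own processor; phase 2 is the short serial pass over m totals."""
    n = len(a)
    M = -(-n // m)                                   # block size, rounded up
    blocks = [a[i:i + M] for i in range(0, n, M)]
    # Phase 1 (parallel per block): local inclusive scans.
    local = [list(accumulate(b)) for b in blocks]
    # Phase 2 (serial): exclusive scan of the block totals.
    offsets, total = [], 0
    for b in local:
        offsets.append(total)
        total += b[-1]
    # Phase 3 (parallel per block): shift each block by its offset.
    return [x + off for b, off in zip(local, offsets) for x in b]

assert blocked_scan([3, 1, 4, 1, 5, 9, 2, 6], 3) == \
       list(accumulate([3, 1, 4, 1, 5, 9, 2, 6]))
```

As for "is there a better way": this is about as good as it gets work-wise. The textbook alternative is the Blelloch up-sweep/down-sweep scan, which achieves O(n) work and O(log n) depth with n processors; with m << n processors, the block scheme above is the standard practical choice and is only a constant factor (roughly 2x the additions of a serial scan) from optimal.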

How should GSA SER projects be efficiently arranged to create more links?

Hello Sven,
is it better to create one master project and duplicate it (let's say 5 times), or to create a master project (already having approx. 19,900 verified links over a few months) and create 5 tiers below it? I recently purchased the annual DropBox subscription here on your board, with links refreshed every now and then, but DropBox fights with GSA SER for system resources. So I tweak the projects: reset the email cache, unify articles to keep only one fine-spun version, and so on. But no matter what I do, my VpM falls to about 0.23 after a while.
You helped me some time ago to resolve GSA SER freezing because of the hundreds of e-mail addresses kept in RAM. So now I use just 1000 of them for the master project, and they are shared with its tiers. The largest file in the folder is now the master's (.statics, about 25 MB).

Any hint would be appreciated.
Thanks,
Michal

Efficiently removing elements from list of permutations

I’m looking for an efficient way to remove entries from a list of permutations.

I have a list of variables from which I calculate all possible permutations. I then want to remove those permutations that begin with a sequence of variables that match one from another list. The order of the variables is important.

As an example of the desired behaviour, I begin with the list $$(a,b,b,c,c,c)$$ and then compute all permutations, giving $$((a,b,b,c,c,c),(b,a,b,c,c,c),(c,a,b,b,c,c),\ldots)$$, and so on. I have a second list of the form $$((a), (b, c), (c, b, b))$$. I want to remove from the list of permutations those of the form $$(a,\ldots)$$, $$(b,c,\ldots)$$ or $$(c,b,b,\ldots)$$.

At the moment, I’m using DeleteCases to achieve this. For the above example:

(* original list of permutations *)
original = Permutations[{a, b, b, c, c, c}];

(* list of permutations to be removed *)
remove = {{a}, {b, c}, {c, b, b}};

(* convert to patterns *)
remove = Join[#, {___}] & /@ remove;

(* remove all permutations from original that start with one of the sequences in "remove" *)
reduced = DeleteCases[original, Alternatives @@ remove];

This seems fine for small numbers of permutations, but rapidly gets out of hand. The following code can be used to generate lists of arbitrary length permutations and the sequences to be removed.

(* function to repeat a variable in a list *)
repeat[m_, 0] := Nothing
repeat[m_, n_Integer?Positive] := Sequence @@ ConstantArray[m, n]

(* how many times do a, b, c repeat in the original permutations? *)
n = 4;
(* which starting sequences are we removing? *)
m = 2;

(* original list of permutations *)
original = Permutations[{repeat[a, n], repeat[b, n], repeat[c, n]}];

(* example list of permutations to be removed - not all of the same length in general *)
remove = Join[
   Permutations[{repeat[a, m], repeat[b, m], repeat[c, m]}],
   Permutations[{repeat[a, m], repeat[b, m], repeat[c, m + 1]}]];

(* convert to patterns *)
remove = Join[#, {___}] & /@ remove;

(* remove all permutations from original that start with one of the sequences in "remove" *)
reduced = DeleteCases[original, Alternatives @@ remove];

For $$n=4$$ and $$m=2$$, this runs in ~0.5s. For $$n=5$$ and $$m=3$$, this balloons to ~200s.

In my real code, original is generated as above, from Permutations. The remove list is not generated from a list of permutations as in the code above; it will contain elements of lengths from 1 up to one less than the length of the elements of original.

Is there any way to speed up the removal of the matching permutations? Or is it hopeless, given how the number of permutations blows up?

Thanks!
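One language-agnostic way to speed this up (sketched here in Python; the same idea carries over to Mathematica with associations or lookup functions) is to replace pattern matching with hashed prefix lookups: bucket the forbidden prefixes by length, then test each permutation with a handful of O(1) set-membership checks instead of matching it against a large Alternatives pattern:

```python
from itertools import permutations

def filtered(original, remove):
    """Drop permutations that start with any sequence in `remove`,
    using hashed prefix lookups instead of pattern matching."""
    by_len = {}
    for r in remove:
        by_len.setdefault(len(r), set()).add(tuple(r))
    lengths = sorted(by_len)
    return [p for p in original
            if not any(p[:L] in by_len[L] for L in lengths)]

# The small example from the question: 60 distinct permutations,
# minus 10 starting with a, 12 starting with (b, c), 3 with (c, b, b).
original = set(permutations(['a', 'b', 'b', 'c', 'c', 'c']))
remove = [('a',), ('b', 'c'), ('c', 'b', 'b')]
kept = filtered(original, remove)
assert len(kept) == 35
assert all(p[:1] != ('a',) and p[:2] != ('b', 'c') and p[:3] != ('c', 'b', 'b')
           for p in kept)
```

Faster still, since the number of permutations itself blows up, is to never generate the doomed ones at all: build the permutations recursively and prune any partial sequence the moment it hits a forbidden prefix (a trie over `remove` makes that check O(1) per extension).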

Efficiently storing and modifying a reorderable data structure in a database

I’m trying to create a list-curation web app. One thing that’s important to me is being able to reorder items in the list easily via drag-and-drop. At first I thought I could just store each item’s position, but with the reordering requirement, that would mean renumbering every item below the point where you removed the moved item or reinserted it. So I started looking at data structures that are friendly to reordering, or at least to both deletion and insertion. I’m looking at binary trees, probably red-black trees or something like that. I feel like, with great effort, I could probably implement the algorithms for manipulating them.

So here’s my actual question. All the tree tutorials assume you’re creating these trees in memory, usually through instantiating a simple Node class or whatever. But I’m going to be using a database to persist these structures, right? So one issue is how do I represent this data, which is kind of a broad question, sure. I would like to not have to read the whole tree from the database to update the order of one item in the list. But then again if I have to modify a lot of nodes that are stored as separate documents, that’s going to be expensive too, even if I have an index on the relevant fields, right? Like, every time I need to manipulate a node I need to actually find it in the database first, I think.

Also, I obviously need the content and order of each node on the front end in order to display the list, but do I need the client to know the whole tree structure in order to be able to send updates to the server, or can I just send the item id and its new position in the list or something?

I’m sorry if this veers too practical but I thought someone might want to take a crack at it.
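One common alternative worth weighing against trees, often called fractional indexing: store a single sortable key per item in an indexed column, and to move an item, give it a new key strictly between its new neighbors' keys. A drag-and-drop then touches exactly one row, the client only needs to send the item id plus its two new neighbors (or its new index), and no tree structure lives on either side. A minimal sketch (my own toy implementation of the midpoint-key idea, not a library API):

```python
DIGITS = "abcdefghijklmnopqrstuvwxyz"

def key_between(lo, hi):
    """Return a key strictly between lo and hi in lexicographic order.
    lo == '' means "before everything"; hi == '' means "after everything".
    Keys produced here never end in 'a', which keeps midpoints available."""
    key, i = "", 0
    while True:
        lo_d = DIGITS.index(lo[i]) if i < len(lo) else 0
        hi_d = DIGITS.index(hi[i]) if i < len(hi) else len(DIGITS)
        if hi_d - lo_d > 1:
            return key + DIGITS[(lo_d + hi_d) // 2]
        key += DIGITS[lo_d]                 # no room at this digit; go deeper
        i += 1

k1 = key_between("", "")        # first item inserted
k2 = key_between(k1, "")        # appended after k1
k3 = key_between(k1, k2)        # dragged in between: one row updated
assert k1 < k3 < k2
```

Keys grow longer only where insertions cluster (roughly logarithmically), and you can renumber a list in the background if keys ever get unwieldy. Compared with persisting a red-black tree as one document per node, this trades worst-case guarantees for single-row writes and a trivially simple schema.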

If I can efficiently uniformly sample both $A$ and $B\subset A$, can I efficiently uniformly sample $A-B$?

As posed in the question; the statement naively seems like it should be self-evident but there are no algorithms that come immediately to mind. Suppose I have some domain $$A$$ (in my case a subset of $$\mathbb{Z}^n$$ for some $$n$$, but I would expect any answer to be independent of that structure), and a domain $$B\subset A$$. If I have an efficient algorithm for uniformly sampling from $$A$$, and an efficient algorithm for uniformly sampling from $$B$$, is there any way of ‘combining’ these to get an efficient algorithm for uniformly sampling $$A-B$$? I can certainly rejection-sample, but if $$|A-B|\ll|A|$$ then there’s no guarantee that that will be efficient.
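For concreteness, here is the rejection sampler in Python; `sample_A` and `in_B` are stand-ins for whatever access is available to $$A$$ and $$B$$. Its output is exactly uniform on $$A-B$$, but the expected number of draws is $$|A|/|A-B|$$, which is precisely the inefficiency noted above when $$B$$ nearly fills $$A$$:

```python
import random

def sample_difference(sample_A, in_B, max_tries=100_000):
    """Rejection-sample A - B: draw uniformly from A until we land
    outside B. Expected number of draws: |A| / |A - B|."""
    for _ in range(max_tries):
        x = sample_A()
        if not in_B(x):
            return x
    raise RuntimeError("rejection sampling did not terminate")

# Toy instance: A = {0..999}, B = {0..899}, so |A - B| = 100 and we
# expect about 10 draws per accepted sample.
random.seed(0)
B = set(range(900))
xs = [sample_difference(lambda: random.randrange(1000), B.__contains__)
      for _ in range(1000)]
assert all(900 <= x < 1000 for x in xs)
```

Note that the sampler for $$B$$ is never used, only a membership test for $$B$$; as far as I know, that is typical: without extra structure (e.g. knowing $$|A|$$ and $$|B|$$ and being able to count or invert), the $$B$$-sampler does not obviously help.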

Distance to $k$th nearest neighbor efficiently for arbitrary $k$

Problem. Given $$X$$ a finite metric space of cardinality $$n$$, construct a data structure in subquadratic time such that the query "distance to the $$k$$th nearest neighbor of $$x$$" can be resolved in sublinear time (independent of $$k$$) for arbitrary $$k \leq n$$ and $$x \in X$$.

What I have tried. I am aware of the existence of k-d trees, ball trees, and cover trees. Under some assumptions (which I’m willing to make), I know that these structures can resolve the query "distance to the nearest neighbor of $$x$$" in sublinear time (often $$O(\log(n))$$), but I haven’t been able to find similar results for the $$k$$th nearest neighbor for arbitrary $$k$$.

It seems that, often, one is interested in $$k$$ values that are small compared to $$n$$, and that, in those cases, the algorithms mentioned in the previous paragraph can be adapted at the cost of a multiplicative constant of the order of $$k$$. My problem is that I am interested in $$k$$ values that are potentially of the order of $$n$$.
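For contrast, here is the quadratic baseline the problem statement is asking to beat: precompute, for every point, the sorted list of its distances to all other points ($$O(n^2 \log n)$$ build time, $$O(n^2)$$ space), after which any query $$(x, k)$$ is a single array lookup, independent of $$k$$. The point set and names here are just illustration:

```python
import math

def build(points):
    """Quadratic-time baseline: for every point, sort its distances to
    all other points. Any (x, k) query is then an O(1) lookup."""
    table = {}
    for i, p in enumerate(points):
        table[i] = sorted(math.dist(p, q)
                          for j, q in enumerate(points) if j != i)
    return table

def kth_nn_dist(table, i, k):
    """Distance from point i to its k-th nearest neighbor (1-indexed)."""
    return table[i][k - 1]

pts = [(0, 0), (1, 0), (3, 0), (7, 0)]
t = build(pts)
assert kth_nn_dist(t, 0, 1) == 1.0   # nearest to (0,0) is (1,0)
assert kth_nn_dist(t, 0, 3) == 7.0   # third nearest is (7,0)
```

The question is exactly whether the $$O(n^2)$$ preprocessing can be brought down to subquadratic while keeping the query sublinear and $$k$$-independent.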

Are there any enumerations of (machines for) languages in P such that all the machines can be simulated efficiently on any input?

From Computational Complexity A Modern Approach: Efficient Universal Turing Machine Theorem: There exists a TM U such that for every x, α ∈ {0, 1}∗, U(x, α) = Mα(x), where Mα denotes the TM represented by α. Furthermore, if Mα halts on input x within T steps then U(x,α) halts within CT log T steps, where C is a number independent of |x| and depending only on Mα’s alphabet size, number of tapes, and number of states.

From Kozen INDEXINGS OF SUBRECURSIVE CLASSES: "the class of polynomial time computable functions is often indexed by Turing machines with polynomial time counters…. The collection of all (encodings over (0, 1)) of such machines provides an indexing of PTIME…we have Theorem: No universal simulator for this indexing can run in polynomial space."

He then goes on to say: "can it be proved that gr U for any indexing of PTIME requires more than polynomial space to compute? We have proved this for a wide class of indexings, namely counter indexings satisfying the succinct composition property."

• gr U is the graph of the universal function U and (barring details) represents the minimum power necessary to simulate P uniformly.

• And the counter indexing (or polynomial time counters) he is referring to is specified in the answer here: How does an enumerator for machines for languages work?

I’m wondering how the efficient universal Turing machine theorem relates to Kozen’s result that, for certain types of enumerations of P, there is no machine that can efficiently simulate the machines enumerated. What causes simulation to be difficult, and can it be circumvented? Namely: does there exist an enumeration of P that describes the (machines for) languages in P in such a way that allows them to be efficiently simulated (with no more than polynomial overhead) on any input, or, as Kozen puts it, "allow(s) easy construction of programs from specifications"?

My guess is that part of the difficulty is that the efficient simulation theorem only says that there exists a machine that can efficiently simulate any single TM, not a whole class of them… and once you have to simulate more than one TM, you lose the ability to optimize your simulator’s design for any particular language (any particular machine); you have to design the simulator with all the various machines you need to simulate in mind (and the more those machines differ, the larger your overhead gets).

PS. A possible example could come from implicit complexity theory… where languages are constructed that are complete for certain complexity classes. A language complete for P doesn’t seem to have trouble running its programs (which represent the languages in P)? But if there are examples here, how do they overcome the difficulty Kozen is referring to, given that they are programming systems and thus enumerations/indexings?

Just as a related aside… I think I have a proof that, for a language Lp, 1. and 2. cannot both be true:

1. An enumeration of P, call it language Lp (whose members are the strings in the enumeration) is decidable in poly time.

2. All the P machines / languages represented by strings in Lp can be efficiently simulated by a universal simulator for P on any input.

It makes sense that there would be a relationship between the way the machines are encoded and the overhead of their simulation, and 1. can be made true. That leaves 2., and brings us to the question being asked: is it possible that 2. is always false, meaning that for ANY enumeration/encoding of P (any language Lp), simulation of those machines is inefficient for every universal simulator for P?

Here’s a rough sketch for anyone interested:

Take L := {w : w ∈ Lp and W(w) = 0}, where W is the machine encoded by w.

So, one way to set this up is: our diagonal function maps each w to the language in P that w encodes (if w encodes a machine for a language in P, i.e. if w is a member of Lp), and otherwise maps w to the language containing all words. The map from w to a language translates to: w ∈ L iff w is not a member of the language it is mapped to.

Since all w’s that aren’t encodings of P machines (members of Lp) are mapped to the language containing all words, they are members of L iff they are not members of that language. This is always false, so no word outside Lp is a member of L.

This leaves only words of Lp as candidates for membership in L. But for w to be a member of L, not only does it need to be a member of Lp; the machine W that w encodes also needs to evaluate to 0 on w. That is, w ∈ L iff w ∈ Lp and W(w) = 0.

L is not in P. If L were in P, then some w would encode a machine Wl for L, and for that w we would have w ∈ L iff Wl(w) = 0, i.e. w ∈ L iff w ∉ L.

Now, let’s employ assumption 1., that Lp is decidable in polynomial time, together with assumption 2., that any machine specified by Lp can be simulated with no more than polynomial overhead by a universal simulator.

Then we can devise an algorithm: given w, decide whether w ∈ Lp. If w ∈ Lp, then run W(w). If W(w) = 0, then w ∈ L.

By the above algorithm, under assumptions 1. and 2., L would be in P, contradicting the proof above that L is not in P. I believe that proof is correct, and so conclude that 1. and 2. cannot both be true.
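The diagonal step of this sketch can be illustrated on a finite toy enumeration (the real argument of course ranges over all encodings in Lp): for any list of predicates, the diagonal predicate D(i) = not P_i(i) disagrees with the i-th predicate at its own index, so D cannot appear anywhere in the list. The particular predicates below are arbitrary stand-ins:

```python
# A toy enumeration of decidable predicates on naturals, standing in for
# "machines W encoded by words w of Lp" in the sketch above.
preds = [
    lambda n: n % 2 == 0,
    lambda n: n < 10,
    lambda n: n % 3 == 0,
]

def diag(i):
    """The diagonal predicate: flip what the i-th predicate says about i."""
    return not preds[i](i)

# diag differs from every enumerated predicate at its own index, so it is
# not in the enumeration -- the heart of the "L is not in P" step above.
assert all(diag(i) != preds[i](i) for i in range(len(preds)))
```

In the sketch, the two assumptions are exactly what would make `diag` itself polynomial-time computable (decide w ∈ Lp, then simulate W on w with polynomial overhead), which is why they cannot both hold.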