SQL – Sparse data in time series

[I wasn’t sure if this is the proper place to post SQL questions.] I have a public dataset of pharmaceutical prices. Any given drug gets a new price on some unpredictable day, and that price then remains in effect until the next price change.

E.g.

drug            date          new_price
acetaminophen   2020-01-09    0.25
oxycontin       2020-01-10    1.40
valaxirin       2020-02-10    2.34
oranicin        2020-02-11    1.54
acetaminophen   2020-02-12    1.47

I have to do a variety of analytics, e.g. "what was the price of acetaminophen on 2020-02-01?" That would of course be 0.25, but I need a way to figure that out in SQL. I also have a variety of more complex queries, e.g. "list the ten cheapest drugs on a given date", so I think the solution needs to generalize.

I realize that one possible solution would be to run a job that populates the database with prices for every day of the year, but I prefer not to solve the problem that way.
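
A standard way to express the as-of lookup is "latest price change at or before the date". Below is a minimal runnable sketch using Python's sqlite3 (the table and column names are my assumptions based on the sample above, and the window-function form needs SQLite 3.25+); the same ROW_NUMBER() pattern generalizes "price as of a date" to all drugs at once in any SQL engine:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE prices (drug TEXT, date TEXT, new_price REAL)")
con.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [("acetaminophen", "2020-01-09", 0.25),
     ("oxycontin",     "2020-01-10", 1.40),
     ("valaxirin",     "2020-02-10", 2.34),
     ("oranicin",      "2020-02-11", 1.54),
     ("acetaminophen", "2020-02-12", 1.47)],
)

# Price of one drug as of a date: the latest change on or before that date.
# ISO date strings compare correctly as text.
print(con.execute(
    """SELECT new_price FROM prices
       WHERE drug = ? AND date <= ?
       ORDER BY date DESC LIMIT 1""",
    ("acetaminophen", "2020-02-01"),
).fetchone())   # (0.25,)

# Generalized form: latest price per drug as of a date, then rank/filter,
# e.g. the ten cheapest drugs on that date.
print(con.execute(
    """SELECT drug, new_price FROM (
           SELECT drug, new_price,
                  ROW_NUMBER() OVER (PARTITION BY drug ORDER BY date DESC) AS rn
           FROM prices WHERE date <= ?) AS latest
       WHERE rn = 1
       ORDER BY new_price LIMIT 10""",
    ("2020-02-01",),
).fetchall())   # [('acetaminophen', 0.25), ('oxycontin', 1.4)]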

Fastest Algorithm for finding All Pairs Shortest Paths on Sparse Non-Negative Graph

As discussed here, Johnson’s algorithm can be used to solve the APSP problem in $O(V^2\log V + VE)$ instead of the $O(V^3)$ of Floyd–Warshall. However, Johnson’s algorithm does quite a bit of extra work to handle the case where some weights are negative. Is there a faster way to compute this given that all weights are positive?
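
With no negative weights, Johnson’s Bellman–Ford reweighting phase can be skipped entirely: running Dijkstra from every vertex directly already achieves $O(V^2\log V + VE)$ with Fibonacci heaps (a binary heap, as below, gives $O(VE\log V)$). A minimal Python sketch of that all-pairs Dijkstra:

import heapq

def all_pairs_dijkstra(adj):
    """All-pairs shortest paths with non-negative weights: plain Dijkstra
    run from every source. adj[u] is a list of (v, weight) pairs."""
    n = len(adj)
    dist = [[float("inf")] * n for _ in range(n)]
    for s in range(n):
        d = dist[s]
        d[s] = 0.0
        pq = [(0.0, s)]
        while pq:
            du, u = heapq.heappop(pq)
            if du > d[u]:          # stale heap entry, already improved
                continue
            for v, w in adj[u]:
                nd = du + w
                if nd < d[v]:
                    d[v] = nd
                    heapq.heappush(pq, (nd, v))
    return dist

# Tiny example: 0 -> 1 (2), 0 -> 2 (5), 1 -> 2 (1)
print(all_pairs_dijkstra([[(1, 2.0), (2, 5.0)], [(2, 1.0)], []]))
# [[0.0, 2.0, 3.0], [inf, 0.0, 1.0], [inf, inf, 0.0]]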

Would a sparse NP-Complete language imply L = NP?

[1] Assume there exists a (sparse NP-Complete language).

Mahaney’s theorem is a result in computational complexity theory, proved by Stephen Mahaney, which states that if any sparse language is NP-Complete, then P = NP.

https://en.wikipedia.org/wiki/Mahaney%27s_theorem

[2] In other words, (sparse NP-Complete language) implies (P = NP).

[3] From [1] and [2] (P = NP).

[4] (P = NP) implies (P-complete = NP-complete).

[5] From [3] and [4] (P-complete = NP-complete).

[6] From [1] and [5] there exists a (sparse P-Complete language).

“In 1999, Jin-Yi Cai and D. Sivakumar, building on work by Ogihara, showed that if there exists a sparse P-complete problem, then L = P.”

https://en.wikipedia.org/wiki/Sparse_language

[7] In other words, (sparse P-complete problem) implies (L = P).

[8] From [6] and [7] (L = P).

[9] From [3] and [8] (L = NP).
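
Written compactly, the claimed chain of implications is:

$$\exists\, S\ \text{sparse and NP-complete} \;\Rightarrow\; \mathrm{P} = \mathrm{NP} \;\Rightarrow\; S\ \text{is a sparse P-complete language} \;\Rightarrow\; \mathrm{L} = \mathrm{P} \;\Rightarrow\; \mathrm{L} = \mathrm{NP}.$$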

Arnoldi method misses eigenvalue degeneracies for very sparse matrices

I want to report a problem very similar to one already discussed here:

Wrong eigenvalues from a sparse matrix

In particular, I have a very sparse matrix and I am asking for just a few dominant eigenvalues.

I noticed that, while the algorithm (Arnoldi, with a very large iteration cap, ~10^6) finds the correct eigenvalues, it misses the degeneracies (my matrix has a double degeneracy for every eigenvalue). A curious feature is that the problem goes away if, instead of asking for very few eigenvalues (say, 10), I ask for a few more (say, 40).

Are there any updates on this problem?
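
As an illustrative sketch (using SciPy’s ARPACK interface rather than the original code, on a synthetic matrix built so that every eigenvalue is doubly degenerate), the effect and the ask-for-more-eigenvalues workaround can be reproduced like this:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Synthetic symmetric sparse matrix in which every eigenvalue is doubly
# degenerate: block_diag([A, A]) duplicates the whole spectrum of A.
A = sp.random(300, 300, density=0.02, random_state=0)
A = (A + A.T) / 2                       # symmetrize
M = sp.block_diag([A, A]).tocsc()

few  = np.sort(eigsh(M, k=6,  return_eigenvectors=False))   # very few
more = np.sort(eigsh(M, k=24, return_eigenvectors=False))   # a few more

# In exact arithmetic, a Krylov space grown from a single starting vector
# contains only one copy of each degenerate eigenvector pair; in floating
# point, rounding noise usually reintroduces the second copy, but asking
# for more eigenvalues makes recovering both copies far more reliable.
print(few)
print(more[-6:])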

Matrices of Machine- or Arbitrary-Precision Real Numbers Error While Using Arnoldi for Large Sparse Matrix

I’m trying to use the built-in Arnoldi method in Mathematica to compute the first 1440 eigenvalues of a large sparse matrix. After importing

Elements12 = NN /@ Uncompress[Import["Elements12.m"]]

and applying the following function

NN[{a_, b_} -> c_] := {a, b} -> N[c];

I define a sparse matrix

s = SparseArray[Elements12] 

which I’m able to nicely plot using MatrixPlot

MatrixPlot[s]. 

However, when it comes to finding the first 1440 eigenvalues

Sort[Eigenvalues[s, -1000, Method -> "Arnoldi"]] 

I receive the error that “Method -> Arnoldi can only be used for matrices of machine- or arbitrary-precision real numbers”, despite my matrix elements being machine precision after applying NN. My matrix is 344640 × 344640 and has 5,300,000 nonzero elements. Any help will be appreciated.

Fastest way to ArrayFlatten a 2D array of 2D arrays, both for Sparse and Dense

I am using Mathematica 12 and want to know the fastest way to flatten a 2D array of 2D arrays into a single matrix, both for dense and sparse arrays. I remember looking for this a few versions of Mathematica ago, when ArrayFlatten was not that good. Since then I have been using option one below (which I found somewhere here, but I can’t find the question anymore; it had other suggestions as well). But a quick check shows that for SparseArrays that is not the case anymore. Is there anything better than ArrayFlatten?

Also, why is the SparseArray version of 3 not the same as the other two, unless I make them dense?

dim = 5; eles = 50;
sparse = Table[
   KroneckerProduct[RandomReal[{-10, 10}, {eles, eles}],
    IdentityMatrix[eles, SparseArray]], {ii, 1, dim}, {jj, 1, dim}];

Dimensions[sparse]
{5, 5, 2500, 2500}

RepeatedTiming[
 jSparse1 = Apply[Join[##, 2] &,
    Table[Join @@ sparse[[All, ii]], {ii, 1, dim}]];]
{0.04, Null}

RepeatedTiming[jSparse2 = ArrayFlatten[sparse];]
{0.0094, Null}

RepeatedTiming[
 jSparse3 = SparseArray`SparseBlockMatrix[
    {{i_, j_} :> sparse[[i, j]]}, {dim, dim}];]
{0.23, Null}

Dimensions /@ {jSparse1, jSparse2, jSparse3}
{{12500, 12500}, {12500, 12500}, {12500, 12500}}

jSparse1 === jSparse2
True

jSparse1 === jSparse3
False

jSparse2 === jSparse3
False

Normal[jSparse1] === Normal[jSparse2] === Normal[jSparse3]
True

And here is the dense version, with the same conclusion:

dim = 5; eles = 50;
dense = Table[
   KroneckerProduct[RandomReal[{-10, 10}, {eles, eles}],
    IdentityMatrix[eles]], {ii, 1, dim}, {jj, 1, dim}];

Dimensions[dense]
{5, 5, 2500, 2500}

RepeatedTiming[
 jDense1 = Apply[Join[##, 2] &,
    Table[Join @@ dense[[All, ii]], {ii, 1, dim}]];]
{1.251, Null}

RepeatedTiming[jDense2 = ArrayFlatten[dense];]
{0.61, Null}

RepeatedTiming[
 jDense3 = Normal[SparseArray`SparseBlockMatrix[
     {{i_, j_} :> dense[[i, j]]}, {dim, dim}]];]
{2.3, Null}

Dimensions /@ {jDense1, jDense2, jDense3}
{{12500, 12500}, {12500, 12500}, {12500, 12500}}

jDense1 === jDense2
True

jDense1 === jDense3
False

jDense2 === jDense3
False

Max[jDense1 - jDense2]
0.

Max[jDense1 - jDense3]
0.

Max[jDense2 - jDense3]
0.

And again the third version produces a result that is apparently not identical to the other two, though numerically they seem to be the same.

How does Indexed work with a sparse array?

I want to use Indexed given that s ranges over [0, 500], but I am unsure how to write that without getting a format error or a tensor error.

\[CapitalDelta]t = .0001; t = .0833; \[Sigma] = .2183; \[CapitalDelta]s = 5;
s = [0, 500];
\[Mu] = ((\[Sigma]^2 Indexed[s, i]^2)/Indexed[\[CapitalDelta]s, i]^2*\[CapitalDelta]t);
\[Alpha] = (Indexed[s, i]/(2*Indexed[\[CapitalDelta]s, i]^2*\[CapitalDelta]t));
cn1[k2_, n_] =
  SparseArray[{
    {m_, m_} -> 1/2 + 1/2*\[Mu] + 1/2*Indexed[rate, {k2, n}]*\[CapitalDelta]t,
    {m_, l_} /; l - m == 1 -> -(1/4)*\[Mu] - 1/2*Indexed[rate, {k2, n}]*\[Alpha],
    {m_, l_} /; m - l == 1 -> -(1/4)*\[Mu] + 1/2*Indexed[rate, {k2, n}]*\[Alpha]},
   {101, 101}]

Beating scipy’s sparse matrix multiplication

I develop an application that makes extensive use of sparse matrix operations.

Recently, I found out that scipy spends a lot of time checking matrix sizes and whatnot, to the point that the checks take more time than the actual operations. See the posted issue with some metrics here.

So I decided to start the dubious enterprise of making my own sparse matrix library CSparse3.

At the moment, using numba, my library is only about 30% slower than scipy in a (+, *, /, -) test, and that is because of the matrix multiplication. Currently, I am using the algorithm from this book.

I am struggling to find other competitive CSC sparse matrix multiplication algorithms that are explained for humans. So I’d be grateful to know about better ones if available (with emphasis on the “explained for humans” part).
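
For reference, here is a plain-Python sketch of the classic column-by-column CSC multiply (Gustavson-style, with a dense scatter workspace), which is what many CSC libraries implement; the helper name and the raw-array calling convention (Ap/Ai/Ax, Bp/Bi/Bx) are illustrative, and a numba @njit should apply with minor changes:

import numpy as np

def csc_spgemm(m, n, Ap, Ai, Ax, Bp, Bi, Bx):
    """C = A @ B for CSC matrices: A is m x k (Ap/Ai/Ax), B is k x n.
    Builds C one column at a time using a dense scatter workspace."""
    Cp = np.zeros(n + 1, dtype=np.int64)
    Ci, Cx = [], []
    w = np.full(m, -1, dtype=np.int64)   # w[i] = slot of row i in current column
    for j in range(n):                   # build column j of C
        rows, vals = [], []
        for p in range(Bp[j], Bp[j + 1]):
            k, bkj = Bi[p], Bx[p]        # entry B[k, j]
            for q in range(Ap[k], Ap[k + 1]):
                i, aik = Ai[q], Ax[q]    # entry A[i, k]
                if w[i] == -1:           # first contribution to C[i, j]
                    w[i] = len(rows)
                    rows.append(i)
                    vals.append(aik * bkj)
                else:                    # accumulate into existing slot
                    vals[w[i]] += aik * bkj
        for i in rows:                   # reset workspace for the next column
            w[i] = -1
        Ci.extend(rows)                  # note: rows within a column are unsorted
        Cx.extend(vals)
        Cp[j + 1] = len(Ci)
    return Cp, np.array(Ci, dtype=np.int64), np.array(Cx)

# Check against scipy on small random matrices.
from scipy.sparse import random as sprand, csc_matrix
A = sprand(40, 30, density=0.1, format="csc", random_state=0)
B = sprand(30, 20, density=0.1, format="csc", random_state=1)
Cp, Ci, Cx = csc_spgemm(40, 20, A.indptr, A.indices, A.data,
                        B.indptr, B.indices, B.data)
C = csc_matrix((Cx, Ci, Cp), shape=(40, 20))
print(abs(C - A @ B).max())   # ~1e-16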

Calculating diagonal of inverse of sparse band-like matrix

I’m trying to find an optimization for an equation related to Theorem 3.5.7 from “Finite Markov Chains” by Kemeny and Snell (1976). The theorem is:

$$H=(N-I)N_{dg}^{-1}$$

where $N_{dg}$ is a diagonal matrix that is all zeros except for the diagonal of $N$, and

$$N=(I-Q)^{-1}$$

and $Q$ is an $n \times n$ square sparse pairwise transition matrix whose sparsity pattern is shown in the original post’s figure (image not reproduced here).

The matrix is structurally symmetric (shown by the colors), but not numerically. It is characterized by 3 tri-bands, with the distance between the bands related to $\sqrt{n}$. If there is a technical term for this matrix, I don’t know what it is.

The modified version of the calculation I’m trying to solve simply involves pre-multiplying the theorem with a vector, like so:

$$v(N-I)N_{dg}^{-1}$$

I’m trying to apply this to extremely large matrices ($n = 1{,}000{,}000$ is the low end of what we consider useful). I can solve the $v(N-I)$ part of the equation easily enough using the $Ax=b$ approach. Inverting $N_{dg}$ is simple (just invert the individual diagonal values, since the off-diagonal values are all 0). That only leaves finding $N_{dg}$ itself in a reasonably efficient manner (in terms of both time and memory requirements). I would also like to avoid estimation if possible.

Ideally, I wouldn’t have to calculate $N$, which involves an inverse that results in a dense matrix, and could instead solve the linear system. Unfortunately, the weird nature of $N_{dg}$ seems to preclude that option.

I can solve for individual rows and columns of $N$ by including a vector multiplication and solving the system, and from this I can extract individual elements of $N_{dg}$. However, doing so iteratively to calculate the entirety of $N_{dg}$ is impractically slow (months to years).
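
Concretely, that per-column extraction looks like the following SciPy sketch (on a small synthetic $Q$, not the real data); factorizing $I-Q$ once with splu makes each diagonal entry a single cheap solve, but it is still $n$ solves in total, which is exactly the scaling that becomes impractical at $n = 1{,}000{,}000$:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Small synthetic stand-in for Q (substochastic, so I - Q is nonsingular).
n = 1000
Q = sp.random(n, n, density=5 / n, random_state=0, format="csc") * 0.1

# Factorize I - Q once; every column of N = (I - Q)^{-1} is then one solve.
lu = splu((sp.identity(n, format="csc") - Q).tocsc())

# Extract only the diagonal entries N[j, j]: solve (I - Q) x = e_j, keep x[j].
Ndg = np.empty(n)
e = np.zeros(n)
for j in range(n):
    e[j] = 1.0
    Ndg[j] = lu.solve(e)[j]
    e[j] = 0.0

Ndg_inv = 1.0 / Ndg   # N_dg^{-1}: elementwise reciprocal of the diagonal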

I did come across “The inverse of banded matrices” by Emrah Kılıç and Pantelimon Stanica (2013), and subsequently came across this Computer Science stackexchange post that links to the same paper. Unfortunately, the answer provided assumes that the inverse will not be calculated directly, so my only option is to try to work through the paper myself. My hope is that because I don’t need to calculate the entire inverse, it will be practical. It looks promising, particularly the $i=j$ case in Theorem 7, but it’s also beyond my level of linear algebra, so it could take a while for me to understand and may just end up being a time-consuming dead end.

Does anyone have any suggestions for a computationally reasonable approach for calculating $N_{dg}$? If the paper I found happens to be the best solution, just distilling it down would be helpful because I’m not a math major.