I want a data structure which describes a universe of tuples all of length $ n$ , $ U$ . In $ U$ we associate each $ tuple$ with a non-negative value, which we will denote as $ U.count(tuple)$ . If $ U.count(tuple)=0$ , then we consider $ tuple$ to not be in $ U$ . I want to be efficiently update these $ U$ with five operations, some of which involve two fixed constants $ maxVal$ and $ minVal$ . (to clarify, $ maxVal$ and $ minVal$ should not change throughout the process, and should a tuple in $ U$ should never have a value which exceeds $ maxVal$ )
append(U,c): yields an updated universe $ U’$ , $ c$ should be below $ maxVal$ . For each tuple $ t$ in $ U$ , make a copy $ t’$ , and append $ c$ to $ t’$ . For each $ t’$ , $ U’.count(t’) = U.count(t)$ .
add(U,c,i): yields an updated universe $ U’$ . For each tuple $ t$ in $ U$ , we create a copy $ t’$ , and add $ c$ to $ t'[i]$ . If $ t'[i]<maxVal$ , then then $ t’$ is in $ U’$ and $ U’.count(t’) = U.count(t)$ . Otherwise $ U’.count(t’)=0$ and for our purposes we no longer need to keep track of it, and can exclude it from $ U’$ .
merge(U,i,j): yields an updated universe $ U’$ . For each tuple $ t$ in $ U$ , we create a copy $ t’$ , we pop $ t'[j]$ , and add its value to $ t'[i]$ . In this case, there can be multiple tuples $ t$ which correspond to $ t’$ in $ U’$ , thus let us initially say $ U’.count(t’)=0$ and then add to this. Now, if $ t'[i] < maxVal$ , $ U’.count(t’) = U’.count(t’) + U.count(t)$ .
finish(U,i): yields an updated universe $ U’$ . For each tuple $ t$ in $ U$ , we make a copy $ t’$ and delete $ t'[i]$ . Again, for the same reasons as in 2, let us initially say $ U’.count(t’) = 0$ . If $ t[i] > minVal$ , then $ t’.count = t’.count + t.count$ . At the end, all $ t’$ with non-zero count are included in $ U’$ .
combine(U1,U2): yields an updated universe $ U’$ , $ U_1$ and $ U_2$ should have same length tuples. For each $ t$ in $ U_1 \cup U_2$ , $ U’.count(t) = U_1.count(t) + U_2.count(t)$
My thoughts so far (messy)
If we did this naively, and represented $ U$ as a list of the tuples with non-zero, this would be horribly inefficient. Let’s assume $ U.count(t) \neq 0$ for all $ t$ that don’t exceed $ maxVal$ . Then, with $ maxVal = 10$ , a universe for tuples of length $ 4$ , that’s 10,000 if statements for the
I think one optimization would to have a tree, where the first level has nodes for each value of $ t$ , and the child of a node are the next element in the tuple. I’d imagine you’d do this by having a list of all your first level nodes. Each node itself, is represented as a tuple of its value, and a list of its valid children, or $ U.count(t)$ for leaves. With
add at lower levels, you have much less updates to do, and by ordering children in a fixed way according to their value, and keeping childless nodes as placeholders, your only need to do the validity check once too. Furthermore, to make
append efficient, instead of having the levels be consecutive indices, we could make the order arbitrary, describe the tree-level to tuple index conversion with another list. For stuff like
combine I am less sure what’s a good way to do things, maybe it would be beneficial to actually sum the counts of shared tuples, but maybe there are a clever cheats or work arounds…
My explicit questions
1a) Is there an efficient data-structure which performs these operations?
1b) Otherwise, are there some general rules of thumb for data structure which are relevant for this problem or my proposed solution?
2) Same as above but for the case where
combine is applied to a very large number of universes, instead of just 2.