Devising an algorithm to remove “complete” sets of numbers from a list of numbers

Suppose I have a list of numbers consisting of numbers 1 through 5, where some numbers repeat. Define a set of numbers to be "complete" if it consists of each of the numbers 1-5 exactly once. Now, I want to pull out as many possible complete, sorted (lowest to highest) sets from this list. How can I do this (efficiently)?

Here is the solution I have in mind so far. Traverse through the list, and place each number into a pile of numbers that is the same number. For example, place all the 1s into a pile, all the 2s into a pile, and so on. Then, go through all of the piles and determine the minimum number of elements in a pile. For example, if the pile of 2s has five 2s and all other piles have greater than or equal to five elements, then 5 is the minimum number of elements. This minimum, call it min, will give us the number of complete sets in the list. Finally, go through the piles in order and pull out the elements one by one until you form a complete set, and again to form another complete set, and so on until min number of complete sets. Tada! You now have pulled out as many possible complete, sorted sets from this list.

Here is an example:

Let’s say the list is 3 2 1 2 2 3 4 5 5 4 4 3 4 1 1 2 5. We traverse the list and put them into piles:

333 2222 111 4444 555

The minimum number of elements in a pile is 3, so we have 3 complete sets. Now just go through the piles in order and pick up elements one by one:

12345 12345 12345

And tada! We have pulled out as many complete, sorted sets as possible.

Where I am lacking in my solution is figuring out how to efficiently go through the piles in order or make an efficient way of ordering the piles so that I can easily go through them in order.

I’d love to hear any feedback, especially other algorithms you might devise or ways to go about ordering. Thanks!

Spanning tree in a graph of intersecting sets

Consider $ n$ sets, $ X_i$ , each having $ n$ elements or fewer, drawn among a set of at most $ m \gt n$ elements. In other words $ $ \forall i \in [1 \ldots n],~|X_i| \le n~\wedge~\left|\bigcup_{i=1}^n X_i\right| \le m$ $

Consider the complete graph $ G$ formed by taking every $ X_i$ as a node, and weighing every edge $ (i,j)$ by the cardinal of the symmetric difference $ X_i \triangle X_j$ .

An immediate bound on the weight of the minimal spanning tree is $ \mathcal{O}(n^2)$ , since each edge is at most $ 2 n$ , but can we refine this to $ \mathcal{O}(m)$ ?

For illustration, consider $ 2 p$ sets, $ p$ of which contain the integers between $ 1$ and $ p$ and $ p$ of which contain the integers of between $ p+1$ and $ 2p$ . A minimal spanning tree has weight $ p$ but a poorly chose tree on this graph would have weight $ (p-1)p$ . Intuitively, if there are only $ m$ values to chose from, the sets can’t all be that different from one another.

Maximization problem on finite collection of finite sets


Problem

I am considering the following maximization problem:

  • Input is a finite collection of finite sets $ \mathcal{F} = \{ X_1, X_2, \ldots, X_n \}$ .
  • Goal is to find a subset $ G \subseteq \mathcal{F}$ that maximizes $ |G| \times |\bigcap G|$ where
    • $ |G|$ is the cardinality of the set $ G$ , and
    • $ \bigcap G = \bigcap \{X_{i_1}, X_{i_2}, \ldots, X_{i_m} \}$ .

As an example, for the collection $ $ \mathcal{F} = \{ \{a, b, c\}, \{a, b, c, x\}, \{b, c, y\}, \{a, b, c, z\} \}, $ $ the maximizing subset is $ G = \{ \{a, b, c\}, \{a, b, c, x\}, \{a, b, c, z\} \}$ and the score is $ 3 \times |\{a, b, c\}| = 9$ .

Note: the score of $ \mathcal{F}$ itself is $ 4 \times |\{b, c\}| = 8$ .

Question

I am planning to use a procedure of this problem for compressing data (represented by finite collections of finite sets). However, I don’t have any good idea to solve this problem efficiently. As yow know, we can solve this by enumerating all the collections of $ \mathcal{F}$ ; but, it’s too slow for practical use.

Is there a polynomial-time or some kind of efficient algorithm for this problem? Or, does this problem belong to the complexity class that cannot be solved in polynomial time?

Closure of disjoint sets

I am trying to solve the following problem but I cannot figure out two disjoint sets such that their closures are equal.
Find languages S and T over the alphabet {a, b} such that $ S \not\subset T $ and $ T \not\subset\ S $ (S is not contained in T and T is not contained in S) but S* = T*. It might be trivial, but I would appreciate some help. Thanks.

Clustering sets by set difference

Suppose you have $ n$ nonequal sets $ S_1, \ldots, S_n$ and some constant $ 0 \le k < n$ . The goal of set clustering is to find a partition of the set $ \mathbf{S} = \{S_1, \ldots, S_n\}$ such that the sum of the total distance for each subset of $ \mathbf{S}$ is minimized and such that $ \mathrm{cardinality}(\mathbf{S}) = k$ (in reality, this latter constraint is not quite so tight, but the size of $ \mathbf{S}$ must be less than $ k$ , and hopefully near it). The total distance $ d$ of a set $ X \in \mathbf{S}$ is $ d(X) = \sum_{A, B \in X} \mathrm{cardinality}(A \ominus B)$ where $ \ominus$ is symmetric difference. Assume for the purposes of the problem that the sets consume $ O(1)$ space (so symmetric difference can be computed in constant time).

Is there a good greedy linear(ithmic) heuristic for this problem? Is there any literature on this problem, or similar ones?


All I’ve come up with so far is an $ O(n^2 \log n)$ heuristic that looks like:

  1. Set $ I = \{1, \ldots, n\}$
  2. Choose some $ i \in I$
  3. Emit a cluster $ C \subset I$ containing all $ m \in I$ such that $ \mathrm{cardinality}(S_i \ominus S_m) \le d_\mathrm{max}$ .
  4. Set $ I = I – C$
  5. Go to step 2.

where this process occurs in each step of a binary search that finds the best value of $ d_\mathrm{max}$ for a given $ k$ .

I was thinking that if there were some way to sort the list of sets such that nearby sets in the list have small symmetric difference, then a linearithmic solution might be easy to write.

How to show that these two disjoint sets $A$ and $B$ exist

I came across this problem which asks to show the existence of two disjoint Turing-recognizable sets $ A$ and $ B$ such that no decidable set $ C$ can separate them…

In this case, a set $ C$ is said to separate $ A$ and $ B$ if $ A \subseteq C$ and $ B \subseteq \overline{C}$ … If only $ A$ is Turing-recognizable, then we could easily set $ A$ to be $ A_{TM}$ and $ B$ as $ \overline{A_{TM}}$ . However, in this case both $ A$ and $ B$ are Turing-recognizable …. I think that $ A$ and $ B$ should be constructed using diagonalization, but could not think of a way to do it … Any help ?

Intersection of line segments induced by point sets from fixed geometry

I am reading up on algorithms and at the moment looking at the below problem from Jeff Erickson’s book Algorithms.

Problem 14 snippet from Recursion chapter out of Algorithms book by Jeff Erickson

I solved (a) by seeing a relationship to the previous problem on computing the number of array inversions. However, I am struggling with problem (b) as I cannot see how to reduce the circle point arrangement to an arrangement of points and lines that would be an input to the problem posed in (a). Assuming I have something for (b), I also cannot see how one might resolve (c).

For part (b), clearly every point $ p = (x, y)$ satisfies $ x^2 + y^2 = 1$ but I do not see how I might be able to use this fact to do the reduction. The runtime I am shooting for of $ O(n \log^2 n)$ also seems to tell me the reduction is going to cost something non-trivial to do.

Can anyone have some further hints/insights that might help with part (b) and potentially even part (c)?