Consider $ n$ sets, $ X_i$ , each having $ n$ elements or fewer, drawn among a set of at most $ m \gt n$ elements. In other words $ $ \forall i \in [1 \ldots n],~|X_i| \le n~\wedge~\left|\bigcup_{i=1}^n X_i\right| \le m$ $

Consider the complete graph $ G$ formed by taking every $ X_i$ as a node, and weighing every edge $ (i,j)$ by the cardinal of the symmetric difference $ X_i \triangle X_j$ .

An immediate bound on the weight of the minimal spanning tree is $ \mathcal{O}(n^2)$ , since each edge is at most $ 2 n$ , but can we refine this to $ \mathcal{O}(m)$ ?

For illustration, consider $ 2 p$ sets, $ p$ of which contain the integers between $ 1$ and $ p$ and $ p$ of which contain the integers of between $ p+1$ and $ 2p$ . A minimal spanning tree has weight $ p$ but a poorly chose tree on this graph would have weight $ (p-1)p$ . Intuitively, if there are only $ m$ values to chose from, the sets can’t all be that different from one another.