Suppose we have `N`

elements, which we’ll treat as a simple object. Is there a data structure I can use that will allow me to see which node appears earlier based on some arbitrary insertion order given a reference to both nodes? I’ll need some kind of data structure $ \mathcal{D}$ , where $ \mathcal{D}$ supports the following operations (assuming $ A$ and $ B$ are node references, and $ A \neq B$ ):

$ \mathcal{D}.addBefore(A, B)$

$ \mathcal{D}.addAfter(A, B)$

$ \mathcal{D}.addLast(A)$

$ \mathcal{D}.earlier(A, B) \rightarrow bool$

$ \mathcal{D}.remove(A)$

I’d like to implement some kind of $ Earlier$ predicate which takes two nodes and returns whether A comes before B. For example, if this was an indexed list and we had the index of the nodes, then it’d be simply:

$ $ Earlier(A, B) \implies A.index < B.index$ $

The ordering is determined by a user who inserts them as they see fit. They are allowed to add either after some node, or before some node, or if the data structure is empty then they can just add it and the element that was added first becomes the only element in the data structure until another element is added.

A practical example of this problem is that a user is pasting files into a directory, but the file explorer lets the user paste files before or after any file in the list. The file explorer must display the files in order that the user requests, so if a list is used to hold the files, then `[A, B, C]`

should render as `[A, B, C]`

, and if the user pastes a file `D`

before B, then the list should render `[A, D, B, C]`

.

This becomes a problem when I need to insert before another item: I don’t have that luxury since inserting into the middle of a list backed by an array has a big overhead. My next thought was to go with a linked list because I will have references to the two nodes and can quickly insert with my handle to the node. This is $ \mathcal{O}(1)$ for insertion.

The actual problem I have is that insertions are not too frequent, but searching for which node comes first between two given nodes in the data structure is a common operation. This makes the naive $ \mathcal{O}(n)$ search pretty painful when dealing with a lot of nodes in the list as I have to search all the other nodes in the list at the worst case to determine which one is behind/ahead of the other.

My main roadblock is that since the user can insert them in any order (and it needs to stay in the order the user inserts them in), I have to use some data structure that maintains this invariant.

As such, with a linked list I am stuck currently at:

$ $ Earlier \in \mathcal{O}(n)$ $

and iterating over the list is of course $ \mathcal{O}(n)$ , along with removal being $ \mathcal{O}(1)$ since it’s trivial to unlink a node with a reference to it in a doubly linked list.

**My solution to the problem:**

Now, we *can* change the data structure if we want, so a linked list isn’t required. The only thing that is required is the ability to let the users iterate over the data structure and get the elements back in the order they placed them in.

This makes me think of whether there’s a tree structure I can use. For example, what if I was to take a binary tree that balances itself such that the search depth is approximately $ \mathcal{O}(\lg n)$ , maybe something like a self-balancing tree. The first thing that jumps to mind is an AVL tree where I’d track the sizes of the trees in balance and then update them. This isn’t quite an AVL tree since there’s no implicit ordering between the nodes, but the idea of self-balancing is something I’d like to exploit to get a good search runtime.

To make this viable, our users will have references to our nodes. This way we can put each node in a hash table and do an $ \mathcal{O}(1)$ lookup to find the node in the tree. Inserting a node before or after it is not too bad since you create a new subtree from the current node by adding a parent node and making the previous node into a leaf and adding the element as another leaf. To make this visually make sense:

` o o / \ add A / \ rebalance o o -------> o o ----------> ... / \ before X / \ if needed o X o o / \ A X `

or

` o o / \ add A / \ rebalance o o -------> o o ----------> ... / \ after X / \ if needed o X o o / \ X A `

where `o`

is another node (that is either a parent, or a leaf).

A consequence of this data structure is that it is a full binary tree and each leaf is a value we’re storing, and the parent nodes do not store any value.

The cost of adding a node to a self balancing binary search tree is $ \mathcal{O}(1)$ to place it at the node since we assume we can look up the node reference from a hash table, and then $ \mathcal{O}(1)$ to insert it by adding a parent and attaching the two nodes, and then $ \mathcal{O}(\lg n)$ to rebalance the tree. This means insertion is $ \mathcal{O}(\lg n)$ . Not too bad.

Searching for an element to see which comes earlier becomes a “traverse from both nodes up to the root and find the least common ancestor, and whichever comes from the left branch is earlier”, which is $ \mathcal{O}(\lg n)$ . Searching is now logarithmic as well.

As such, this means we now get:

$ $ Earlier \in \mathcal{O}(\lg n)$ $

Further, iterating over the binary tree is $ \mathcal{O}(n)$ since it’s a full binary tree and at worst there should be approximately $ 2n$ nodes to visit in total. Since the naive list solution previously was $ \mathcal{O}(n)$ , we’re looking good.

Finally, removal is probably the same as AVL tree removal and thus also $ \mathcal{O}(\lg n)$ .

**But can we do better?**

Overall the above solution is decent, but it would be really nice if I could knock the searching down to $ \mathcal{O}(1)$ if possible or something really small like how disjoint sets are $ \mathcal{O}(\alpha(n))$ for some operations and feel effectively constant.

Is it possible to do something like this in $ \mathcal{o}(\lg n)$ time? I am willing to trade away performance on addition, deletion, and iteration to get a better search time, as that is my bottleneck.

I don’t know what other data structures are out there, maybe you know. Or maybe there is some other method I can use that I don’t know about which would allow me to achieve very quick search times. I can augment the data structures, so that is always an option on the table.

I also understand that getting a better runtime might require going to the literature and implementing some exotic data structure, to which the cost of implementing and maintaining it may be more than it’s worth, and as such maybe the balancing binary tree might be the only viable solution since this is not Google-level data sizes and doesn’t need such a solution. Since this is a problem I have in a hobby project, I figure I can try out things with little repercussion.