To optimize my LF mapping, I was asked to do the following. Given a string, say $ abaxyxwxbx$ I need to encode it in a way where every index stores the value of the number of unique characters encountered since its last occurrence $ + 1$ and $ 0$ otherwise. Using the earlier string. The encoding would be:

$ abaxyxwxbx$

$ 2502020200$

For the first $ ‘a’$ , we encounter a $ ‘b’$ and then an $ ‘a’$ . So we count the one unique character, $ ‘b’$ and add $ 1$ to it and store it as the encoding for the first $ ‘a’$ . For the first $ ‘b’$ , we encounter the next $ ‘b’$ on index $ 8$ . From the first $ ‘b’$ to the second $ ‘b’$ , we encounter 4 unique letters $ (a, x, y, w)$ and add $ 1$ to it, hence we store $ 5$ as the encoding for the first $ ‘b’$ . For the second $ ‘a’$ , there is no $ ‘a’$ that we can encounter hence its encoding will be $ 0$ .

My first approach was to create a $ \sigma$ x 3 sized array ($ \sigma$ is the size of the alphabet). The first column would store the character, the second column would store the number of characters encountered since the last occurrence, and the third column would store the index of the last occurrence.

The second method I tested was to maintain an AVL tree with order statistic such that at a given time only $ \sigma$ characters could exist inside the tree. This method is again $ O(n*\sigma)$ .

Is there a way to do this is $ O(n)$ time?