I’m having a hard time understand the reasoning in the solution of 18.7 in Elements of programming interviews (EPI):

Let $ s$ and $ t$ be stings and $ D$ a dictionary, i.e., a set of strings. Define $ s$ to *produce* $ t$ if there exists a sequence of strings from the dictionary $ P = \langle s_0,s_1,\ldots,s_{n-1} \rangle$ such that the first string is $ s$ , the last string is $ t$ , and adjacent strings have the same length and differ in exactly one character. The sequence $ P$ is called a *production sequence*. For example, if the dictionary is {bat,cot,dog,dag,dot,cat}, then ⟨cat,cot,dot,dog⟩ is a valid production sequence.

Given a dictionary $ D$ and two strings $ s$ and $ t$ , write a program to determine if $ s$ produces $ t$ . Assume that all characters are lowercase alphabets. If $ s$ does produce $ t$ , output the length of a shortest production sequence; otherwise, output -1.

*Hint:* Treat strings as vertices in an undirected graph, with an edge between $ u$ and $ v$ if and only if the corresponding strings differ in one character.

Here is the solution:

The number of vertices is $ d$ , the number of words in the dictionary. The number of edges is, in the worst-case $ O(d^2)$ . The time complexity is that of BFS, namely $ O(d+d^2) = O(d^2)$ . If the string length $ n$ is less than $ d$ then the maximum number of edges out of a vertex is $ O(n)$ , implying an $ O(nd)$ bound.

So I agree with number of vertices being $ d$ , and the worst case number of edges is $ d^2$ . We know that the complexity of BFS is $ O(V+E)$ , hence $ O(d+d^2) = O(d^2)$ , though if we used a set to record visited vertices (or removed a vertex once we visited it from the graph), that should reduce BFS’s complexity to $ O(d)$ . But then things get funky.

If the string length $ n$ is less than $ d$ then the maximum number of edges out of a vertex is $ O(n)$ .

don’t agree with this. Imagine we have 5 words in our dictionary $ \{ab, ac, ad, ae, af\}$ , so $ d=5$ and $ n=2$ . All these vertices are connected and you can see that each vertex has 4 edges leaving it… which is more than $ O(n)$ . You can have $ 26^n$ possible edges leaving the vertex, but you only have $ d$ vertices in the graph, so the number of edges leaving a single vertex should be $ O(d)$ .

I ultimately agree that the final complexity of the algorithm is $ O(nd)$ , but I calculated that simply given that we can visit up to $ d$ vertices (we use a visited set to prevent cycles) and for each vertex visited one we iterate over the string of length $ n$ as we look for differences in the alphabet of lower characters $ O(26nd) = O(nd$ ).

Interested to hear what people think, Thanks 🙂