I’m trying to determine the maximum memory consumption of the “pending nodes” data structure (stack/queue) for both traversals: BFS and (preorder) DFS.

Since BFS and DFS mark nodes as discovered while traversing a graph (so no node is processed twice), we can analyze the problem in terms of trees instead of graphs, taking the starting node as the root, as usual.

I’ve started by assuming that the traversal tree is complete (all leaves are at the same depth), with height $h$ and $n$ nodes, so every non-leaf node has exactly $d$ children.

Of course, since the tree is complete, $ n$ can be calculated from $ h$ and $ d$ :

$$n = \frac{1 - d^{h + 1}}{1 - d}$$
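As a quick sanity check (my own sketch, not part of the original derivation), the closed form can be verified against an explicit per-level sum:

```python
def complete_tree_nodes(d, h):
    """Number of nodes in a complete d-ary tree of height h:
    (d**(h + 1) - 1) / (d - 1), i.e. the geometric sum of d**i for i = 0..h."""
    return (d ** (h + 1) - 1) // (d - 1)

# Closed form vs. counting each level by hand.
assert complete_tree_nodes(2, 3) == 1 + 2 + 4 + 8               # 15
assert complete_tree_nodes(4, 20) == sum(4 ** i for i in range(21))
```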

Under this assumption, the worst-case scenario for DFS is when you are at the deepest non-leaf node of the first branch of the tree: since each time you pop a node you push all of its children, you have $d - 1$ pending nodes in the stack for each level, except at the last non-leaf, where all $d$ of its children are pushed.

If, instead of being in the first branch, you were in another one, some level would have fewer than $d - 1$ pending children in the stack, and thus the stack would hold fewer nodes.

So the maximum memory consumption of DFS for a complete tree with homogeneous degree is ($ms$ stands for `maximum space`):

$$ms = h \cdot (d - 1) + 1$$

This last `+ 1` represents the extra child at the last non-leaf node. For instance, for a tree with $d = 4$ and $h = 20$, a DFS stack would require at most:

$$ms_{DFS} = 20 \cdot (4 - 1) + 1 = 61\ \text{nodes}$$

Taking into account that this tree would have $n \approx 1.4660155 \times 10^{12}$ nodes, this is a more than admissible amount. That’s the advantage of logarithmic space complexity ($h = \lfloor \log_d((d-1)n) \rfloor$).
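The $h \cdot (d - 1) + 1$ bound can be checked with a tiny simulation (my own sketch; since the tree is complete, each node only needs to be represented by its depth):

```python
def dfs_peak_stack(d, h):
    """Peak number of pending nodes on a preorder DFS stack over a
    complete d-ary tree of height h; expected to equal h*(d - 1) + 1."""
    stack = [0]             # store only depths; root is at depth 0
    peak = 1
    while stack:
        depth = stack.pop()
        if depth < h:       # non-leaf: push all d children
            stack.extend([depth + 1] * d)
            peak = max(peak, len(stack))
    return peak

for d, h in [(2, 5), (3, 10), (4, 6)]:
    assert dfs_peak_stack(d, h) == h * (d - 1) + 1
```

Running it with $d = 4$ at any reachable height confirms the formula, so extrapolating to $h = 20$ gives the 61 nodes above.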

However, for BFS, which has exponential space complexity, the worst-case scenario is when all leaves are pending to be discovered while every other node has already been discovered, so the queue contains the full last level and nothing else:

$$ms_{BFS} = d^h\ \text{nodes}$$

which, in our example, equals $1.0995116 \times 10^{12}$.
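The same kind of simulation (again my own sketch, only feasible for small heights) confirms that the BFS queue peaks at exactly $d^h$:

```python
from collections import deque

def bfs_peak_queue(d, h):
    """Peak number of pending nodes on a BFS queue over a complete
    d-ary tree of height h; expected to equal d**h."""
    queue = deque([0])      # store only depths; root is at depth 0
    peak = 1
    while queue:
        depth = queue.popleft()
        if depth < h:       # non-leaf: enqueue all d children
            queue.extend([depth + 1] * d)
            peak = max(peak, len(queue))
    return peak

for d, h in [(2, 3), (3, 4), (4, 6)]:
    assert bfs_peak_queue(d, h) == d ** h
```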

My problem now is relaxing the restriction that the tree be complete, so $d$ is no longer a homogeneous degree but an average degree (which can now be fractional), and the tree can be arbitrarily unbalanced, even degenerating into a list. Consequently, the number of nodes is free: it is no longer tied to `d` and `h` as before.

If $d$ is an average degree and $n$ any number of nodes, I’ve tried to model an upper bound on space consumption by first modelling a complete tree with homogeneous degree $\lfloor d \rfloor$, and then adding children to the last leaf until the average reaches $d$ (I assume the resulting number of nodes should equal $n$, but I’m not sure about that either; I’ve even tried to calculate, given some $d$ and $n$, lower and upper bounds for the height of the tree).

Since $d$ is an average, if some node has more than $d$ children it is because some other node has fewer than $d$ children, so the idea was to find the worst-case scenario for DFS and BFS by removing children from one node and reattaching the cut branch as a child of another node, or, more generally, to find the tightest upper bound on memory consumption I could, but I couldn’t find a way.

The thing is that, if you apply this height increase repeatedly by moving sibling branches to the deepest levels, you would probably end up with many parent or sibling paths consisting of just lists, which would be removed from the stack/queue very quickly, so there must be some tree shape beyond which you cannot make the space consumption any worse. I assume, though, that the “worst tree” can be different for DFS and BFS.
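To explore the “worst tree” question empirically, here is a generic helper (my own sketch, with hypothetical names) that measures the peak pending-structure size for an arbitrary tree given as a child dictionary, making it easy to compare candidate shapes with the same $n$ and average degree:

```python
from collections import deque

def peak_pending(children, root, bfs=False):
    """Peak size of the stack (DFS) or queue (BFS) while traversing a
    tree given as a dict mapping each node to its list of children."""
    pending = deque([root])
    peak = 1
    while pending:
        node = pending.popleft() if bfs else pending.pop()
        pending.extend(children.get(node, []))
        peak = max(peak, len(pending))
    return peak

# Example: a 3-node path ("list") vs. a star with the same root.
path = {0: [1], 1: [2]}
star = {0: [1, 2]}
print(peak_pending(path, 0), peak_pending(star, 0))  # prints: 1 2
```

Brute-forcing all tree shapes for small $n$ with this helper might reveal whether the maximizing shape really differs between DFS and BFS.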

Is computing this upper bound (or even the exact amount) even possible? Or is the worst-case scenario just the balanced one?