What is the most efficient way to turn a list of directory path strings into a tree?

I’m trying to find out the most efficient way of turning a list of path strings into a hierarchical list of hash maps tree using these rules:

  • Node labels are delimited/split by ‘/’
  • Hash maps have the structure:
{     label: "Node 0",     children: [] } 
  • Node labels are also keys, so for example all nodes with the same label at the root level will be merged

So the following code:

[     "Node 0/Node 0-0",     "Node 0/Node 0-1",     "Node 1/Node 1-0/Node 1-0-0" ] 

Would turn into:

[     {         label: "Node 0",         children: [             {                 label: "Node 0-0",                 children: []             },             {                 label: "Node 0-1",                 children: []             },         ]     },     {         label: "Node 1",         children: [             {                 label: "Node 1-0",                 children: [                     {                         label: "Node 1-0-0",                         children: []                     },                 ]             },         ]     }, ] 

Why decision tree method for lower bound on finding a minimum doesn’t work

(Motivated by this question. Also I suspect that my question is a bit too broad)

We know $ \Omega(n \log n)$ lower bound for sorting: we can build a decision tree where each inner node is a comparison and each leaf is a permutation. Since there are $ n!$ leaves, the minimum tree height is $ \Omega(\log (n!)) = \Omega (n \log n)$ .

However, it doesn’t work for the following problem: find a minimum in the array. For this problem, the results (the leaves) are just indices of the minimum element. There are $ n$ of them, and therefore the reasoning above gives $ \Omega(\log n)$ lower bound, which is obviously an understatement.

My question: why does this method works for sorting and doesn’t work for minimum? Is there some greater intuition or simply "it just happens" and we were "lucky" that sorting has so many possible answers?

I guess the lower bound from decision tree makes perfect sense: we do can ask yes/no questions so that we need $ O(\log n)$ answers: namely, we can use binary search for the desired index. My question still remains.

Understanding the proof of “DFS of undirected graph $G$, yields either tree edge or back edge” better with graph for each statement in proof

I was going through the edge classification section by $ \text{DFS}$ algorithm on an undirected graph from the text Introduction to Algorithms by Cormen et. al. where I came across the following proof. I was having a little difficulty in understanding the steps of the proof and hence I made an attempt to understand it fully by accompanying each statement in the proof with a possible graph of the situation.

Theorem 22.10 : In a depth-first search of an un-directed graph $ G$ , every edge of $ G$ is either a tree edge or a back edge.

Proof:

  1. Let $ (u , v)$ be an arbitrary edge of $ G$ , and suppose without loss of generality that $ d[u] < d[v]$ . Then, $ v$ must be discovered and finished before we finish $ u$ (while $ u$ is gray), since $ v$ is on $ u$ ‘s adjacency list.

  2. If the edge $ (u, v)$ is explored first in the direction from $ u$ to $ v$ , then $ v$ is undiscovered (white) until that time.

Figure 1

Figure 1 : Situation in point 2. DFS starts from ‘u’, ‘u’ is grayed and DFS then looks along the red arrow to ‘v’

  1. Otherwise we would have explored this edge already in the direction from $ v$ to $ u$ . Thus, $ (u, v)$ becomes a tree edge.

Figure 2

Figure 2 : Situation in point 3. DFS starts from ‘u’, ‘u’ is grayed, then discoveres ‘w’ and ‘w’ is grayed and then again discovers ‘v’ DFS then looks along the red arrow to ‘u’ , the green pointer explains the rest

  1. If $ (u, v)$ is explored first in the direction from $ v$ to $ u$ , then $ (u, v)$ is a back edge, since $ u$ is still gray at the time the edge is first explored.

Figure 3

Figure 3 : Situation in point 4. DFS starts from ‘u’, ‘u’ is grayed, then discoveres ‘w’ and ‘w’ is grayed and then again discovers ‘v’ DFS then looks along the red arrow to ‘u’ , ‘u’ is already grayed so the edge becomes a back edge, indicated by the green pointer


Could anyone confirm if I am on the right track or if not please rectify me.

[My question might seem similar to this or this but neither of them seemed to help me]

Visualizing a directory structure as a tree map of rectangles

There’s this nice tool called WinDirStat which lets you scan a directory and view files in a rectangular tree map. It looks like this:

windirstat

The size of each block is related to the file size, and blocks are grouped by directory and coloured distinctly according to the top level directory. I’d like to create a map like this in Mathematica. First I get some file names in the tree of my Mathematica installation and calculate their sizes:

fassoc = FileSystemMap[FileSize, File[$  InstallationDirectory], Infinity, IncludeDirectories -> False]; 

Level Infinity ensures it traverses the whole tree. I could also add 1 to ensure the association is flattened, but I want the nesting so I can assign total sizes per directory.

I can find the total size which I’ll need to use to scale the rectangles:

QuantityMagnitude[Total[Cases[fassoc, Quantity[_, _], Infinity]], "Bytes"] 

My idea is to recursively apply this total. In theory I could use this to do something like this with a tree graph and weight the vertices by size, but I want to convert this into a map of rectangles like in WinDirStat. While the colour is obvious – each level 1 association and all its descendants gets a RandomColor[] – I’m not sure how I should go about positioning the rectangles in a Graphics. Any ideas?

Merkle tree sorting leaves and pairs

I am implementing a Merkle tree and am considering using either of the two options.

The first one is sorting only by leaves. This one makes sense to me since you would like to have the same input every time you are constructing a tree from the data, that might not arrive sorted by default.

       CAB       /    \      CA     \    /    \    \   C     A     B /  \  /  \  /  \ 1  2  3  4  5  6 

The second one is sorting by leaves and pairs, which means that after sorting the leaves, you also sort all the pairs after hashing them, however I’m not entirely sure about the benefits of this implementation (if any).

       ACB       /    \      AC     \    /    \    \   C     A     B /  \  /  \  /  \ 1  2  3  4  5  6 

I have seen these implementations of Merkle trees in the past but am not sure about their benefits. So why choose one over the other?

How to answer the following queries on a tree?

Given a tree of "N" nodes(each node has been assigned a value A[i],node-"1" is the root of the tree), and a constant "K" , we have Q queries of the following type : [w]

(which means find the lowest valued node in the sub-tree of [w] , only considering those nodes in the sub-tree of [w] which have a depth less than equal to K) .

Example :

Value of nodes of tree :

A[1] = 10

A[2] = 20

A[3] = 30

A[4] = 40

A[5] = 50

A[6] = 60

Edges of tree : [1-2],

[2-3],

[3-4],

[4-5],

[4-6].

K=2.

Query-1 : [w]=1 . All nodes in subtree of [w] : (1,2,3,4,5,6) , now, all nodes in sub-tree of [w] having depth less than equal to K : (1,2) . Hence , minimum(A[1],A[2])=min(10,20)=10 is the answer .

Query-2 : [w]=4 . All nodes in subtree of [w] : (4,5,6) , now, all nodes in sub-tree of [w] having depth less than equal to K : (4,5,6). Hence , minimum(A[4],A[5],A[6]) = min(40,50,60)=40 is the answer .

What is the maximal difference between the depths of 2 leaves in AVL tree?

I’m wondering what’s the answer of the following question:

What is the maximal difference between the depths of 2 leaves in an AVL tree?

Intuitively I think that it shouldn’t exceed $ log n$ but have no idea how to prove that. On the other hand, I saw multiple articles that claim the depth can become larger and larger without any formal proof (I always get a contradiction when trying to draw such trees)

I would really like if someone can explain me why it’s correct\incorrect.

Spanning tree in a graph of intersecting sets

Consider $ n$ sets, $ X_i$ , each having $ n$ elements or fewer, drawn among a set of at most $ m \gt n$ elements. In other words $ $ \forall i \in [1 \ldots n],~|X_i| \le n~\wedge~\left|\bigcup_{i=1}^n X_i\right| \le m$ $

Consider the complete graph $ G$ formed by taking every $ X_i$ as a node, and weighing every edge $ (i,j)$ by the cardinal of the symmetric difference $ X_i \triangle X_j$ .

An immediate bound on the weight of the minimal spanning tree is $ \mathcal{O}(n^2)$ , since each edge is at most $ 2 n$ , but can we refine this to $ \mathcal{O}(m)$ ?

For illustration, consider $ 2 p$ sets, $ p$ of which contain the integers between $ 1$ and $ p$ and $ p$ of which contain the integers of between $ p+1$ and $ 2p$ . A minimal spanning tree has weight $ p$ but a poorly chose tree on this graph would have weight $ (p-1)p$ . Intuitively, if there are only $ m$ values to chose from, the sets can’t all be that different from one another.

How is it possible for nodes at height $h$ in Tree $T$ to be at height $h-1$ at T’

I was searching for answers to the Question:

Show that there are at most $ \lceil n / 2^{h + 1} \rceil$ nodes of height $ h$ in any $ n$ -element heap.

Recently I asked a related question and found out the solution was flawed so I looked for another one.

So I took over to another answer and found this

It took over to prove by induction and it is quite understood on the first read except for the statement:

Note that the nodes at height $ h$ in $ T$ would be at height $ h − 1$ in tree $ T’$ .

Preface:

Let $ N_h$ be the number of nodes at height $ h$ in the n-node tree $ T$ . Consider the tree $ T’$ formed by removing the leaves of $ T$ .

Please help me clarify my doubts. Thank you.