How does the BERT model (in the TensorFlow or PaddlePaddle frameworks) relate to the nodes of the underlying neural net that's being trained?

The BERT model in frameworks like TensorFlow or PaddlePaddle shows various kinds of computation nodes (subtract, accumulate, add, multiply, etc.) in graph form, arranged in 12 layers.

But this graph doesn't look anything like the neural network typically shown in textbooks (e.g. https://en.wikipedia.org/wiki/Artificial_neural_network#/media/File:Colored_neural_network.svg), where each edge has a weight that's being trained and there is an input layer and an output layer.
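To make the textbook picture concrete for myself: each edge in such a diagram carries one scalar weight, so a whole fully-connected layer is just y = activation(W·x + b), where W collects all the edge weights between two layers. A rough sketch of that correspondence (plain C++, all names are mine):

    // One textbook fully-connected layer: each edge weight in the diagram is
    // one matrix entry W[i][j]. A framework graph shows the same computation
    // broken into separate op nodes (matrix multiply, bias add, activation).
    #include <cmath>
    #include <vector>

    std::vector<float> denseLayer(const std::vector<std::vector<float>>& W,
                                  const std::vector<float>& b,
                                  const std::vector<float>& x) {
        std::vector<float> y(W.size(), 0.0f);
        for (size_t i = 0; i < W.size(); ++i) {   // one output neuron per row of W
            for (size_t j = 0; j < x.size(); ++j)
                y[i] += W[i][j] * x[j];           // the "matmul" node
            y[i] += b[i];                         // the "add" (bias) node
            y[i] = std::tanh(y[i]);               // the activation node
        }
        return y;
    }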

Instead, when I print out the BERT graph, I can’t figure out how a node in the BERT graph relates to a node in the neural-network that’s being trained.

I have been compiling the BERT framework models into a form that can run on a PC/CPU. But I still don't understand this basic aspect of how BERT relates to a neural net, because I don't see which neural-network topology is being trained (I'd expect the topology, i.e. the connections among the various layers and nodes of the neural net, to dictate how training of the neural net occurs).

Could someone explain what underlying neural-net is being trained by BERT? How do nodes in the BERT graph relate to neural-net nodes and weights on neural-net edges?

Proof that an almost complete binary tree with n nodes has at least $\frac{n}{2}$ leaf nodes

I’m having some trouble proving what my title states. Some textbooks refer to almost complete binary trees as complete, so to make myself clear, when I say almost complete binary tree I mean a binary tree whose levels are all full except for the last one in which all the nodes are as far left as possible.

I thought of proving it using induction but I’m not really sure how to do so. Any ideas?
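For what it's worth, here is the counting fact in the array (heap) representation that looks relevant, though I haven't managed to turn it into a clean induction. With the root at index $1$ and the children of node $i$ at $2i$ and $2i+1$, the almost-complete shape means the nodes occupy exactly the indices $1, \dots, n$, so

$$\text{node } i \text{ is a leaf} \iff 2i > n \iff i > \lfloor n/2 \rfloor,$$

which would give

$$\#\{\text{leaves}\} = n - \lfloor n/2 \rfloor = \lceil n/2 \rceil \ge \frac{n}{2}.$$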

Determine whether there exists a path in a directed acyclic graph that reaches all nodes without revisiting a node

For this I came up with a DFS recursion.

Do a DFS from any node and keep doing it until all nodes are exhausted, i.e. pick the next unvisited node once you can't keep recursing.

The element with the highest post number (the last element you finish visiting) should be the first element in your topological ordering.

Now do another DFS recursion, called DFS_find, that executes on every node:

    DFS_find(node):
        if node has no neighbors:
            return 1
        otherwise:
            return 1 + max( DFS_find(neighbor) over all neighboring nodes )

Execute DFS_find on the first node in your topological ordering. If it returns a number equal to the number of vertices, then a directed path that crosses every node exactly once exists. Otherwise it does not.

How can I prove whether or not this algorithm is correct?

I think this may be a little less time-efficient than the classical way of just doing a topological sort and then checking whether each consecutive pair has an edge between them.
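For reference, here is a rough sketch of that classical approach: compute a topological order (Kahn's algorithm here), then check that every consecutive pair in the order is joined by an edge. All names are mine:

    // A path visiting every node once exists in a DAG iff every consecutive
    // pair in a topological ordering is connected by an edge. Sketch only.
    #include <queue>
    #include <set>
    #include <utility>
    #include <vector>

    bool hasPathThroughAllNodes(int n, const std::vector<std::pair<int, int>>& edges) {
        std::vector<std::vector<int>> adj(n);
        std::vector<int> indeg(n, 0);
        std::set<std::pair<int, int>> edgeSet;
        for (auto [u, v] : edges) {
            adj[u].push_back(v);
            ++indeg[v];
            edgeSet.insert({u, v});
        }

        // Kahn's algorithm: repeatedly remove a node with in-degree 0.
        std::queue<int> ready;
        for (int v = 0; v < n; ++v)
            if (indeg[v] == 0) ready.push(v);

        std::vector<int> order;
        while (!ready.empty()) {
            int u = ready.front();
            ready.pop();
            order.push_back(u);
            for (int v : adj[u])
                if (--indeg[v] == 0) ready.push(v);
        }

        // Every consecutive pair in the topological order must be an edge.
        for (size_t i = 0; i + 1 < order.size(); ++i)
            if (edgeSet.count({order[i], order[i + 1]}) == 0)
                return false;
        return (int)order.size() == n;   // a shorter order would mean the input wasn't a DAG
    }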

Algorithm to delete BST nodes with duplicated values

For BSTs we have:

  1. Greater values go to the right
  2. Smaller or EQUAL values go to the left

All the algorithms I found for deleting a node say to find the smallest node in the right subtree of the node we want to delete (the successor), or the greatest node in the left subtree (the predecessor), and replace the deleted node with the successor.
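For concreteness, here is a rough sketch of that standard deletion rule (copy the successor's value into the node, then delete the successor from the right subtree). The names are mine, no special handling of duplicates is attempted, and the diagrams below show what it does when equal values are present:

    // Standard BST deletion by successor replacement (sketch only).
    struct Node {
        int value;
        Node* left  = nullptr;
        Node* right = nullptr;
    };

    Node* findMin(Node* t) {            // leftmost node of a non-empty subtree
        while (t->left != nullptr)
            t = t->left;
        return t;
    }

    Node* erase(Node* t, int x) {
        if (t == nullptr)
            return nullptr;
        if (x < t->value)
            t->left = erase(t->left, x);
        else if (x > t->value)
            t->right = erase(t->right, x);
        else if (t->left != nullptr && t->right != nullptr) {
            // Two children: copy the successor's value, then remove the successor.
            t->value = findMin(t->right)->value;
            t->right = erase(t->right, t->value);
        } else {
            // Zero or one child: splice the node out.
            Node* old = t;
            t = (t->left != nullptr) ? t->left : t->right;
            delete old;
        }
        return t;
    }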

But what happens when we have BST nodes with the same value?

         4
       /   \
      2     5   <-- Delete 5
     / \   / \
    1   3 5   7
             / \
           [7]  9   <-- 7 will take 5's place

Now we end up with:

         4
       /   \
      2     7
     / \   / \
    1   3 5  [7]   <-- This 7 should be in the left subtree
               \
                9

The BST will still work (or will it?), but the resulting tree no longer fits the definition of a BST.

Is there another algorithm that takes this into account or is this result expected and generally accepted?

How Dijkstra's algorithm forms the shortest-path tree when multiple nodes have the same shortest-path length

I came across the following problem:

Consider the graph below:
[graph image]
What will be the shortest-path tree starting at node $A$ returned by Dijkstra's algorithm, if we assume the priority queue is implemented as a binary min heap?

My solution:

I referred to the Dijkstra pseudocode from CLRS:
[CLRS Dijkstra pseudocode image]

With A as the starting node, the priority queue (binary min heap) will be as follows:

    A{root, 0}
        |
    Rest all:{∅,∞}

(Notation: {parent in SPT, shortest path weight})

It will extract A from the priority queue, add it to the SPT, and relax AC and AB:

        B{A:5}
        /    \
    C{A:6}  Rest all:{∅,∞}

It will extract B from the priority queue and add it to the SPT:

    C{A:6}
       |
    Rest all:{∅,∞}

and will relax BE:

            C{A:6}
            /    \
    Rest all:{∅,∞}   E{B,6}

Next it will extract C, and so on. Thus the SPT will be:

[resulting SPT image]

But not:

[alternative SPT image]

Q1. Am I correct with the above?
Q2. The CLRS algorithm does not dictate which node to add to the SPT when multiple nodes have the same shortest-path weight, so the result depends on how the priority queue is implemented. If no information is given about how the priority queue is implemented, then we cannot tell which SPT will be formed. Am I right?
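For reference, this is roughly the implementation I have in mind (a sketch, not the CLRS pseudocode): Dijkstra with a binary min heap via std::priority_queue, recording each vertex's parent in the SPT. With this particular comparator, entries with equal distance happen to be tie-broken by vertex number; a different heap layout or comparator could pop them in another order and produce a different (but equally valid) SPT, which is the point of Q2.

    // Dijkstra with a binary min heap (std::priority_queue), returning the
    // parent of each vertex in the shortest-path tree. Sketch only; the graph
    // representation and names are mine.
    #include <functional>
    #include <limits>
    #include <queue>
    #include <vector>

    struct Edge { int to; long long w; };

    std::vector<int> dijkstraParents(const std::vector<std::vector<Edge>>& adj, int src) {
        const long long INF = std::numeric_limits<long long>::max();
        int n = (int)adj.size();
        std::vector<long long> dist(n, INF);
        std::vector<int> parent(n, -1);

        using Item = std::pair<long long, int>;          // (distance, vertex)
        std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;

        dist[src] = 0;
        pq.push({0, src});
        while (!pq.empty()) {
            auto [d, u] = pq.top();
            pq.pop();
            if (d != dist[u]) continue;                  // stale entry, skip
            for (const Edge& e : adj[u]) {
                if (dist[u] + e.w < dist[e.to]) {        // relax edge (u, e.to)
                    dist[e.to] = dist[u] + e.w;
                    parent[e.to] = u;                    // u becomes e.to's parent in the SPT
                    pq.push({dist[e.to], e.to});
                }
            }
        }
        return parent;
    }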

A data structure for efficiently ordering nodes relative to each other (e.g. whether a node appears earlier in a list than another node)

Suppose we have $N$ elements, each of which we'll treat as a simple object. Is there a data structure that will allow me to see which node appears earlier, according to some arbitrary insertion order, given references to both nodes? I need some kind of data structure $\mathcal{D}$ that supports the following operations (where $A$ and $B$ are node references and $A \neq B$):

$\mathcal{D}.addBefore(A, B)$

$\mathcal{D}.addAfter(A, B)$

$\mathcal{D}.addLast(A)$

$\mathcal{D}.earlier(A, B) \rightarrow bool$

$\mathcal{D}.remove(A)$

I'd like to implement some kind of $Earlier$ predicate that takes two nodes and returns whether $A$ comes before $B$. For example, if this were an indexed list and we had the indices of the nodes, then it would simply be:

$$Earlier(A, B) \implies A.index < B.index$$

The ordering is determined by a user who inserts elements as they see fit. They are allowed to add an element either after some node or before some node; if the data structure is empty, they can simply add the element, and the element added first remains the only element in the data structure until another element is added.

A practical example of this problem: a user is pasting files into a directory, but the file explorer lets the user paste files before or after any file in the list. The file explorer must display the files in the order the user requests, so if a list is used to hold the files, then [A, B, C] should render as [A, B, C], and if the user pastes a file D before B, then the list should render [A, D, B, C].

This becomes a problem when I need to insert before another item: I don't have that luxury, since inserting into the middle of a list backed by an array has a big overhead. My next thought was to go with a linked list, because I will have references to the two nodes and can insert quickly given a handle to a node. This is $\mathcal{O}(1)$ for insertion.

The actual problem is that insertions are not too frequent, but finding which of two given nodes comes first in the data structure is a common operation. This makes the naive $\mathcal{O}(n)$ search pretty painful when dealing with a lot of nodes, since in the worst case I have to scan all the other nodes in the list to determine which one is behind/ahead of the other.

My main roadblock is that since the user can insert them in any order (and it needs to stay in the order the user inserts them in), I have to use some data structure that maintains this invariant.

As such, with a linked list I am stuck currently at:

$$Earlier \in \mathcal{O}(n)$$

and iterating over the list is of course $ \mathcal{O}(n)$ , along with removal being $ \mathcal{O}(1)$ since it’s trivial to unlink a node with a reference to it in a doubly linked list.


My solution to the problem:

Now, we can change the data structure if we want, so a linked list isn’t required. The only thing that is required is the ability to let the users iterate over the data structure and get the elements back in the order they placed them in.

This makes me wonder whether there's a tree structure I can use. For example, what if I were to take a binary tree that balances itself so that the search depth is approximately $\mathcal{O}(\lg n)$, i.e. some kind of self-balancing tree? The first thing that jumps to mind is an AVL tree, where I'd track the sizes of the subtrees for balancing and update them. This isn't quite an AVL tree, since there's no implicit ordering between the nodes, but the idea of self-balancing is something I'd like to exploit to get a good search runtime.

To make this viable, our users will have references to our nodes. This way we can put each node in a hash table and do an $\mathcal{O}(1)$ lookup to find the node in the tree. Inserting a node before or after it is not too bad either: you create a new subtree at the current node by adding a parent node, turning the existing node into one leaf, and adding the new element as the other leaf. Visually:

        o                   o
       / \     add A       / \      rebalance
      o   o   ------->    o   o    ---------->  ...
     / \      before X   / \        if needed
    o   X               o   o
                           / \
                          A   X

or

        o                   o
       / \     add A       / \      rebalance
      o   o   ------->    o   o    ---------->  ...
     / \      after X    / \        if needed
    o   X               o   o
                           / \
                          X   A

where o is another node (that is either a parent, or a leaf).
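Here is a rough sketch of that leaf-splitting insertion, with the rebalancing step left out (all type and function names are mine):

    // Order-maintenance tree node: values live only at the leaves; internal
    // nodes exist purely to encode left-to-right order. A real version would
    // also give leaves a payload field (or make this a template).
    struct OrderNode {
        OrderNode* parent = nullptr;
        OrderNode* left   = nullptr;   // set on internal nodes only
        OrderNode* right  = nullptr;
    };

    // Split leaf x into an internal node whose children are (new leaf, x) or
    // (x, new leaf), depending on whether we insert before or after x.
    // Returns the new leaf, which the caller can store in its hash table.
    OrderNode* addNear(OrderNode* x, bool before) {
        OrderNode* p = new OrderNode;      // new internal node taking x's place
        OrderNode* a = new OrderNode;      // new leaf for the inserted element
        p->parent = x->parent;
        if (x->parent != nullptr) {
            if (x->parent->left == x) x->parent->left = p;
            else                      x->parent->right = p;
        }
        // (If x was the root, the tree's root pointer must be updated to p.)
        if (before) { p->left = a; p->right = x; }
        else        { p->left = x; p->right = a; }
        a->parent = p;
        x->parent = p;
        // An AVL-style scheme would now walk up from p and rebalance if needed.
        return a;
    }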

A consequence of this design is that the structure is a full binary tree: each leaf is a value we're storing, and the internal (parent) nodes store no values.

The cost of adding a node to this self-balancing binary tree is $\mathcal{O}(1)$ to find the node we're inserting next to (since we assume we can look up the node reference in a hash table), then $\mathcal{O}(1)$ to insert by adding a parent and attaching the two nodes, and then $\mathcal{O}(\lg n)$ to rebalance the tree. This means insertion is $\mathcal{O}(\lg n)$. Not too bad.

Searching to see which of two elements comes earlier becomes "traverse from both nodes up to the root, find the lowest common ancestor, and whichever node comes from the ancestor's left branch is earlier", which is $\mathcal{O}(\lg n)$. Searching is now logarithmic as well.
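A sketch of that Earlier test, reusing the OrderNode type from the sketch above; it runs in $\mathcal{O}(\lg n)$ as long as the tree stays balanced:

    #include <unordered_set>

    // Returns true if leaf a comes before leaf b (a != b, both in the same tree).
    bool earlier(const OrderNode* a, const OrderNode* b) {
        // Mark every node on the path from a up to the root.
        std::unordered_set<const OrderNode*> onPathFromA;
        for (const OrderNode* v = a; v != nullptr; v = v->parent)
            onPathFromA.insert(v);

        // Climb from b until we first hit that path: that node is the lowest
        // common ancestor, and `prev` is the LCA's child on b's side.
        const OrderNode* prev = nullptr;
        const OrderNode* lca = b;
        while (onPathFromA.count(lca) == 0) {
            prev = lca;
            lca = lca->parent;
        }
        // If b hangs off the LCA's right child, then a is in the left subtree
        // and therefore comes earlier.
        return lca->right == prev;
    }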

As such, this means we now get:

$$Earlier \in \mathcal{O}(\lg n)$$

Further, iterating over the binary tree is $ \mathcal{O}(n)$ since it’s a full binary tree and at worst there should be approximately $ 2n$ nodes to visit in total. Since the naive list solution previously was $ \mathcal{O}(n)$ , we’re looking good.

Finally, removal is probably the same as AVL tree removal and thus also $ \mathcal{O}(\lg n)$ .


But can we do better?

Overall the above solution is decent, but it would be really nice if I could knock the searching down to $\mathcal{O}(1)$, or to something very small, the way disjoint sets are $\mathcal{O}(\alpha(n))$ for some operations and feel effectively constant.

Is it possible to do something like this in $o(\lg n)$ time? I am willing to trade away performance on addition, deletion, and iteration to get a better search time, as that is my bottleneck.

I don't know what other data structures are out there; maybe you do. Or maybe there is some other method I don't know about that would allow me to achieve very quick search times. I can augment the data structures, so that is always an option on the table.

I also understand that getting a better runtime might require going to the literature and implementing some exotic data structure whose implementation and maintenance cost may be more than it's worth; perhaps the self-balancing binary tree is the only viable solution, since this is not Google-scale data and doesn't need such a solution. Since this is a problem from a hobby project, I figure I can try things out with little repercussion.

How to get enclosed spaces from a series of connected nodes

I have a bunch of connected walls in a list and the data for them is like so:

    Wall {
        Node A;
        Node B;
    }

    Node {
        float x;
        float y;
    }

I want to find the rooms from the connected walls as an array of connected points to represent each room’s perimeter.

This is a visual example of what I am trying to find:
[example image]

The red dots are the nodes, the lines are the walls, and the numbers label the rooms that the walls create.

The walls can be at any angle; I'm not sure whether that matters.

I am wondering what algorithms exist that can help me solve this problem. What is the best way to approach it?

Assigning values to the nodes and edges of a tree to maximize the number of nodes whose value is larger than all adjacent edges

A node is valid if its value is greater than the values of all of its adjacent edges.

The task is to maximize the number of valid nodes.

Given $n$ values for nodes and $n-1$ values for edges, how do I assign these values (to nodes and edges) in a given input tree so that the number of valid nodes is maximized?

Why is IDA* faster than A* even though IDA* visits more nodes than A*?

I used IDA* for the 8-puzzle and my friend used A* for it (with the same Manhattan distance heuristic).

I computed averages over 20 examples for my algorithm and for my friend's algorithm. The average time for my algorithm was much faster than for my friend's, but my average number of visited nodes is a lot higher than my friend's.

I know IDA* visits some nodes more than once, but why is it faster than A*?

Modifying insert and remove functions of an AVL tree so that nodes that don’t need to be rebalanced are not checked for balance

I am trying to modify the insert and remove functions of an AVL tree so that no nodes are checked for balance that do not need to be. The suggested way to do this was to change the return types of insert, remove, and balance so that they return information about whether more balance checking is needed. In the case of insert, we know that after one node is rebalanced, no other node will need rebalancing. We can let balance(t) return true if the node t was rebalanced and false otherwise. Then insert should also return a bool value to notify nodes further up the tree that no more rebalancing is required.

    void insert( const Comparable & x, AvlNode * & t )
    {
        if( t == nullptr )
            t = new AvlNode{ x, nullptr, nullptr };
        else if( x < t->element )
            insert( x, t->left );
        else if( t->element < x )
            insert( x, t->right );

        balance( t );
    }

    void remove( const Comparable & x, AvlNode * & t )
    {
        if( t == nullptr )
            return;   // Item not found; do nothing

        if( x < t->element )
            remove( x, t->left );
        else if( t->element < x )
            remove( x, t->right );
        else if( t->left != nullptr && t->right != nullptr ) // Two children
        {
            t->element = findMin( t->right )->element;
            remove( t->element, t->right );
        }
        else
        {
            AvlNode *oldNode = t;
            t = ( t->left != nullptr ) ? t->left : t->right;
            delete oldNode;
        }

        balance( t );
    }

    void balance( AvlNode * & t )
    {
        if( t == nullptr )
            return;

        cout << "balancing <" << height( t->left ) << "> " << t->element
             << " <" << height( t->right ) << ">" << endl;

        if( height( t->left ) - height( t->right ) > ALLOWED_IMBALANCE )
        {
            if( height( t->left->left ) >= height( t->left->right ) )
                rotateWithLeftChild( t );
            else
                doubleWithLeftChild( t );
        }
        else if( height( t->right ) - height( t->left ) > ALLOWED_IMBALANCE )
        {
            if( height( t->right->right ) >= height( t->right->left ) )
                rotateWithRightChild( t );
            else
                doubleWithRightChild( t );
        }

        t->height = max( height( t->left ), height( t->right ) ) + 1;
    }

I am stuck on how to do this. Would I add "return true" after any of the calls rotateWithLeftChild( t ), doubleWithLeftChild( t ), rotateWithRightChild( t ), or doubleWithRightChild( t ), and return false otherwise? If someone could point me in the right direction, it would be appreciated.
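For concreteness, here is a sketch of the kind of change I think is being suggested, covering balance and insert only (I believe remove is harder, since after a deletion a rotation can still shrink the subtree and force checks further up). It reuses AvlNode, height, the rotation helpers, and ALLOWED_IMBALANCE from the code above, and I'm not certain it is what the book intends:

    // balance() now reports whether it performed a rotation at t.
    bool balance( AvlNode * & t )
    {
        if( t == nullptr )
            return false;

        bool rotated = false;
        if( height( t->left ) - height( t->right ) > ALLOWED_IMBALANCE )
        {
            if( height( t->left->left ) >= height( t->left->right ) )
                rotateWithLeftChild( t );
            else
                doubleWithLeftChild( t );
            rotated = true;
        }
        else if( height( t->right ) - height( t->left ) > ALLOWED_IMBALANCE )
        {
            if( height( t->right->right ) >= height( t->right->left ) )
                rotateWithRightChild( t );
            else
                doubleWithRightChild( t );
            rotated = true;
        }

        t->height = max( height( t->left ), height( t->right ) ) + 1;
        return rotated;
    }

    // insert() returns true once no ancestor needs any further balance check.
    bool insert( const Comparable & x, AvlNode * & t )
    {
        if( t == nullptr )
        {
            t = new AvlNode{ x, nullptr, nullptr };
            return false;                 // a new leaf: ancestors must still be checked
        }

        bool done = false;
        if( x < t->element )
            done = insert( x, t->left );
        else if( t->element < x )
            done = insert( x, t->right );
        else
            return true;                  // duplicate: nothing changed anywhere

        if( done )
            return true;                  // a rotation below already restored balance,
                                          // and the subtree height is unchanged, so
                                          // t's stored height is still correct

        // After an insertion, a rotation restores the subtree to its previous
        // height, so once balance() rotates, no ancestor can become unbalanced.
        return balance( t );
    }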