The BERT model, as represented in frameworks like TensorFlow or PaddlePaddle, appears as a graph of computation nodes (subtract, accumulate, add, multiply, etc.) organized into 12 layers.
But this graph doesn’t look anything like the neural network typically shown in textbooks (e.g. https://en.wikipedia.org/wiki/Artificial_neural_network#/media/File:Colored_neural_network.svg), where there is an input layer and an output layer and each edge carries a weight that is being trained.
When I print out the BERT graph, I can’t figure out how a node in that graph relates to a node in the neural network that’s being trained.
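To make the mismatch concrete, here is a minimal sketch of the two views as I understand them. It is a toy example, not taken from any BERT implementation, and all names (`layer_per_edge`, `layer_as_ops`, `matmul`, `add`) are hypothetical. The first function is the textbook picture with one weight per edge; the second is the same layer written as two coarse op nodes, which is closer to what I see when I print a framework graph.

```python
# Hypothetical toy example, not taken from any BERT implementation.

# Textbook view: one weight per edge between an input node and an output node.
def layer_per_edge(x, weights, bias):
    """Compute each output node by summing over its incoming weighted edges."""
    outputs = []
    for j in range(len(bias)):
        total = bias[j]
        for i in range(len(x)):
            total += x[i] * weights[i][j]  # weights[i][j] is the edge i -> j
        outputs.append(total)
    return outputs

# Graph view: the same layer expressed as two op nodes, "matmul" then "add".
def matmul(x, weights):
    return [sum(x[i] * weights[i][j] for i in range(len(x)))
            for j in range(len(weights[0]))]

def add(a, b):
    return [u + v for u, v in zip(a, b)]

def layer_as_ops(x, weights, bias):
    return add(matmul(x, weights), bias)

x = [1.0, 2.0, 3.0]        # 3 input nodes
weights = [[0.5, -1.0],
           [0.25, 0.0],
           [-0.5, 2.0]]    # 3 x 2 = 6 edges, one weight each
bias = [0.0, 1.0]          # 2 output nodes

per_edge = layer_per_edge(x, weights, bias)
as_ops = layer_as_ops(x, weights, bias)
print(per_edge, as_ops)
```

In this toy case both views give the same numbers, so a single op node can stand for many textbook edges at once. What I can’t do is carry that mapping over to the actual op nodes in the BERT graph.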
I have been compiling the BERT models from these frameworks into a form that can run on a PC/CPU. But I am still missing this basic link between BERT and a neural network: I don’t see which neural-network topology is being trained, and I’d expect the topology (the connections between and among the layers and nodes) to dictate how training occurs.
Could someone explain what underlying neural network BERT trains? How do the nodes in the BERT graph relate to the nodes of that neural network and to the weights on its edges?