Say you have a neural net that is being trained with backpropagation, and you are using the ReLU activation. The input to a node is a weighted sum of the previous layer's outputs plus a bias term. Say that for a particular data point, this weighted sum plus bias is negative. Then ReLU returns 0, and its derivative at that input is also 0. So for that data point, the gradient of the loss with respect to each of the node's incoming weights and its bias is 0. Therefore backpropagation won't update those weights or that bias based on this data point. Why is this not a problem?
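To make the setup concrete, here is a minimal sketch in plain Python of a single ReLU node. The squared-error loss, the specific numbers, and the function names (`relu`, `node_gradients`) are assumptions chosen for illustration, not part of any particular framework.

```python
def relu(z):
    """ReLU activation: pass positive inputs through, clamp the rest to 0."""
    return z if z > 0 else 0.0


def node_gradients(weights, bias, inputs, target):
    """Gradients of L = 0.5 * (relu(w.x + b) - target)**2 for one data point.

    Returns (dL/dw as a list, dL/db). The loss is a hypothetical choice
    made only so the example has something to differentiate.
    """
    z = sum(w * x for w, x in zip(weights, inputs)) + bias  # pre-activation
    a = relu(z)                                             # node output
    dL_da = a - target                                      # loss derivative
    # ReLU's derivative is 1 for z > 0 and 0 otherwise, so the gradient
    # is blocked whenever the pre-activation is negative.
    dL_dz = dL_da if z > 0 else 0.0
    grad_w = [dL_dz * x for x in inputs]
    grad_b = dL_dz
    return grad_w, grad_b


weights = [0.5, -1.0]
bias = -0.5

# Data point with negative pre-activation: z = 0.5*1 - 1*2 - 0.5 = -2.0
grad_w_neg, grad_b_neg = node_gradients(weights, bias, [1.0, 2.0], target=1.0)
print(grad_w_neg, grad_b_neg)  # [0.0, 0.0] 0.0

# Data point with positive pre-activation: z = 0.5*4 - 1*1 - 0.5 = 0.5
grad_w_pos, grad_b_pos = node_gradients(weights, bias, [4.0, 1.0], target=1.0)
print(grad_w_pos, grad_b_pos)  # [-2.0, -0.5] -0.5
```

In this sketch the first data point contributes no gradient to the node's weights or bias, which is exactly the situation described above; the second data point is included only for contrast.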