I think the problem lies in a faulty backpropagation implementation; it's the only way I can explain what I'm seeing (gradient checking that doesn't match the backprop derivatives). However, I can't find any error, and the derivatives look correct to me. So I'm asking: do the derivatives I compute make sense, or is there a mistake somewhere?

**About DNN:**

I built a 3-layer (1 input, 1 hidden, 1 output) neural network. My goal is regression, so the last layer has a single neuron. I use leaky ReLU as the activation function on the hidden layer and no activation function on the output layer. The cost function is Mean Squared Error (MSE). I also normalize the inputs, and I haven't used regularization yet.
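To make the setup concrete, here is a minimal NumPy sketch of my forward pass (variable names like `W1`, `b1` are illustrative, and I assume examples are stored column-wise, one example per column):

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # Leaky ReLU: z where z > 0, alpha * z otherwise
    return np.where(z > 0, z, alpha * z)

def forward(X, W1, b1, W2, b2):
    # X: (n_features, m) -- m examples stored column-wise (assumed layout)
    Z1 = W1 @ X + b1     # hidden layer pre-activation
    A1 = leaky_relu(Z1)  # hidden layer activation
    Z2 = W2 @ A1 + b2    # output layer pre-activation
    A2 = Z2              # linear output layer (regression, no activation)
    return Z1, A1, Z2, A2
```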

**About derivatives:**

*dCost/dALayerOutput* = d( 1/m * sum( (ALayerOutput – y)^2 ) )/dALayerOutput = 2/m * (ALayerOutput – y) = **dA**

*dCost/dZLayerOutput* = dA * dALayerOutput/dZLayerOutput = dA * 1 = **dZ** (because I don't apply any activation function to the last layer, so ALayerOutput = ZLayerOutput).

*dCost/dWeightOutput* = dZ * dZLayerOutput/dWeightOutput = dZ * d(WeightOutput * ALayerHidden + BiasOutput)/dWeightOutput = dZ * ALayerHidden = **dW**

*dCost/dBiasOutput* = dZ * dZLayerOutput/dBiasOutput = dZ * d(WeightOutput * ALayerHidden + BiasOutput)/dBiasOutput = dZ * 1 = dZ = **dB**

*dCost/dALayerHidden* = dZ * dZLayerOutput/dALayerHidden = dZ * d(WeightOutput * ALayerHidden + BiasOutput)/dALayerHidden = dZ * WeightOutput = **dA-1**

*dCost/dZLayerHidden* = dA-1 * dALayerHidden/dZLayerHidden = dA-1 * d( leakyRelu(ZLayerHidden) )/dZLayerHidden = dA-1 * leakyRelu'(ZLayerHidden) = **dZ-1**

*dCost/dWeightHidden* = dZ-1 * dZLayerHidden/dWeightHidden = dZ-1 * d(WeightHidden * LayerInput + BiasHidden)/dWeightHidden = dZ-1 * LayerInput = **dW-1**

*dCost/dBiasHidden* = dZ-1 * d(WeightHidden * LayerInput + BiasHidden)/dBiasHidden = dZ-1 * 1 = **dB-1**
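In vectorized NumPy form, the chain-rule steps above look like the sketch below (again assuming column-wise examples; note that the matrix version needs transposes, and the bias gradients must be summed over the batch — getting either of these wrong is a classic cause of a gradient-check mismatch even when the scalar derivatives are correct):

```python
import numpy as np

def leaky_relu_grad(z, alpha=0.01):
    # derivative of leaky ReLU: 1 where z > 0, alpha otherwise
    return np.where(z > 0, 1.0, alpha)

def backward(X, y, Z1, A1, A2, W2):
    m = X.shape[1]                        # number of examples
    dA2 = 2.0 / m * (A2 - y)              # dCost/dALayerOutput (dA)
    dZ2 = dA2                             # identity output activation (dZ)
    dW2 = dZ2 @ A1.T                      # dW -- note the transpose
    db2 = dZ2.sum(axis=1, keepdims=True)  # dB -- summed over the batch
    dA1 = W2.T @ dZ2                      # dA-1 -- note W2.T, not W2
    dZ1 = dA1 * leaky_relu_grad(Z1)       # dZ-1 (elementwise product)
    dW1 = dZ1 @ X.T                       # dW-1
    db1 = dZ1.sum(axis=1, keepdims=True)  # dB-1
    return dW1, db1, dW2, db2
```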

**About Gradient Descent:**

netWeightsLayerOutput = netWeightsLayerOutput – (learningRate * dW)

netWeightsLayerHidden = netWeightsLayerHidden – (learningRate * dW-1)

netBiasesLayerOutput = netBiasesLayerOutput – (learningRate * dB)

netBiasesLayerHidden = netBiasesLayerHidden – (learningRate * dB-1)
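To localize where backprop and gradient checking disagree, each parameter can be checked separately against central finite differences. The following is a self-contained sketch with made-up shapes and a fixed seed (not my actual code), printing a relative error per parameter — anything much above ~1e-7 points at the corresponding gradient:

```python
import numpy as np

def leaky_relu(z, a=0.01):
    return np.where(z > 0, z, a * z)

def leaky_relu_grad(z, a=0.01):
    return np.where(z > 0, 1.0, a)

def cost(X, y, W1, b1, W2, b2):
    A1 = leaky_relu(W1 @ X + b1)
    A2 = W2 @ A1 + b2
    return np.mean((A2 - y) ** 2)  # MSE over the m examples

def backprop(X, y, W1, b1, W2, b2):
    m = X.shape[1]
    Z1 = W1 @ X + b1
    A1 = leaky_relu(Z1)
    A2 = W2 @ A1 + b2
    dZ2 = 2.0 / m * (A2 - y)
    dW2 = dZ2 @ A1.T
    db2 = dZ2.sum(axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * leaky_relu_grad(Z1)
    dW1 = dZ1 @ X.T
    db1 = dZ1.sum(axis=1, keepdims=True)
    return dW1, db1, dW2, db2

def numeric_grad(f, P, eps=1e-5):
    # central differences over every entry of parameter array P (in place)
    g = np.zeros_like(P)
    it = np.nditer(P, flags=['multi_index'])
    for _ in it:
        i = it.multi_index
        old = P[i]
        P[i] = old + eps; plus = f()
        P[i] = old - eps; minus = f()
        P[i] = old  # restore the original value
        g[i] = (plus - minus) / (2 * eps)
    return g

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5)); y = rng.standard_normal((1, 5))
W1 = rng.standard_normal((4, 3)) * 0.5; b1 = np.zeros((4, 1))
W2 = rng.standard_normal((1, 4)) * 0.5; b2 = np.zeros((1, 1))

f = lambda: cost(X, y, W1, b1, W2, b2)
grads = backprop(X, y, W1, b1, W2, b2)
for name, analytic, P in zip(["dW1", "db1", "dW2", "db2"],
                             grads, [W1, b1, W2, b2]):
    num = numeric_grad(f, P)
    rel = (np.linalg.norm(analytic - num)
           / (np.linalg.norm(analytic) + np.linalg.norm(num) + 1e-12))
    print(name, rel)
```

(The leaky ReLU kink at 0 can slightly disturb the numerical estimate if some pre-activation happens to fall within eps of zero, so an occasional outlier there is not necessarily a bug.)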

**Do you see any error in the above?**