This is the Octave/MATLAB code I have that computes XOR [0 0 -> 0; 0 1 -> 1; 1 0 -> 1; 1 1 -> 0]. The problem is that it fails to generalize/converge: it reaches a small error for [0 0] and [1 1] but a really big one for the other two cases. What is the issue with the code or the algorithm implemented? This is my first time building a neural network and it feels like I am missing something quite obvious. Thanks.

BTW, the graph at the bottom shows the error as (Yexpected - Yresult). I know there are other ways to measure the error/cost, such as sum(error^2)/2, but I've decided to keep it simple. Network topology: 2 (input layer) -> 3 (hidden layer) -> 1 (output).

```matlab
clear
graphics_toolkit("gnuplot")

sigmoid = @(z) 1./(1 + exp(-z));
sig_der = @(y) sigmoid(y).*(1-sigmoid(y));

% forward pass: 2 inputs -> 3 hidden units -> 1 output
function [cost, mid_layer, last_layer] = forward(w1,w2,b1,b2,data,sigmoid,i)
  mid_layer  = sigmoid(sum(data(1:2,i).*w1)' - b1);
  last_layer = sigmoid(sum(mid_layer.*w2) - b2);
  cost = data(3,i) - last_layer;
end

% backward pass: gradient-descent update of the weights and biases
function [w1, w2, b1, b2] = backprop(w1,w2,b1,b2,mid_layer,last_layer,data,cost,sig_der,sigmoid,i)
  delta2 = sig_der(last_layer).*cost;
  delta1 = sig_der(mid_layer).*sum(delta2.*w2);
  w2 = w2 + 0.00001 .* mid_layer .* delta2;
  w1 = w1 + 0.00001 .* data(1:2,i) .* delta1';
  b1 = b1 + 0.00001 .* delta1;
  b2 = b2 + 0.00001 .* delta2;
end

% training set: each column is [x1; x2; target]
data(:,1) = [0; 0; 0];
data(:,2) = [1; 0; 1];
data(:,3) = [0; 1; 1];
data(:,4) = [1; 1; 0];

w1 = rand(2,3);
w2 = rand(3,1);
b1 = ones(3,1);
b2 = ones(1,1);

tic
for j = 1:20000
  for i = 1:4
    [cost, mid_layer, last_layer] = forward(w1,w2,b1,b2,data,sigmoid,i);
    [w1, w2, b1, b2] = backprop(w1,w2,b1,b2,mid_layer,last_layer,data,cost,sig_der,sigmoid,i);
    cost_mem(j,i) = cost;
  end
end
toc
```
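For reference, here is a minimal NumPy sketch of the textbook full-batch backprop updates for the same 2 -> 3 -> 1 topology (NumPy stands in for Octave so it runs standalone; the variable names, learning rate, and iteration count are my own choices, not taken from the code above). The convention to note is that the sigmoid derivative is evaluated on the already-activated layer outputs as a*(1-a), rather than passing those outputs through sigmoid a second time:

```python
# Minimal 2-3-1 sigmoid network trained on XOR with standard backprop.
# Illustration only: names, seed, learning rate, and epoch count are
# assumptions of this sketch, not values from the original post.
import numpy as np

rng = np.random.default_rng(0)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
sig_der = lambda a: a * (1.0 - a)   # a is assumed to already be sigmoid(z)

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)  # inputs
Y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1 = rng.uniform(-1, 1, (2, 3))
b1 = np.zeros((1, 3))
W2 = rng.uniform(-1, 1, (3, 1))
b2 = np.zeros((1, 1))

lr = 0.5  # several orders of magnitude larger than 0.00001
losses = []
for _ in range(5000):
    # forward pass over the whole batch
    hidden = sigmoid(X @ W1 + b1)
    out = sigmoid(hidden @ W2 + b2)
    err = Y - out
    losses.append(0.5 * np.sum(err ** 2))
    # backward pass: deltas use a*(1-a) on the activations themselves
    delta2 = err * sig_der(out)
    delta1 = (delta2 @ W2.T) * sig_der(hidden)
    W2 += lr * hidden.T @ delta2
    b2 += lr * delta2.sum(axis=0, keepdims=True)
    W1 += lr * X.T @ delta1
    b1 += lr * delta1.sum(axis=0, keepdims=True)
```

With these conventions the squared-error loss drops steadily over training, which is the behavior I expected but am not seeing with my Octave version above.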

The two graphs show the error (sum(error^2)/2) for cases 1 and 2.