How to improve and accelerate this iteration 2 D algorithm?

I implemented the algorithm descussed here in the 1 D case (when $ g$ is one variable function). Using a relaxation parameter the algorithm became very fast and converges in few iterations (n=5,6,8..) for smooth functions. After this I tried to implement it for 2 D functions, which is really what I need, but It became very slow. I think the problem is in double integration which take a long time. I tried to use Aitken method to accelerate the process but doesn’t work. Also Alex here used Gauss quadrature formula to perform the integration but still not working for me in a good way.

I want to know why exactly the algorithm is too slow in 2 D case, and If some one have any idea to improve the code. May be someone used such algorithm onece.

The basic code is as follows (may be the one from Alex is better)

f[x_, y_] := x*(1 - x); (* The exact source term to be constructed *)  nsoleq = NDSolveValue[{D[u[t, x, y], t] ==   D[u[t, x, y], x, x] + D[u[t, x, y], y, y] + f[x, y],  u[0, x, y] == 0, u[t, x, 0] == 0, u[t, 0, y] == 0,  u[t, x, 1] == 0, u[t, 1, y] == 0},  u, {t, 0, 1}, {x, 0, 1}, {y, 0,  1}]; (* nsoleq[1,x,y] is the observation used in construction of \ the source term *)  (*The iteration is as follows g[0]=0 for example, g[i]=g[i-1]+a[i]*p[i] p[i] is the gradient descente of the Conjugate gradient method (we \ need to solve to pdes to calculate it) a[i] is a relaxation parameter  *) g[0][x_, y_] := 0; n = 20; (* Initialization of the iteration *) Do[nsol[i] =   NDSolveValue[{D[u[t, x, y], t] ==      D[u[t, x, y], x, x] + D[u[t, x, y], y, y] + g[i - 1][x, y],    u[0, x, y] == 0, u[t, x, 0] == 0, u[t, 0, y] == 0,    u[t, x, 1] == 0, u[t, 1, y] == 0},   u, {t, 0, 1}, {x, 0, 1}, {y, 0, 1}]; (*  Solve the direct equation *) nasol[i] =  NDSolveValue[{D[v[t, x, y], t] ==     D[v[t, x, y], x, x] + D[v[t, x, y], y, y],    v[0, x, y] == nsol[i][1, x, y] - nsoleq[1, x, y],    v[t, x, 0] == 0, v[t, 0, y] == 0, v[t, x, 1] == 0,    v[t, 1, y] == 0}, v, {t, 0, 1}, {x, 0, 1}, {y, 0, 1},   Method -> {"MethodOfLines",     "SpatialDiscretization" -> {"TensorProductGrid",       "MinPoints" -> 5*15 + 1, "MaxPoints" -> 5*15 + 1,       "DifferenceOrder" -> Automatic}}]; (* Solve the adjoint equation *) (*p[i]=N[Integrate[nasol[i][t,x,y],{t,0,1}],10]+0.000001*g[i-1][x, y]; *) b = N[Integrate[nasol[i][t, x, y], {t, 0, 1}], 10] +   0.000001*g[i - 1][x, y]; p[i] =  Interpolation[  Flatten[Table[{{x, y}, b}, {x, 0, 1, .1}, {y, 0, 1, .1}], 1]]; nsol1[i] =  NDSolveValue[{D[u[t, x, y], t] ==     D[u[t, x, y], x, x] + D[u[t, x, y], y, y] + p[i][x, y],    u[0, x, y] == 0, u[t, x, 0] == 0, u[t, 0, y] == 0,    u[t, x, 1] == 0, u[t, 1, y] == 0},   u, {t, 0, 1}, {x, 0, 1}, {y, 0,    1}]; (*This is for calculating the parameter of iterarion a[ i] for accelerating the convergence *) a[i] = (NIntegrate[(p[i][x, y])^2, {x, 0, 1}, {y, 0, 1}])/(N[    NIntegrate[(nsol1[i][1, x, y])^2, {x, 0, 1}, {y, 0, 1}],     10]);(*The parameter of iterarion,  In this division we have many problems with NIntegrate,  especially when we increase the number of iteration *) g[i] =  Interpolation[  Table[{{x, y}, g[i - 1][x, y] - a[i]*(p[i][x, y])}, {x, 0,      1, .1}, {y, 0, 1, .1}]~Flatten~1]; (*Iteration *)   , {i, 1, n}]; // AbsoluteTiming 

{128.52590416078385`, Null}

 {Plot3D[f[x, y], {x, 0, 1}, {y, 0, 1}, PlotLabel -> "Exact Solution"],   Plot3D[g[n][x, y], {x, 0, 1}, {y, 0, 1},   PlotLabel -> "Constructed Solution", PlotRange -> All]} 

enter image description here enter image description here

Accelerate nested sum

I have a expression that is shown below. Basically all tables like mxTable, bTable and bulkTable have been pre-calculated.

enter image description here

The original expression looks like: enter image description here

Here are the definitions of tables:

quantity[n1_, n2_, n3_, n4_] :=    quantity[n3, n4, n1, n2] =     quantity[n1, n4, n3, n2] =      quantity[n3, n2, n1, n4] = mxtable[[n1, n2, n3, n4]]*\!\( \*UnderoverscriptBox[\(\[Sum]\), \(m1 = 0\), \(n1 - 1\)]\( \*UnderoverscriptBox[\(\[Sum]\), \(m2 = 0\), \(n2 - 1\)]\( \*UnderoverscriptBox[\(\[Sum]\), \(m3 = 0\), \(n3 - 1\)]\( \*UnderoverscriptBox[\(\[Sum]\), \(m4 = 0\), \(n4 -              1\)]\((bTable[\([n1 + 2, m1 + 3]\)] bTable[\([n2 + 2,                m2 + 3]\)] bTable[\([n3 + 2,                m3 + 3]\)] bTable[\([n4 + 2, m4 + 3]\)] bulk[\([m1 + 1,                m2 + 1, m3 + 1, m4 + 1]\)])\)\)\)\)\);    mxTable =    Table[1.0/     Sqrt[n1 (n1 + 1) n2 (n2 + 1) n3 (n3 + 1) n4 (n4 + 1)], {n1, 1,      50}, {n2, 1, 50}, {n3, 1, 50}, {n4, 1, 50}];  bTable = Table[Binomial[a, b], {a, 0, 250}, {b, 0, 250}];  ratioTable = Table[N[(m24 + 1)/2^(m13 + 3), 300] (-1)^(m13 + m24) (\!\( \*UnderoverscriptBox[\(\(\[Sum]\)\(\ \ \)\), \(k = 0\), \(m24 +           1\)] \(( \*FractionBox[\(1\), \(12\)] N[ \*FractionBox[\(Binomial[m13 + k + 4, k + 4]\),  SuperscriptBox[\(2\), \(k\)]],            300] \((k + 4)\) \((k + 3)\) \((k + 2)\) \((k +             1)\))\)\) + \!\( \*UnderoverscriptBox[\(\(\[Sum]\)\(\ \ \)\), \(l = 0\), \(m24 +           3\)] \((N[ \*FractionBox[\(Binomial[m13 + l + 2, l + 2]\),  SuperscriptBox[\(2\), \(l\)]],            300] \((l + 2)\) \((l + 1)\) \((m24 + 3)\) \((m24 +             2)\))\)\)), {m13, 0, 100}, {m24, 0, 100}];  bulkTable = Table[    bTable[[m1 + m3 + 1, m1 + 1]] bTable[[m2 + m4 + 1,        m2 + 1]] (ratioTable[[m1 + m3 + 1, m2 + m4 + 1]] +        ratioTable[[m2 + m4 + 1, m1 + m3 + 1]]), {m1, 0, 50}, {m2, 0,      50}, {m3, 0, 50}, {m4, 0, 50}]; 

Currently I would like to calculate a table of quantity with each argument from 1 to 50. i.e. a 50x50x50x50 table, but it takes a long time, partly because numbers in bTables are very large and also high precision is needed (with 300+ decimal places). Does anyone have a good idea how to accelerate the code, except for writing it in another language like cpp?