Optimising tensor operations under memory constraints

Let riem be a free variable with the assumption riem ∈ Arrays[{4, 4, 4, 4}]. Let:

val = TensorContract[TensorProduct[riem, riem, riem], {{4, 5}}]

Let riemVals be an actual {4, 4, 4, 4} tensor whose entries are symbolic expressions.

I’m interested in computing val /. riem -> riemVals. My guess is that there are two ways Mathematica could do this internally:

1) Compute v1 = TensorProduct[riemVals, riemVals, riemVals], then compute the result as TensorContract[v1, {{4, 5}}].

2) Note that val is equivalent to:

TensorProduct[TensorContract[TensorProduct[riem, riem], {{4, 5}}], riem].

Compute v1 = TensorProduct[riemVals, riemVals], then v2 = TensorContract[v1, {{4, 5}}], and finally the result as TensorProduct[v2, riemVals].
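As a sanity check that the two orderings agree, here is a minimal sketch. It assumes a small stand-in tensor (dimension 2 and random integer entries rather than the symbolic 4-dimensional riemVals), and the names n, full, pair, res1 and res2 are mine, purely for illustration. It only demonstrates the two manual evaluation orders; it says nothing about what Mathematica does internally.

```mathematica
(* Assumption: a small numeric stand-in for riemVals so the check is cheap and exact *)
n = 2;
riemVals = RandomInteger[{-3, 3}, {n, n, n, n}];

(* Approach 1: build the full rank-12 product, then contract slots 4 and 5 *)
full = TensorProduct[riemVals, riemVals, riemVals];  (* n^12 entries *)
res1 = TensorContract[full, {{4, 5}}];               (* n^10 entries *)

(* Approach 2: contract the first two factors, then multiply by the third *)
pair = TensorProduct[riemVals, riemVals];            (* n^8 entries *)
v2 = TensorContract[pair, {{4, 5}}];                 (* n^6 entries *)
res2 = TensorProduct[v2, riemVals];                  (* n^10 entries *)

(* Both orderings should give the same rank-10 tensor *)
{Dimensions[res1] == Dimensions[res2], res1 === res2}

(* Dot contracts the last slot of its first argument with the first slot
   of its second, which is the same {4, 5} contraction without the n^8 step *)
v2 === riemVals . riemVals
```

With symbolic entries the comparison would need Expand on both sides first, since approach 2 leaves each entry as an unexpanded product of a sum and a single entry.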

Now, what’s the difference between these two? They give the same result, but the first approach has to hold a $4^{12}$-entry tensor in memory as an intermediate value, while in the second the largest tensor we ever hold is the $4^{10}$-entry result itself. The idea is that, when maximum memory is constrained, it pays to move TensorContract inwards in the expression so that it is done as early as possible, before the remaining TensorProduct.
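For concreteness, these are the entry counts at each step, assuming every tensor is stored densely (the actual byte counts depend on how the symbolic entries are represented):

$$
\begin{aligned}
\text{Approach 1:}\quad & 4^{12} = 16{,}777{,}216 \;\to\; 4^{10} = 1{,}048{,}576 \\
\text{Approach 2:}\quad & 4^{8} = 65{,}536 \;\to\; 4^{6} = 4{,}096 \;\to\; 4^{10} = 1{,}048{,}576
\end{aligned}
$$

So the peak intermediate in the first approach is 16 times larger than the final result, while in the second approach no intermediate exceeds the result.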

My question is: does Mathematica take the memory-efficient approach for this kind of operation? If not, is there a way to control the evaluation so that the result is computed in a memory-efficient way, i.e. by preferring forms in which the TensorContract is performed early?