How can a divide-by-2 block-size bubble sort, followed by a final merge pass, be optimized for a particular environment?

I am wondering: if we had a large array to sort (say 1,048,576 random integers, chosen because it is a perfect power of 2), and we keep dividing it into smaller and smaller half-size blocks, how would someone know (on a particular computer, using a particular language and compiler) what the ideal block size would be to get the best actual runtime when using mergesort to put the blocks back together? For example, suppose someone had 1024 sorted blocks of size 1024; could that be beaten by some other combination? Is there any way to predict this, or does someone just have to code the variations, try them all, and pick the best? Perhaps for simplicity they would use a simple bubble sort on the 1024-element blocks, then merge them all together at the end using mergesort. Of course, the mergesort portion would only work on 2 sorted blocks at a time, merging them into 1 larger sorted block.
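To make the question concrete, here is a minimal sketch (in Python, purely illustrative; the function names `bubble_sort`, `merge`, and `block_sort` are my own) of the scheme I mean: bubble-sort each block of a chosen size, then repeatedly merge adjacent pairs of sorted blocks until one sorted array remains. `block_size` is the tunable parameter I'm asking about:

```python
import random

def bubble_sort(a):
    # In-place bubble sort of one block; early exit when no swaps occur.
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:
            break

def merge(left, right):
    # Standard two-way merge of two sorted lists into one sorted list.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

def block_sort(data, block_size):
    # Phase 1: bubble-sort each block of `block_size` elements.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for b in blocks:
        bubble_sort(b)
    # Phase 2: merge adjacent pairs of sorted blocks until one block remains.
    while len(blocks) > 1:
        blocks = [merge(blocks[i], blocks[i + 1]) if i + 1 < len(blocks) else blocks[i]
                  for i in range(0, len(blocks), 2)]
    return blocks[0]

data = [random.randrange(10**6) for _ in range(4096)]
assert block_sort(data, 64) == sorted(data)
```

Benchmarking this for each candidate block size (e.g. with `timeit`) on the target machine is the "try them all" approach I'm asking whether one can avoid.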

Also, what about the time complexity analysis of something like this? Would all divide-and-conquer variations of this have the same time complexity? The two extremes would be 2 sorted blocks (each of size 524,288) or 1,048,576 "sorted" blocks of size 1, handed over to the merge process at that point.
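My rough back-of-the-envelope model (an assumption on my part, counting worst-case comparisons only and ignoring constants and cache effects) would be: with n elements and block size b, the bubble phase costs about (n/b) · b²/2 = n·b/2 comparisons, and the pairwise merge phase costs about n comparisons per pass over log₂(n/b) passes. That suggests the variations are *not* all the same complexity, since b = n/2 makes the bubble term quadratic:

```python
import math

def estimated_comparisons(n, b):
    # Rough worst-case comparison-count model (my assumption, not measured):
    #   bubble phase: (n/b) blocks * ~b^2/2 comparisons each = n*b/2
    #   merge phase:  log2(n/b) passes * ~n comparisons each
    bubble = n * b / 2
    merge = n * math.log2(n / b) if b < n else 0
    return bubble + merge

n = 2**20
for b in (1, 1024, n // 2):
    print(b, estimated_comparisons(n, b))
```

Under this model, b = 1 gives the usual n·log₂(n) mergesort cost, while b = n/2 is dominated by the ~n²/4 bubble term, so only block sizes that keep n·b/2 at or below n·log₂(n/b) stay in the O(n log n) class. Whether the model predicts the *fastest* b on real hardware is exactly what I'm unsure about.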