This happens to me from time to time: I compile my code with the highest optimization level (-Ofast) of the allegedly fastest compiler (GCC) of one of the fastest languages (C/C++). It takes 3 seconds. I run the compiled program, measuring performance. Then I make some trivial change (say, marking a function inline), compile it again, and it runs 20% faster.
Why? Often I’d rather wait a few minutes or even hours, but be sure that my code is at least hard to optimize further. Why does the compiler give up so quickly?
As far as I know, modern architectures are super complicated and hard to optimize for a priori. Couldn't a compiler test many possibilities and keep whichever one turns out fastest? I effectively do this by hand, making random changes to the source code, but that hardly sounds optimal.