How is readiness of instructions fetched from RAM signaled to the CPU?

In simple CPU architectures, such as the one discussed here, an instruction loaded from RAM is executed exactly one clock cycle after it is loaded into the instruction register.
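To make the simple model concrete, here is a minimal Python sketch of a toy two-stage machine. It assumes that instruction memory always answers within the same clock cycle it is asked. The class `ToyCPU` and the mnemonics are hypothetical and not taken from any particular design. An instruction latched into the instruction register at cycle t is executed at cycle t + 1, and no "ready" signal is needed because the latency never varies.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyCPU:
    # Hypothetical instruction memory: one instruction string per address.
    program: List[str]
    pc: int = 0                      # program counter
    ir: Optional[str] = None         # instruction register
    trace: List[Tuple[int, str, str]] = field(default_factory=list)

    def tick(self, cycle: int) -> None:
        # Execute whatever was latched into IR on the previous clock edge.
        if self.ir is not None:
            self.trace.append((cycle, "execute", self.ir))
        # Fetch the next instruction. Assumption: memory returns the word
        # within this same cycle, so it can be latched into IR immediately.
        if self.pc < len(self.program):
            self.ir = self.program[self.pc]
            self.trace.append((cycle, "fetch", self.ir))
            self.pc += 1
        else:
            self.ir = None


cpu = ToyCPU(["LDA 5", "ADD 6", "OUT"])
for cycle in range(4):
    cpu.tick(cycle)
for entry in cpu.trace:
    print(entry)
```

The whole model rests on the fixed one-cycle memory latency; that is exactly the assumption that breaks when a fetch can take hundreds of cycles.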

However, I am told that in real/modern architectures, fetching an instruction from RAM can take hundreds of clock cycles (with caches in between to hide some of that latency). That completely throws off my understanding of how things work. Could someone please outline the techniques that let an instruction requested from RAM be executed after a variable number of clock cycles that is not known in advance?

Please don’t just say e.g. "by stalling", but actually outline in terms of signals, chips, etc. how these things are organized. Thanks!