From Computational Complexity A Modern Approach: Efficient Universal Turing Machine Theorem: There exists a TM U such that for every x, α ∈ {0, 1}∗, U(x, α) = Mα(x), where Mα denotes the TM represented by α. Furthermore, if Mα halts on input x within T steps then U(x,α) halts within CT log T steps, where C is a number independent of x and depending only on Mα’s alphabet size, number of tapes, and number of states.
From Kozen INDEXINGS OF SUBRECURSIVE CLASSES: "the class of polynomial time computable functions is often indexed by Turing machines with polynomial time counters…. The collection of all (encodings over (0, 1)) of such machines provides an indexing of PTIME…we have Theorem: No universal simulator for this indexing can run in polynomial space."
He then goes on to say: "can it be proved that gr U for any indexing of PTIME requires more than polynomial space to compute? We have proved this for a wide class of indexings, namely counter indexings satisfying the succinct composition property."

gr U is the graph of the universal function U and (barring details) represents the minimum power necessary to simulate P uniformly.

And the counter indexing (or polynomial time counters) he is referring to is specified in the answer here: How does an enumerator for machines for languages work?
I’m wondering how theorem for efficient universal Turing machine relates to Kozen’s result, that for certain types of enumerations of P, there is no machine that can efficiently simulate the machines enumerated. What causes simulation to be difficult and can it be circumvented–namely: does there exist an enumeration of P that describes the (machines for) languages in P in such a way that allows them to be efficiently simulated (with no more than poly overhead) on any input–or as Kozen puts it "allow(s) easy construction of programs from specifications"?
My guess is that part of the reason for the difficulty is because the efficient simulation theorem only says that there exists a machine that can efficiently simulate any single TM, but not a whole class of them… and when you start having to simulate more than one TM you loose the ability to optimize your simulator’s design to solve any particular language (run any particular machine) and have to design the simulator with the various machines you need to simulate in mind (and the more different those machines are the larger your overhead gets).
PS. A possible example could be in implicit complexity theory… where they construct languages that are complete for certain complexity classes. A language that is complete for P doesn’t seem to have trouble running its programs (which represent the languages in P)? But, if there are examples here, how do they overcome the difficulty Kozen is referring to, as they are programming systems and thus enumerations / indexings?
Just as a related aside… I think I have a proof that for a language Lp 1. and 2. cannot be true at the same time:

An enumeration of P, call it language Lp (whose members are the strings in the enumeration) is decidable in poly time.

All the P machines / languages represented by strings in Lp can be efficiently simulated by a universal simulator for P on any input.
It makes sense that there would be a relationship between the way the machines are encoded and the overhead for their simulation and 1. can be made to be true so that leaves 2. and brings us to the question being asked… Is it possible that 2. is always false–meaning for ANY enumeration/ encoding of P (any language Lp) simulation of those machines is not efficient for any universal simulator for P.
Here’s a rough sketch for anyone interested:
Take L:= {w∈L iff w∈Lp and W(w)=0}
So, one way to do this is our diagonal function maps w–>the language in P that w encodes (if w encodes a machine for a language in P (if w is a member of Lp)) and if it does not then it maps to the language containing all words. The existance of a map between a w and a language translates to w∈L iff w ‘is not a member’ of the language it is mapped to.
Since all w’s that aren’t encodings of P machines (members of Lp) are mapped to the language containing all words–they are members of L iff they are not members of this language. This is always false, so all words that are not members of Lp are not members of L.
This leaves only words of Lp as candidates for members of L. But for w to be a member of L not only does it need to be a member of Lp–the machine that w encodes, W, needs to evaluate to 0 on w. W(w)= 0 and w∈Lp <–> w∈L.
L is not in P. If L were in P then for some w, w would encode a machine for L, Wl. and for that w∈L iff Wl(w) = 0, ie. w∈L iff w is not in L.
Now, let’s employ the assumption that 1. Lp is poly decidable. As well as the assumption 2. that any machine specified by Lp can be simulated with no more than poly overhead by a universal simulator.
Then we can devise an algorithm, namely: given w, decide w∈Lp. If w∈Lp then run W(w). If w(w)=0 –> w∈L.
By the above algorithm, under these the assumptions 1. and 2. L would be in P–which is a contradiction to the previous proof by contradiction. I think the previous proof is correct and conclude that neither 1. nor 2. can be true at the same time.