Is there a known way to make an efficient, compact, and fully persistent stack or queue?

In the world of mutable/ephemeral data structures and imperative programming languages, one of the classic ways to implement a stack or queue is to use array doubling: use mutation to fill up or empty an array, doubling or halving to expand/contract. Such stacks/queues have several nice properties:

  1. They use at most twice as much memory as strictly necessary.
  2. They involve minimal indirection.
  3. They use cache efficiently.
  4. They have amortized $ O(1)$ insertion and deletion operations.

In a purely functional system, this approach falls down quite flat, because "mutate the array to fill/empty" becomes very expensive: the array has to be copied each time. I was wondering if there might be some reasonable compromise approach, making something more compact than classical approaches (a la Okasaki) but still with constant amortized time operations. Stacks are simpler, so I started thinking about those. My first attempt (in Haskell notation) was

data Queue a   = Shallow !(Array a)   | Deep !(Array a) (Queue a) 

with the rule that the array at depth $ n$ must have either $ 0$ or $ 2^n$ elements. Unfortunately, this doesn’t look like it’s nearly good enough. It appears that insertions impose an $ O(\log n)$ amortized cost, since flipping from a 1 digit to a 0 digit gets more expensive the deeper it happens in the tree. My next attempt was the same, but using skew binary numbers instead of binary numbers. Same deal. Is there some trick I’m missing, or am I asking to have my cake and eat it too?