Equivalence from multi-tape to single-tape implies limited write space?

Suppose I have the following subroutine, to a more complex program, that uses spaces to the right of the tape:

$ A$ : “adds a $ at the beginning of the tape.”

So we have: $ $ \begin{array}{lc} \text{Before }: & 0101\sqcup\sqcup\sqcup\ldots\ \text{After }A: & $ 0101\sqcup\sqcup\ldots \end{array} $ $

And it is running on a multi-tape turing machine. The equivalence theorem from Sipser book proposes we can describe any multi-tape TM with a single-tape applying the following “algorithm”, append a $ \#$ at every tape and then concat every tape in a single tape, and then put a dot to simulate the header of the previous machine, etc, etc.

With $ a$ and $ b$ being the content of other tapes, we have: $ $ a\#\dot{0}101\#b $ $ If we want to apply $ A$ or another subroutine that uses more space, the “algorithm” described in Sipser is not enough, I can intuitively shift $ \#b$ to the right, but how to describe it more formally for any other subroutine that uses more space in the tape? I can’t figure out a more general “algorithm” to apply in this cases.

Simulating multi-tape turing machines by single-tape TM

In “Computational Complexity: A modern approach”, Arora and Barak proof the following Claim:

Define a single-tape Turing machine to be a TM that has only one read-write tape, that is used a input, work, and output tape. For every $ f: \{0,1\}^* \rightarrow \{0,1\}$ and time-constructible $ T: \mathbb{N} \rightarrow \mathbb{N}$ , if $ f$ is computable in time $ T(n)$ by a TM $ M$ using $ k$ tapes, then it is computable in time $ 5kT(n)^2$ by a single-tape TM $ \hat{M}$ .

The proof roughly goes like this, from my understanding:

-> We encode the $ k$ tapes of $ M$ on a single tape, using locations $ i,k+i,2k+i$ for the $ k$ -th tape.

-> For each character $ a$ in M’s alphabet, we define another character $ \hat{a}$ in $ \hat{M}$ ‘s alphabet. For each of M’s tapes, the $ \hat{a}$ character indicates the current position of that respective tape (for $ M$ ) in $ M’s$ encoding.

-> For each state transition of $ M$ , $ \hat{M}$ then perform two sweeps: One left-to-right, where $ \hat{M}$ finds the positions of $ M$ at the respective working tapes and one right-to-left where $ \hat{M}$ updates its tape encoding according to state transition function of $ M$ .

Its not clear to me how the first step exactly works. Arora and Barak write: “First it sweeps the tape in the left-to-right direction and records to its register the $ k$ symbols that are marked by $ \hat{a}$ “. As far as I understand, registers correspond to states in the TM $ M’$ . What is exactly is meant by recording the symbols to its register?