Simulating multi-tape turing machines by single-tape TM

In “Computational Complexity: A modern approach”, Arora and Barak proof the following Claim:

Define a single-tape Turing machine to be a TM that has only one read-write tape, that is used a input, work, and output tape. For every $ f: \{0,1\}^* \rightarrow \{0,1\}$ and time-constructible $ T: \mathbb{N} \rightarrow \mathbb{N}$ , if $ f$ is computable in time $ T(n)$ by a TM $ M$ using $ k$ tapes, then it is computable in time $ 5kT(n)^2$ by a single-tape TM $ \hat{M}$ .

The proof roughly goes like this, from my understanding:

-> We encode the $ k$ tapes of $ M$ on a single tape, using locations $ i,k+i,2k+i$ for the $ k$ -th tape.

-> For each character $ a$ in M’s alphabet, we define another character $ \hat{a}$ in $ \hat{M}$ ‘s alphabet. For each of M’s tapes, the $ \hat{a}$ character indicates the current position of that respective tape (for $ M$ ) in $ M’s$ encoding.

-> For each state transition of $ M$ , $ \hat{M}$ then perform two sweeps: One left-to-right, where $ \hat{M}$ finds the positions of $ M$ at the respective working tapes and one right-to-left where $ \hat{M}$ updates its tape encoding according to state transition function of $ M$ .

Its not clear to me how the first step exactly works. Arora and Barak write: “First it sweeps the tape in the left-to-right direction and records to its register the $ k$ symbols that are marked by $ \hat{a}$ “. As far as I understand, registers correspond to states in the TM $ M’$ . What is exactly is meant by recording the symbols to its register?