Transforming multi-tape Turing machines to equivalent single-tape Turing machines

We know that multi-tape Turing machines have the same computational power as single-tape ones. So every $k$-tape Turing machine has an equivalent single-tape Turing machine.

About the computability and complexity analysis of such a transformation:

Is there a computable function that takes as input an arbitrary multi-tape Turing machine, returns an equivalent single-tape Turing machine, and runs in polynomial time and polynomial space?

Logic behind a single-tape NTM solving the TSP in $O({n}^4)$ time at most

I was going through the classic text “Introduction to Automata Theory, Languages, and Computation” by Hopcroft, Motwani, and Ullman, where I came across the claim that a “single-tape NTM can solve the TSP in $O(n^4)$ time at most”, where $n$ is the length of the input given to the Turing machine (an instance of the TSP). The authors assume the following encoding scheme:

The TSP could be couched as: “Given this graph $G$ and limit $W$, does $G$ have a Hamiltonian circuit of weight $W$ or less?”

[Figure: an example weighted graph on four nodes, used below with limit $W = 40$]

Let us consider a possible code for the graphs and weight limits that could be the input. The code has five symbols: $0$, $1$, the left and right parentheses, and the comma.

  1. Assign integers $1$ through $m$ to the nodes.

  2. Begin the code with the value of $m$ in binary and the weight limit $W$ in binary, separated by a comma.

  3. If there is an edge between nodes $i$ and $j$ with weight $w$, place $(i, j, w)$ in the code. The integers $i$, $j$, and $w$ are coded in binary. The order of $i$ and $j$ within an edge, and the order of the edges within the code, are immaterial. Thus, one of the possible codes for the graph of the figure with limit $W = 40$ is

$$100, 101000(1, 10, 1111)(1, 11, 1010)(10, 11, 1100)(10, 100, 10100)(11, 100, 10010)$$
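To make the scheme concrete, here is a small sketch of my own (not from the book) that produces such a code. The function name `encode` and the absence of spaces in the output are my choices; the book typesets the code with spaces after commas, but the five-symbol alphabet is the same:

```python
def encode(m, W, edges):
    """Encode a graph with m nodes, weight limit W, and edges (i, j, w)
    in the five-symbol code: 0, 1, '(', ')', and ','."""
    b = lambda x: format(x, "b")          # binary, no leading zeros
    code = b(m) + "," + b(W)              # m and W, separated by a comma
    for i, j, w in edges:                 # edge order is immaterial
        code += "(" + b(i) + "," + b(j) + "," + b(w) + ")"
    return code

# The graph from the book's example, with limit W = 40:
edges = [(1, 2, 15), (1, 3, 10), (2, 3, 12), (2, 4, 20), (3, 4, 18)]
print(encode(4, 40, edges))
# -> 100,101000(1,10,1111)(1,11,1010)(10,11,1100)(10,100,10100)(11,100,10010)
```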

The authors move to the claim as follows:

It appears that all ways to solve the TSP involve trying essentially all cycles and computing their total weight. By being clever, we can eliminate some obviously bad choices. But it seems that no matter what we do, we must examine an exponential number of cycles before we can conclude that there is none with the desired weight limit $ W$ , or to find one if we are unlucky in the order in which we consider the cycles.

On the other hand, if we had a nondeterministic computer, we could guess a permutation of the nodes and compute the total weight for the cycle of nodes in that order. **If there were a real computer that was nondeterministic, no branch would use more than $O(n)$ steps if the input was of length $n$. On a multitape NTM, we can guess a permutation in $O(n^2)$ steps and check its total weight in a similar amount of time. Thus, a single-tape NTM can solve the TSP in $O(n^4)$ time at most.**
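As a sanity check on the guess-and-check idea, here is a Python sketch of my own (not from the book) that simulates the nondeterminism deterministically: each permutation plays the role of one NTM branch, and the work inside a branch, summing at most $n$ edge weights, is linear:

```python
from itertools import permutations

def tsp_decision(m, W, edges):
    """Does the graph have a Hamiltonian circuit of total weight <= W?"""
    weight = {}
    for i, j, w in edges:                       # undirected edges
        weight[(i, j)] = weight[(j, i)] = w
    for perm in permutations(range(1, m + 1)):  # one NTM branch each
        cycle = list(perm) + [perm[0]]          # close the circuit
        total, ok = 0, True
        for a, b in zip(cycle, cycle[1:]):      # the O(n) per-branch check
            if (a, b) not in weight:
                ok = False
                break
            total += weight[(a, b)]
        if ok and total <= W:
            return True
    return False

edges = [(1, 2, 15), (1, 3, 10), (2, 3, 12), (2, 4, 20), (3, 4, 18)]
print(tsp_decision(4, 40, edges))
# -> False: the only Hamiltonian circuit, 1-2-4-3-1, has weight 63 > 40
```

Of course, a deterministic machine pays for all $m!$ branches; the point of the quoted passage is that a nondeterministic machine only pays for one.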

I cannot understand the part of the above paragraph in bold. Let me list my specific points of confusion.

  1. What do they mean by saying that a branch would take no more than $O(n)$ steps?
  2. On a multitape NTM, how can we guess a permutation in $O(n^2)$ steps?
  3. How are we getting the final $O(n^4)$?

I get neither the logic nor the computation of the complexity, as very little is written in the text.

Equivalence from multi-tape to single-tape implies limited write space?

Suppose I have the following subroutine, part of a more complex program, that uses space to the right on the tape:

$A$: “adds a $\$$ at the beginning of the tape.”

So we have: $$\begin{array}{ll} \text{Before}: & 0101\sqcup\sqcup\sqcup\ldots \\ \text{After } A: & \$0101\sqcup\sqcup\ldots \end{array}$$

And it is running on a multi-tape Turing machine. The equivalence theorem from Sipser's book proposes that we can simulate any multi-tape TM with a single-tape one by the following “algorithm”: append a $\#$ after the contents of each tape, concatenate all the tapes onto a single tape, mark with a dot the symbol under each simulated head, etc., etc.

With $a$ and $b$ being the contents of the other tapes, we have: $$a\#\dot{0}101\#b$$ If we want to apply $A$, or any other subroutine that uses more space, the “algorithm” described in Sipser is not enough. I can intuitively shift $\#b$ to the right, but how can I describe this more formally for an arbitrary subroutine that uses more space on the tape? I can’t figure out a more general “algorithm” to apply in these cases.
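For what it's worth, here is a rough Python sketch of the shifting step I have in mind (my own framing, not Sipser's text; `grow_segment` is a hypothetical helper). Whenever a simulated tape outgrows its segment, everything to the right is moved one cell rightward, freeing a cell to write into:

```python
def grow_segment(tape, pos):
    """Open one blank cell at index pos by shifting tape[pos:] one cell right."""
    tape.append("_")                      # extend the tape with a blank
    for i in range(len(tape) - 1, pos, -1):
        tape[i] = tape[i - 1]             # copy each symbol one cell right
    tape[pos] = "_"                       # the freed cell
    return tape

# Single tape holding a#0101#b (head-marking dots omitted for brevity).
# Subroutine A needs a fresh cell at the start of the second segment:
tape = list("a#0101#b")
grow_segment(tape, 2)
tape[2] = "$"
print("".join(tape))
# -> a#$0101#b
```

Each such shift costs time linear in the used portion of the tape, which is why this kind of simulation stays polynomial: a $T(n)$-time multi-tape machine never uses more than $O(T(n))$ cells per tape.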

Simulating multi-tape Turing machines by single-tape TMs

In “Computational Complexity: A Modern Approach”, Arora and Barak prove the following claim:

Define a single-tape Turing machine to be a TM that has only one read-write tape, which is used as input, work, and output tape. For every $f: \{0,1\}^* \rightarrow \{0,1\}$ and time-constructible $T: \mathbb{N} \rightarrow \mathbb{N}$, if $f$ is computable in time $T(n)$ by a TM $M$ using $k$ tapes, then it is computable in time $5kT(n)^2$ by a single-tape TM $\hat{M}$.

The proof roughly goes like this, from my understanding:

-> We encode the $k$ tapes of $M$ on a single tape, using locations $i, k+i, 2k+i, \ldots$ for the $i$-th tape.

-> For each character $a$ in $M$'s alphabet, we define another character $\hat{a}$ in $\hat{M}$'s alphabet. For each of $M$'s tapes, the $\hat{a}$ character marks the cell currently scanned by that tape's head in the encoding.

-> For each state transition of $M$, $\hat{M}$ performs two sweeps: one left-to-right, in which $\hat{M}$ finds the $k$ cells currently scanned by $M$'s heads, and one right-to-left, in which $\hat{M}$ updates its tape encoding according to $M$'s transition function.
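Here is how I currently picture the encoding and the first sweep, as a Python sketch under my own reading of the proof (not Arora–Barak's construction; the names `single`, `marked`, and `register` are mine, and I assume the tapes are padded to equal length). The key point is that the $k$ scanned symbols are a constant amount of information, so they fit in the finite control:

```python
k = 2
tapes = [list("abba"), list("0110")]   # M's k = 2 tapes, padded to equal length
heads = [1, 3]                         # M's head positions, one per tape

# Interleaved encoding: cell j of tape i lives at position j*k + i.
n = max(len(t) for t in tapes)
single = [tapes[i][j] for j in range(n) for i in range(k)]
# A parallel flag plays the role of the hatted symbols under M's heads.
marked = [heads[i] == j for j in range(n) for i in range(k)]

# Left-to-right sweep: record the k marked symbols in the finite control
# ("register") -- constant-size data, so it can be folded into the state.
register = [None] * k
for pos, (sym, is_head) in enumerate(zip(single, marked)):
    if is_head:
        register[pos % k] = sym        # pos % k identifies which tape
print(register)                        # the symbols under M's two heads
# -> ['b', '0']
```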

It's not clear to me how the first step exactly works. Arora and Barak write: “First it sweeps the tape in the left-to-right direction and records to its register the $k$ symbols that are marked by $\hat{a}$”. As far as I understand, registers correspond to states of the TM $\hat{M}$. What exactly is meant by recording the symbols to its register?