How can the VC-dimension of Turing machines be finite?

The VC-dimension of a hypothesis class $\mathcal{H}$ is defined to be the size of the largest set $C$ that $\mathcal{H}$ can shatter. This paper shows that the VC-dimension of the set of all Turing machines with $n$ states is $\Theta(n \log n)$.
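
To make the definition concrete, here is a minimal Python sketch of the shattering check for a finite set. The hypotheses are modeled as arbitrary boolean predicates, and the threshold example is purely illustrative (it has nothing to do with Turing machines); all names are my own.

```python
from itertools import product  # not strictly needed, kept for experimenting with labelings

def shatters(hypotheses, C):
    """Check whether the finite class `hypotheses` shatters the finite set `C`.

    `hypotheses` is an iterable of predicates h: x -> bool, `C` is a list of points.
    The class shatters C iff every one of the 2^|C| labelings of C is realized
    by some hypothesis.
    """
    realized = {tuple(h(x) for x in C) for h in hypotheses}
    return len(realized) == 2 ** len(C)

# Example: threshold functions on the integers shatter every 1-point set,
# but no 2-point set, because the labeling (True, False) for x1 < x2 is never realized.
thresholds = [lambda x, t=t: x >= t for t in range(-5, 6)]
print(shatters(thresholds, [1]))     # True  -> VC-dimension >= 1
print(shatters(thresholds, [1, 3]))  # False -> no 2-point set is shattered
```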

However, suppose that we take the set of all such Turing machines, for $n$ sufficiently large so that the universal Turing machine is a member of $\mathcal{H}$. The result states that there exists a set $C$ (wlog, $C \subset \{0,1\}^*$) of size, say, $n^2$, that $\mathcal{H}$ cannot shatter. To my understanding, this means that for some labeling of $C$, no member of $\mathcal{H}$ computes the function
$$f(x) = \begin{cases} 1 & \text{if } x \in C(1) \\ 0 & \text{otherwise} \end{cases}$$
where $C(1)$ denotes the points of $C$ labeled “1”.

But $C$ is finite, hence $f$ is clearly computable, so there is some Turing machine $M_C$ which computes it. Therefore $M_C$ can be simulated by the universal Turing machine, which is in $\mathcal{H}$, and this is a contradiction. Where is the problem with this argument?

If the human brain is a Turing machine, how is it able to ascertain that certain problems are undecidable?

I recently read about the idea that the human brain might be a Turing machine (or Turing complete). If that is true, then how is the brain able to reason that a certain problem is undecidable, e.g., the liar paradox? I assume that a Turing machine will not be able to reason that the liar-paradox statement is a logical paradox with no decidable answer, and will be stuck forever.

Encoding any Turing machine to one with 2 symbols

This is a repost from https://stackoverflow.com/questions/56990710/encoding-any-turing-machine-to-one-with-2-symbols since it was brought to my attention that the question would be better suited to this SE than SO.

I am interested in understanding how a Universal Turing Machine would work on a practical level.

I read Rogozhin’s paper Small universal Turing machines to understand how a UTM works and how to encode a 2-tag system to act as its input. Next I read Cocke and Minsky’s Universality of tag systems with p=2 on how to convert a 2-symbol TM to a 2-tag system.

My literature search is now stuck at the point where I need to convert an arbitrary TM to a 2-symbol TM. 2-symbol means that the TM has an alphabet of 0 and 1 with 0 also acting as a blank, so I strictly have only two symbols at my disposal.

This question https://stackoverflow.com/questions/4820488/can-a-turing-machine-be-constructed-having-only-two-tape-symbols has a well-written answer that gives the necessary pointers to understand the conversion generally, but it is not explicit enough to tell whether a possible solution would be a good one.

Can anyone direct me to a paper that shows the state-of-the-art conversion process? I am sure that Cocke and Minsky based their work on a proof that any TM can be emulated by a 2-symbol TM.

If there is no literature reference to be had, could anyone help me analyse whether my solution is close enough to optimal? From the post above, it seems like I need to encode the original TM’s tape to a fixed-length binary representation. My solution looks like the following, and I would be happy if someone could comment on whether I am way off the mark:

Let’s assume that the original TM has 5 states and 8 symbols, so I can use a 3-bit encoding. After reading a word, we would have to encode writing and moving the head like this: 6 states are needed to get back to the beginning of the word that was just read (moving the head once without writing is equivalent to read 0/write 0 plus read 1/write 1, which takes 2 states; repeat 3 times to get to our goal position). Add another 3 states for writing the new word. 6 more states are needed to move back to the beginning of the word, and another 6 states to move the head one word left or right.

The reading operation itself takes 8 states for a 3-bit word (ending up in one of 8 states depending on the result). From here the transition described above is started, so from a given state I can encode all transitions including the read operation in

read + possible_reads*(move+write+move+move) = 8 + 8*(6+3+6+6) = 176 states.

This encodes all transitions of a single state from the original TM. Since we have 5 original states in total, the complete new TM would need 5*176 = 880 states.

In general terms I would need s*(1+(7*b))*q states, where s is the alphabet size, q is the number of states of the original TM, and b = ceil(log2(s)) is the encoding bit depth.
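
Below is a quick sanity check of this arithmetic as a short Python sketch; the function name `states_needed` and the per-transition cost breakdown just restate the counts above, they are not taken from any reference.

```python
from math import ceil, log2

def states_needed(s, q):
    """Estimate the number of states of the 2-symbol simulator, using the
    per-transition costs derived above: an s-way read, then for each of the s
    possible words 2b states to move back, b to write, 2b to move back again,
    and 2b to move the head one word left or right."""
    b = ceil(log2(s))                                   # encoding bit depth
    per_original_state = s + s * (2*b + b + 2*b + 2*b)  # read + s*(move+write+move+move)
    return per_original_state * q

print(states_needed(8, 5))  # 880, matching the 5*176 count above
```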

This sounds about fair to me; is there something I overlooked? I think writing intermediate state information on the tape is out of the question with only 2 symbols, so encoding everything in states seemed the only reasonable option to me.

PS: I also need some pointers on how to get the 2-symbol TM to encode the monadic input to binary. The link above does not give much useful input on this question.

Edit: To show that I already tried to get an answer to my second question, here is my specific problem: I have to convert multiple monadic inputs to their binary counterparts using a 2-symbol TM. The separation between monadic input words is achieved with a single zero, and the end of the tape with two zeros. Now, when I start the conversion process, I have to go back and forth between the monadic inputs and the binary outputs. The binary outputs may naturally contain multiple zeros, which would prevent me from finding the end of the tape during the conversion. I could switch to a different end-of-tape word that cannot occur in the binary numbers, but I am not sure that this is really the most efficient option.
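
To make the collision concrete, here is a toy sketch in Python that just builds the tape contents as strings; the particular inputs and variable names are made up for illustration, and the encoding is exactly the one described above (single zero as separator, two zeros as end-of-tape).

```python
# Monadic inputs separated by a single 0; end of the tape marked by two 0s.
tape = "1111" + "0" + "11" + "00"      # monadic 4, separator, monadic 2, end-of-tape

# After converting the first input in place: 4 in binary is 100, which already
# ends in "00", so a scan for the end-of-tape marker now finds a false hit.
converted = "100" + "0" + "11" + "00"
first_marker = converted.index("00")
print(first_marker, len(converted) - 2)  # 1 vs 6: the first "00" found is not the real end
```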

Turing machine where the next step is determined by the state and the symbols up to the read/write head

Consider a modified type of Turing machine where the next step is determined as follows:

$\delta: Q \times \Gamma^* \rightarrow Q \times \Gamma \times \{L,R\}$

where the next step of the machine is determined by the current state and whatever is written on the tape up to the current position of the head.

For example: if the tape content is $a\_ab\_\$a$ and the head is on the $\$$, then the next step is determined by the state and the word $a\_ab\_$.
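
To illustrate, here is a minimal Python sketch of a single step of such a machine, with the transition table keyed on the whole prefix up to the head rather than on one symbol. The dictionary representation and all names are my own illustration, and I am reading "up to the current point" as excluding the cell under the head, as in the example above.

```python
def step(state, tape, head, delta):
    """One step of the modified machine: delta is keyed on (state, tape prefix
    up to the head), not on (state, current symbol)."""
    prefix = "".join(tape[:head])                  # everything written before the head
    new_state, write_symbol, direction = delta[(state, prefix)]
    tape[head] = write_symbol
    head += 1 if direction == "R" else -1
    return new_state, tape, head

# Toy transition matching the example above: in state q0 with prefix "a_ab_",
# write 'b' over the current cell and move right.
delta = {("q0", "a_ab_"): ("q1", "b", "R")}
tape = list("a_ab_$a")
print(step("q0", tape, 5, delta))   # head starts on the '$' (index 5)
```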

  1. How can I prove that languages that are not recognizable by a standard Turing machine can be recognized by this type of machine?
  2. Do those machines contradict the Church–Turing thesis?

An informative answer would be really appreciated, as I'm finding it hard to understand this subject specifically. Thanks in advance.

Turing Machine equivalence in MinTM proof

The proof by contradiction that $MIN_{TM}$ is not Turing-recognizable, from Michael Sipser’s textbook (Theorem 6.7), is as follows:

C = “On input $w$:

  1. Obtain, via the recursion theorem, own description $\langle C \rangle$.
  2. Run the enumerator $E$ until a machine $D$ appears with a longer description than that of $C$.
  3. Simulate $D$ on input $w$.”

$MIN_{TM}$ is infinite, so at least one such $D$ will be encountered.

Because $C$ is shorter than $D$ and is equivalent to it, $D$ cannot be minimal.

I would like to understand a few things from this proof:

  1. How do we know that $MIN_{TM}$ is infinite?
  2. Why is $C$ equivalent to $D$?
  3. Why does “$C$ is shorter than $D$ and is equivalent to it” imply that $D$ cannot be minimal? Why do we expect $D$ to be minimal?

Edit: I have some opinions (which I want to verify) about my questions. Please correct me if I am wrong.

  1. Equivalence is based only on recognizing the same language, i.e., $L(M_1) = L(M_2)$.

Turing machine proof notation

In the context of “Computability”, I have gone over some proofs of the Recursion Theorem that use Turing machine descriptions. A TM $M$ stands for a single-tape Turing machine, and $\langle M \rangle$ is the description of TM $M$. The same notation appears for semi-decidable languages such as $A_{TM} = \{ \langle M, w \rangle \mid \text{TM } M \text{ accepts } w \}$. My question is simple:

What is the difference between using $\langle M \rangle, \langle w \rangle$ versus $\langle M, w \rangle$ for a TM description?

A simplistic answer, I assume, is that it is a convention in writing formal proofs: books follow the standard of combining all inputs into a single description (all inputs come under a single pair of brackets, as opposed to multiple separate descriptions)?
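
For what it is worth, here is a toy Python sketch of how I picture the difference: $\langle M, w \rangle$ packages both objects into one string, while $\langle M \rangle, \langle w \rangle$ would be two separate strings. The delimiter-based pairing convention below is only an illustration (it assumes the delimiter occurs in neither component), not the encoding used in the textbook.

```python
def encode_pair(m_desc, w):
    """Encode the pair <M, w> as ONE string, using '#' as a separator.
    Toy convention: assumes '#' occurs in neither component."""
    return m_desc + "#" + w

def decode_pair(s):
    """Recover the two components from the single-string encoding."""
    m_desc, _, w = s.partition("#")
    return m_desc, w

single = encode_pair("states:5;alphabet:0,1;delta:...", "0101")   # <M, w>: one string
separate = ("states:5;alphabet:0,1;delta:...", "0101")            # <M>, <w>: two strings
print(decode_pair(single) == separate)  # True: same information, different packaging
```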

Difference between multi-tape Turing machines and single-tape machines

A beginner’s question about “fine-grained” computational power.
Let $M_k$ be a $k$-tape Turing machine, and let $M$ be a single-tape Turing machine. We know that $M_k$ and $M$ have the same “computational power”. In addition, one can simulate $M_k$ on $M$ in such a way that every computation which takes $O(t(n))$ time on $M_k$ takes $O(t(n) \log(t(n)))$ time on $M$.
Here is my question:
Is there a language $L$ that can be decided in $O(n)$ time on a $k$-tape Turing machine (for fixed $k$, say 2), but cannot be decided in $O(n)$ time on a single-tape machine (i.e., every single-tape machine that decides $L$ needs $\Omega(n \log n)$ time)?
In addition, are there any examples of two computational models (classical, not quantum) with the same computational power but with fine-grained differences in their running times? (I guess that major differences in running time would contradict the extended Church–Turing thesis, which is less likely.)