Build PDA for a language with unknown input alphabet

$L_1, L_2$ are regular languages. We form a new language $L_{12}$ as follows: $L_{12}=\left\{ w_1\cdot w_2 \mid w_1\in L_1\wedge w_2\in L_2\wedge|w_1|=|w_2| \right\}$

In this exercise I am not given any alphabet, and I'm required to build a PDA for $L_{12}$; but by definition $M=\left(Q,\Sigma,\Gamma,\delta,q_0,Z_0,F\right)$, and I don't have any alphabet to work with. Intuitively, whether the alphabets of $L_1$ and $L_2$ are the same or not could affect the solution.

How do you convert bits into a different alphabet?

I have forgotten how to do this. How do I figure out how many symbols are required to represent a 128-bit string in a given alphabet?

That is to say, I want to generate a UUID (a 128-bit value) using only the 10 decimal digits as the alphabet. How many digits do I need, and what is the general equation, so I can figure this out for any alphabet of any size?

What is the equation for any n-bit value with any x-letter alphabet?

The way I do it is to guess and slowly iterate until I arrive at a close number. For powers of 10 it’s easy:

```javascript
Math.pow(2, 128)  // 3.402823669209385e+38
Math.pow(10, 39)  // 1e+39
```

For other numbers, it takes a little more guessing. Would love to know the equation for this.
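Each symbol of an $x$-letter alphabet carries $\log_2 x$ bits, so an $n$-bit value needs $\lceil n / \log_2 x \rceil$ symbols. A minimal sketch (the function name is mine):

```javascript
// Number of symbols from an x-letter alphabet needed to represent n bits:
// each symbol carries log2(x) bits, so we need ceil(n / log2(x)) of them.
function symbolsNeeded(nBits, alphabetSize) {
  return Math.ceil(nBits / Math.log2(alphabetSize));
}

console.log(symbolsNeeded(128, 10)); // 39 decimal digits for a 128-bit UUID
console.log(symbolsNeeded(128, 16)); // 32 hex digits
console.log(symbolsNeeded(128, 64)); // 22 base64 characters
```

This reproduces the guess above: $10^{39}$ is the first power of 10 above $2^{128}$.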

Over the alphabet {a,b,c,d}, how would I construct an NFA that only accepts strings that end with a letter that is already part of the string?

I’ve been trying to create an NFA that accepts strings that end with a letter that already exists in the string: for example abcdb, cbdd, acac, etc., while strings like abc, aacd, etc. are not accepted, since their last letter did not occur in the string before it was read. I only seem to be able to create an NFA that accepts a subset of the language. What is the right way to go about it? I’m very lost.

Are DFAs with a unary alphabet strictly less powerful than DFAs with a binary alphabet?

Are DFAs with a unary alphabet strictly less powerful than DFAs with a binary alphabet? Is this even a meaningful question?

For example, if $ \Sigma = \{\texttt{0}, \texttt{1}\}$ , we can encode any larger alphabet using $ \Sigma$ , but if $ \Sigma = \{\texttt{0}\}$ , this can define a DFA (that, say, recognizes $ L = \{ \texttt{0}^k \mid k > 0\}$ )… but such a DFA would never be able to recognize more “complex” regular languages. For example, there is no way to encode $ \texttt{0011}$ using a unary alphabet that a DFA would recognize (we could use, say, Gödel numbering, but that would require a more powerful machine that could “count”).

If DFAs with a unary alphabet are less powerful than DFAs with a binary alphabet, is there a name for this class of languages/grammars? I recognize this is kind of an odd question, since the DFA that recognizes $ L = \{ \texttt{0}^k \mid k > 0\}$ recognizes all unary languages… but technically there is still a countably infinite number of DFAs in this class ($ L = \{ \texttt{0}^1 \}$ , $ L=\{\texttt{0}^2\}$ , etc.)

Note I am of course assuming that $ \Sigma = \{ \texttt{0} \}$ does not contain the empty symbol $ \varepsilon$ .

Regular expressions for set of all strings on alphabet $\{a, b\}$

I came across the following regular expressions, each of which equals $ (a+b)^*$ (the set of all strings over the alphabet $ \{a, b\}$ ):

  • $ (a^*+bb^*)^*$
  • $ (a^*b+b^*a)^*$
  • $ (a^*bb^*+b^*ab^*)^*(a^*b+b^*a)^*b^*a^*$

I want to generalise the different ways in which we can append to the original regular expression $ (a+b)^*$ without changing its meaning, so that we still get the set of all strings over the alphabet $ \{a, b\}$ . I think we can do this in two ways:

  • P1: We can concatenate anything to $ a$ and $ b$ inside the brackets of $ (a+b)^*$
  • P2: We can concatenate $ (a+b)^*$ with any regular expression which has a star at the outermost level ($ (…)^*$ )

  • P3: I know $ (a+b)^* = (a^*+b)^* = (a+b^*)^*= (a^*+b^*)^*$ . So I guess P1 and P2 also applies to them.

Am I correct with P’s?

Q. Also I know $ (a+b)^*=(a^*b^*)^*=b^*(a^*b)^*=(ab^*)^*a^*$ . Can we append some patterns of regular expressions to these as well without changing their original meaning?
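Any particular identity of this kind can be sanity-checked mechanically by comparing the two expressions on every string over $\{a, b\}$ up to a bounded length. A sketch assuming Node.js (helper names are mine); this is evidence, not a proof of equivalence:

```javascript
// Brute-force check that two regular expressions accept exactly the same
// strings over {a, b} up to a given length.
function allStrings(maxLen, alphabet = ["a", "b"]) {
  const out = [""];
  let frontier = [""];
  for (let len = 1; len <= maxLen; len++) {
    frontier = frontier.flatMap(s => alphabet.map(c => s + c));
    out.push(...frontier);
  }
  return out;
}

function sameLanguage(re1, re2, maxLen = 10) {
  return allStrings(maxLen).every(s => re1.test(s) === re2.test(s));
}

console.log(sameLanguage(/^(a|b)*$/, /^(a*|bb*)*$/));  // true
console.log(sameLanguage(/^(a|b)*$/, /^(a*b|b*a)*$/)); // true
```

The same helper distinguishes non-equivalent candidates, since any difference shows up on some short witness string.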

What is the density of a regular language $L$ over an alphabet $\Sigma$ in $\Sigma^n$?

In other words, what is the likelihood that a recognizer of a given regular language will accept a random string of length $ n$ ?


If there is only a single non-terminal $S$, then there are only two kinds of rules:

  1. Intermediate rules of the form $ S \to \sigma S $ .
  2. Terminating rules of the form $ S \to \sigma $ .

Such a grammar can then be rewritten in shorthand with exactly two rules, as follows:

$$\left\{\begin{aligned} &S \enspace \to \enspace \{\sigma, \tau, \dots\}\, S = ΤS \\ &S \enspace \to \enspace \{\sigma, \tau, \dots\} = Τ' \end{aligned}\right. \qquad (Τ, Τ' \subset \Sigma)$$

So, we simply choose one of the $ Τ$ (this is Tau) symbols at every position, except for the last one, which we choose from $ Τ’$ .

$$d = \frac{\lvert Τ\rvert^{n - 1}\, \lvert Τ' \rvert}{\lvert\Sigma\rvert^n}$$

I will call an instance of such a language $L_1$.
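As a sanity check, this density can be computed directly (function name and example set sizes are mine):

```javascript
// Density of an L1-type language in Sigma^n: the first n-1 symbols come
// from T and the last one from T', so d = |T|^(n-1) * |T'| / |Sigma|^n.
function densityL1(sizeT, sizeTPrime, sizeSigma, n) {
  return (Math.pow(sizeT, n - 1) * sizeTPrime) / Math.pow(sizeSigma, n);
}

// Example: Sigma = {a,b,c,d}, T = {a,b}, T' = {c}, n = 3:
// 2^2 * 1 = 4 sentences out of 4^3 = 64 strings of length 3.
console.log(densityL1(2, 1, 4, 3)); // 0.0625
```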


If there are two non-terminals, the palette widens:

  1. Looping rules of the form $ S \to \sigma S $ .
  2. Alternating rules of the form $ S \to \sigma A $ .
  3. Terminating rules of the form $ S \to \sigma $ .
  4. Looping rules of the form $ A \to \sigma A $ .
  5. Alternating rules of the form $ A \to \sigma S $ .
  6. Terminating rules of the form $ A \to \sigma $ .

In shorthand: $$\left\{\begin{aligned} &S \enspace \to \enspace Τ_{SS}\, S \\ &S \enspace \to \enspace Τ_{SA}\, A \\ &S \enspace \to \enspace Τ_{S\epsilon} \\ &A \enspace \to \enspace Τ_{AA}\, A \\ &A \enspace \to \enspace Τ_{AS}\, S \\ &A \enspace \to \enspace Τ_{A\epsilon} \end{aligned}\right. \qquad (Τ_{SS}, Τ_{SA}, Τ_{S\epsilon}, Τ_{AA}, Τ_{AS}, Τ_{A\epsilon} \subset \Sigma)$$

Happily, we may deconstruct this complicated language into words of the simpler language $L_1$ by taking only a looping rule and either an alternating or a terminating shorthand rule. This gives us four languages that I will intuitively denote $L_{1S}, L_{1S\epsilon}, L_{1A}, L_{1A\epsilon}$. I will also write $L^n$ to mean all the sentences of $L$ that are $n$ symbols long.

So, the sentences of this present language (let us call it $ L_2$ ) consist of $ k$ alternating words of $ L_{1S}$ and $ L_{1A}$ of lengths $ m_1 \dots m_k, \sum_{i = 1 \dots k}m_i = n$ , starting with $ L_{1S}^{m_1}$ and ending on either $ L_{1S\epsilon}^{m_k}$ if $ k$ is odd or otherwise on $ L_{1A\epsilon}^{m_k}$ .

To compute the number of such sentences, we may start with the set $ \{P\}$ of integer partitions of $ n$ , then from each partition $ P = \langle m_1\dots m_k \rangle$ compute the following numbers:

  1. The number $p$ of distinct permutations $\binom{k}{q_1, q_2, \dots}$ of the constituent words, where $Q = \langle q_1, q_2, \dots \rangle$ counts how many times each integer is seen in $P$. For instance, for $n = 5$ and $P = \langle 2, 2, 1 \rangle$, $Q = \langle 1, 2 \rangle$ and $p = \frac{3!}{2! \times 1!} = 3$

  2. The product $ r$ of the number of words of lengths $ m_i \in P$ , given that the first word comes from $ L_{1S}$ , the second from $ L_{1A}$ , and so on (and accounting for the last word being of a slightly different form):

    $$r = \prod_{\substack{i\ \mathrm{odd} \\ i < k}}\lvert L_{1S}^{m_i} \rvert \times \prod_{\substack{i\ \mathrm{even} \\ i < k}}\lvert L_{1A}^{m_i} \rvert \times \begin{cases} \lvert L_{1S\epsilon}^{m_k} \rvert & \text{if $k$ is odd} \\ \lvert L_{1A\epsilon}^{m_k} \rvert & \text{if $k$ is even} \end{cases}$$

If my thinking is right, the sum of $ p \times r$ over the partitions of $ n$ is the number of sentences of $ L_2$ of length $ n$ , but this is a bit difficult for me.
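If I follow the construction, the sum of $p \times r$ over partitions can be cross-checked by recursing over compositions of $n$ directly (ordered part lists), where the alternation order is explicit and no separate permutation count is needed. A sketch under the assumptions noted in the comments; all names and set sizes are mine:

```javascript
// Counts length-n sentences of L2 by recursing over compositions of n.
// Word counts: |L1S^m| = |Tss|^(m-1) * |Tsa|, and similarly for the other
// three sub-languages. Assumes each sentence has a unique derivation
// (e.g. the Tau sets are pairwise disjoint). Exponential without
// memoization; fine for small n.
function countL2(n, tss, tsa, tse, taa, tas, tae) {
  const wordsS  = m => Math.pow(tss, m - 1) * tsa; // L1S: loop, switch to A
  const wordsSE = m => Math.pow(tss, m - 1) * tse; // L1S-epsilon: loop, stop
  const wordsA  = m => Math.pow(taa, m - 1) * tas; // L1A: loop, switch to S
  const wordsAE = m => Math.pow(taa, m - 1) * tae; // L1A-epsilon: loop, stop

  function go(rem, inS) { // rem symbols left; inS: current run derives from S
    let total = 0;
    for (let m = 1; m <= rem; m++) {
      if (m === rem) total += inS ? wordsSE(m) : wordsAE(m);
      else total += (inS ? wordsS(m) : wordsA(m)) * go(rem - m, !inS);
    }
    return total;
  }
  return go(n, true);
}

// With all six set sizes equal to 1 there are 4 sentences of length 3.
console.log(countL2(3, 1, 1, 1, 1, 1, 1)); // 4
```

This agrees with the recurrence $f_S(n) = |Τ_{SS}| f_S(n-1) + |Τ_{SA}| f_A(n-1)$, $f_S(1) = |Τ_{S\epsilon}|$ (and symmetrically for $A$) on small cases.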


My questions:

  • Is this the right way of thinking?
  • Can it be carried onwards to regular grammars of any complexity?
  • Is there a simpler way?
  • Is there prior art on this topic?

Example of channel where capacity is achieved without a uniform distribution on the output alphabet

The capacity of a discrete memoryless channel is given by the maximum of the mutual information over all possible input probability distributions. That is

\begin{align} C &= \max_{p_X} I(X;Y) \\ &= \max_{p_X} \left[ H(Y) - H(Y|X) \right] \end{align}

$H(Y|X)$ is specified by the channel only and has nothing to do with $p_X$. Hence, it seems that achieving the capacity is just equivalent to maximizing $H(Y)$, i.e. we want to choose the $p_X$ that guarantees a uniform distribution over the output alphabet.

Incorrect claim: If we achieve the capacity of the channel, it must be the case that the distribution over the output alphabet is a uniform distribution.

In this lecture, a remark is made at 15:00. The lecturer remarks that this is not true for all channels. There exist channels where the capacity is achieved without even using the full output alphabet. Can anyone give an example of this and also some general intuition on when the claim becomes false?
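One standard example (not necessarily the one from the lecture) is the Z-channel: input 0 always arrives as 0, while input 1 arrives as 0 with probability eps. A numerical sketch that grid-searches the input distribution (all names are mine):

```javascript
// Z-channel: P(Y=0|X=0) = 1, P(Y=0|X=1) = eps, P(Y=1|X=1) = 1 - eps.
function h2(p) { // binary entropy in bits
  return p === 0 || p === 1 ? 0 : -p * Math.log2(p) - (1 - p) * Math.log2(1 - p);
}

function mutualInfoZ(p1, eps) { // p1 = P(X = 1)
  const pY1 = p1 * (1 - eps);    // P(Y = 1)
  return h2(pY1) - p1 * h2(eps); // I(X;Y) = H(Y) - H(Y|X)
}

function capacityZ(eps, steps = 100000) { // grid search over P(X = 1)
  let best = { p1: 0, info: 0 };
  for (let i = 0; i <= steps; i++) {
    const p1 = i / steps;
    const info = mutualInfoZ(p1, eps);
    if (info > best.info) best = { p1, info };
  }
  return best;
}

const best = capacityZ(0.5);
console.log(best.p1, best.info); // 0.4, ~0.3219
```

For eps = 0.5 the maximum sits at P(X=1) = 0.4, giving capacity of about 0.322 bits with output distribution (0.8, 0.2), far from uniform. Note that here $H(Y|X) = p_1 h_2(\epsilon)$ does depend on the input distribution, which is why maximizing $H(Y)$ alone is not optimal.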