Is every language described by a grammar?

I read the following argument showing that not every language is described by a grammar:

For a fixed alphabet $ \Sigma$ and variables $ V$ there are uncountable many languages over $ \Sigma$ since the power set of $ \Sigma^\ast$ is uncountable. But grammars are finite objects by construction and thus there are only countably many grammars. In total, there can only be countably many languages described by grammars. Hence, there are this uncountably many languages that cannot be described by grammars.

I understand the idea of the argument, however i am not convinced by how they show that there are only countably many grammars. What do they mean by "finite objects"? Couldn’t one just take the set $ \{ (V,\Sigma, P, S) | \; P\subseteq (V\cup \Sigma)^+ \times (V\cup \Sigma)^\ast,\; S\in V\}$ , which is clearly uncountable, to get uncountably many grammars? Or do the languages that they generate fall together so often that we only get countably many languages generated by grammars in the end?

Is there any official terminology about something like double quotes “” grammar?

In many programming language string is a token.

For example:

 token               ::= '"' string                        | nat   string              ::= string                        | '"'   nat                 ::= digit nat                        | ϵ 

This is a LL(1) grammar for some programming language’s toke grammar.

When parsing a string, there is no need to check follow set, because there is a " at the end of each string.

Comparing with nat, string is more easy to parse.

My question is

Is there any official terminology about this kind of grammar?


Removing left factoring from Context-Free Grammar

I know that, removing left factoring is a simple task.
And i understand following procedure:

$ S→aA | aB$
$ S→aS’$
$ S’→A|B$

Yet I’m running into problems with this particular grammar:

$ S→AD|bbS|bScS|BS $
$ A→aAbb | abb$
$ B→aB|ba|b$
$ D→cDd|cccd$

How to remove left factoring from it, I’m trying to convert it into LL(1) grammar

Grammar and Enumerator for Decision and Halting Problem

in theoretical computer science I learned for every recursive enumerable language there would be an enumerator and a grammar. So since decision problem and halting problem are recursively enumerable, I was wondering what kind of grammar and enumerator could this be.

Ok since there exists a sequence of M_i I would start with M_1 and find all words for this TM and give them out. So if I have any TM is there a possibility to give all words out which are accepted by this TM? I probably would have to give all w_i to it and compute the first i words for i steps, then i+1 words for i+1 steps and so on. Or maybe something like DFS on all configurations. This really sounds like that only for one TM this could go on forever. So I would need to start the second TM for the same period of time after a while… Seems as if something similiar could work for Halting Problem. Do you have any more refined thoughts on this one?

But the Problem with the Grammar seems to be more challenging. How could I possibly come up with a Grammar for these problems? You would have to generate code(M_i) in such a way that its always in the language. So you would have to simulate the TM through grammmar on different words. It somewhat comes down to the question whether a TM accepts anything or holds on any entry. Or is it more "okay we proofed, so there must be some kind of grammar even though I can´t comeup with an idea for it".



Is EBNF a formal grammar? If yes, how can we generate production rules from EBNF expression?

According to Wikipedia definition EBNF, EBNF is a formal grammar.

My question is that how could I generate production rules base on EBNF expression:

For example:


letter = "A" | "B" | "C" ;

digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;

identifier = letter , { letter | digit | "_" } ;

Generates production rules:

letter ⟶ "A"

letter ⟶ "B"

letter ⟶ "C"

digit ⟶ "0"

digit ⟶ "1"

digit ⟶ "2"

digit ⟶ "3"

digit ⟶ "4"

digit ⟶ "5"

digit ⟶ "6"

digit ⟶ "7"

digit ⟶ "8"

digit ⟶ "9"

identifier ⟶ letter

identifier ⟶ letter noname_nonterminal

noname_nonterminal ⟶ letter

noname_nonterminal ⟶ digit

noname_nonterminal ⟶ "_"

Thank you for your reading,

What is a make sense (meaningful) example of language that an unrestricted grammar could generate?

I have learned that:

  1. Unrestricted grammar is used to define (or describe) a formal language.
  2. Unrestricted grammar is used to define recursively enumerable set [][1].

I’d like to find a meaningful example for #1 case which is similar to below context sensitives grammar example to learn the purpose of unrestricted grammar. I could find a meaningful example for context sensitives grammar but I could not find a one for the unrestricted grammar yet. Could you help me?

Language for “Network racing game record” with below record instances:

Mr. Bean Male Player 1

Ms. Emma Female Player 2

Mr. Hải n/a Computer 3

Ms. Tú n/a Computer 4

Production rule:

S ⟶ Title Name TAB Sex UserType TAB Rank

Title WomanName ⟶ "Ms. " WomanName

Title ManName ⟶ "Mr. " ManName

WomanName TAB Sex "Player" ⟶ WomanName TAB "Female" "Player"

ManName TAB Sex "Player" ⟶ ManName TAB "Male" "Player"

Name TAB Sex "Computer" ⟶ Name TAB "n/a" "Computer"

Name ⟶ WomanName

Name ⟶ ManName

Sex ⟶ "Male"

Sex ⟶ "Female"

UserType ⟶ "Player"

UserType ⟶ "Computer"

Rank ⟶ "1"

Rank ⟶ "2"

Rank ⟶ "3"

Rank ⟶ "4"

WomanName ⟶ "Emma"

WomanName ⟶ "Tú"

ManName ⟶ "Bean"

ManName ⟶ "Hải"

TAB ⟶ "\t"

Help with context free grammar excercise

So, I have an exercise in which I have to write a context free grammar for this language:

$ $ L = \{x \in L(a^∗b^∗c^∗) : |x|_a > |x|_c; |x|_b > 0; |x|_c ≥ 0\}$ $

meaning every string with any number of $ a$ ‘s, $ b$ ‘s and $ c$ ‘s in that order, with the amount of $ a$ ‘s greater than the amount of $ c$ ‘s and the amount of $ b$ ‘s greater than zero.

I am having trouble figuring out the rule that makes sure there are more $ a$ s than $ c$ s.

I have: $ $ \begin{align}S&\to aABC | ab\ A&\to aA | a\ B&\to bB | b\ C&\to cC | c\ \end{align}$ $ I know this is wrong because I should be adding an $ a$ every time I add a $ c$ , but I don’t know how to write that.

Is there any good method to find if a grammar is optimal for a problem?

I’ve been thinking about grammatical evolution problems and how the grammar influences the algorithm performance. It came to my mind the huge impact that the grammar that you’re using has in the time that takes an algorithm to reach an optimum solution.

The simplest example would be if your problem doesn’t involve trigonometric operations. If you’re trying to find f(x) = 3x - 1/2, including sins, tangents or square roots in your grammar will, almost certainly, slowen your algorithm as the population complexity will grow. Other not-so-evident simplifications for a grammar would be trigonometric identities:

tan(x) = sen(x) / cos(x) 

Talking about this last example, I don’t know how to determine the importance of the impact of including tan(x) between the grammar rules to produce valid solutions. Or in other words, knowing if adding tan(x) will be better in terms of performance than don’t doing it and thus, forcing the evolution to combine two or more operators and terminals to being able to use that operation and making the grammar ambiguous.

So this two are the questions:

  1. Is there any way of knowing if a grammar is optimal for finding a solution?
  2. Which evolutionary algorithm or machine learning method (considering that I’m almost profane in this discipline, some explanation is wellcome) would you use for finding optimal or sub-optimal grammars?


Why is the following grammar not LL(1)

Consider the following grammar:

S → bAb   | bBa A → aS   | CB B → b   | Bc C → c   | cC 

I have to provide the reasons as to why this grammar is not LL(1). So far all I can think of is that the grammar is not left factored given the productions:

S → bAb   | bBa 

But I am also wondering if the grammar is left recursive due to the productions:

B → b   | Bc 

Options provided are:

  • This grammar has left recursion. (Unsure)
  • This grammar has right recursion. (Would not make grammar not LL(1))
  • This grammar is ambiguous. (Unsure)
  • This grammar is not left factored. (Correct)
  • This grammar can produce infinitely many distinct strings. (This shouldn’t affect a grammar right?)

As far as I can tell, the grammar is not ambiguous, I have tried 3 different inputs and all have resulted in a single parse tree. So is this grammar not LL(1) just because of the lack of left factoring? or also because the grammar is left recursive?