Buchi automata in formal software verification

As I am studying the application of Buchi automata in formal software verification, I am interested in the computational complexity (or links to papers) for the algorithms used to solve the problem in practice.

Let me recall the mathematical intuition behind model checking using these techniques. Buchi automata are the automata associated to $ \omega$ -regular languages. You can think of these languages as the infinite version of the regular languages (infinite in the sense that they operate on infinite words). As regular languages, they enjoy closure properties under composition and intersection. It is also well known that these Automata are “isomorphic” to proposition of a logic called S1S. This is the logic of Monadic Second Order Logic (i.e. you can have quantification of on the predicates of the logic, but the elements can ranges over a finite set). S1S has the nice property of being decidable, i.e. for each preposition we can tell if it is true or not. To find if a S1S formula is satisfiable, we can find if the associated atomata has an accepting state. In Buchi automata this is done by finding cycles between the starting state and any final state.

To find a “bug” in a software we proceed like this:

  • we formalize our software as $ \mathbb{S}$ using a S1S logic (a monaidic second order logic).
  • we formalize the security properties of our software in the logic $ \mathbb{P}$
  • now we use closeness properties of $ \omega$ -languages, whose associated automata are equivalent to S1S logic: we build Buchi automata of $ \overline{\mathbb{S}} \cap \mathbb{P}$

In this way, if the software does not comply to the security properties, the resulting automata will have at least an accepting state. In practice, the accepting state of a Buchi automata can be found by finding a cycle in the associated graph between the starting state and any final state.

I suspect in practice an algorithm for cycle detection cannot be used. This is due to the constraint that the cycle should pass from the only starting state and should pass through an accepting state, so the likelihood of finding the cycle you need might be very low. Anyway, the best algorithm for Cycle detection is solved by the Johnson Algorithm in time $ O((n + e)(c + 1))$ and space bounded by $ O(n + e)$ ).

In practice, S1S decidibility using Buchi automata might be best to be solved with s-t connectivity. For instance, NetworkX solves s-t connectivity with this algorithm by Abdol-Hossein Esfahanian.

Am I correct?

Which graph algorithm is used in practice to solve S1S satisfiability using Buchi automata?

I have found this, but it just shows a better reduction from S1S to Buchi, and it doesn’t talk about graphs.

Can a Formal Language be of a type (RE, REC, Regular, etc) for one TM, but of a different type for another?

I’m new to the study of formal languages, and I wondered if languages of a certain type are objectively of that type (RE, REC, regular, etc), or if their type varies on their context? I had this thought:

If I had a language A that is recursive for some Turing Machine, TM(A), would it be possible to design another TM, TM(A2), where you add in a feature that instead makes TM(A2) loop for one or more words in A?

Now, A is no longer recursive.

This seems to be intuitively obvious for at least some examples (though my example is with a very simple language). Say A = {1}, and TM(A) is a program that just says:

‘if (x=1) {return true}’

For TM(A), A is accepted.

Now, let TM(A2) be a program that says ‘while (x>_1) {x+=1}’, and so now A causes TM(A2) to loop forever.

Granted that this example may unbenkownst to me be too simple to prove a real point, but if the above holds, then yes, languages can be of ‘different types’ depending on their context.

However, in none of the literature I’ve seen has this been written to be the case? If a language is recursively enumerable in one given example, it’s recursively enumerable altogether – that seems to be the description of how it works.

To combat this confusion then, and given also that a language type in the Chomsky hierarchy is a subset of the types above it in the hierarchy, would this be a correct definition of the ‘type of a language’: if there exists at least one machine (a DFA, a PDA, TM, etc) of a given language type, type X, that can be found to accept on a language, and it can be proven that no machine from the type below in the Chomsky hierarchy can accept on that language, then that language is of ‘type X’, and all the types above it?

If I’m correct in that rather long definition, then, it means that when a textbook says for example that a language is context free, then it means yes there are an infinite number of PDAs or TM’s the can be thought of for which the CF language doest not accept, or loops even. But, there exists at least one PDA for which the language accepts on, and there can be no DFA that can accept on the language?

Further, when a textbook says a language is not even recursively enumerable i.e. unsolveable, it means that this language is above recursively enumerable in the Chomsky hierarchy, and that truly it is impossible to solve – that no TM can be designed to accept/reject on it?

If I’ve missed the point entirely or have been unclear, please do let me know.

Formal definition of an empty stack accepting PDA

PDA’s are usually defined using the 7-tuple convention.

$ M=(Q, \Sigma, \Gamma, \delta, q_{0}, Z, F)$

F is the set of accepting states.

I want to design a PDA accepting by empty stack, so using this notation makes no sense, as I don’t need F and I want to make the acceptance condition clear.

How is this usually done? Can I just dismiss F?

$ M’=(Q, \Sigma, \Gamma, \delta, q_{0}, Z)$

Are there any formal grammars describing the set of all directed graphs?

Let GRAPHS be the set of all directed graphs.

Is there a set of strings STRYNGS such that there exists a bijection between GRAPHS and STRYNGS?

The canonical definition of directed graph comes to mind:
An ordered pair of two sets, (V, E) such that V is a set of vertices and E is a set of ordered pairs of elements in V.

Perhaps the following string is in the set of strings:

(  {0, 1, 2} ,    {(0, 1), (1, 0), (1, 2), (2, 0)}   ) 

Maybe we can use Zermelo’s definition of natural number:

1 = { }      2 = {{ }}       3 = {{{ }}}  and so on... 

The following two graphs are isomorphic:

(  {0, 1, 2} ,    {(0, 1), (1, 0), (1, 2), (2, 0)}   ) (  {9, 5, 3} ,    {(9, 5), (5, 9), (5, 3), (3, 9)}   ) 

STRYNGS need not contain all graphs per se, but merely at least one representative from each equivalence class defined by the graph isomorphism relation.

If there there exists such set of strings STRYNGS, and there exists a bijection between the strings and the set of digraphs GRAPHS, does there also exist a grammar describing STRYNGS?

The grammar can be Context-sensitive, context-free or otherwise; any kind of grammar will do.

Printing (number, uncertainty) pair in a formal way

I am trying to do an auxilliary function to print pairs of value-uncertainty in a formal way, given their name, value and uncertainty.

The “technical” agreement is that “uncertainty rules”, meaning that the significant digits of a pair to be printed are given by the decimal size of the uncertainty value. For example: (123.456, 0.012), (123450, 120)

Additional rules:

  • If an uncertainty is greater than 10^3, then exponent notation will be used with 1 decimal digit for the value (ideally, the pair should share the exponent factor which would be given by the value exponent factor, but I don’t know how to do that yet – I marked it as to-do in the code). For example, (6.31e+03, 0.31e+03) would be the ideal, now I am getting (6.31e+03, 3.1e+02).
  • Don’t print decimal point if the uncertainty value is greater than 10.

Some options I try to give:

Whitespace formatting:

'{:{ws}{nfmt}}'.format(ws = ' ' if ws in [True, ' '] else '', nfmt = ...) # nfmt stands for number formatting, e.g. .1f, etc. see code. 

Representation formatting: useful for latex, unicode printing, simple/ascii or shorthand notation (I don’t know how to implement exponent notation in short notation representation, that’s why I refer to making a custom repr method in the code).

  • Simple/ascii: 0.210+/-0.011
  • Short: 0.210(11) # uncertainty takes integer form. I don't know how to do this one
  • Short-composite: 0.210(0.011)
  • Fancy/utf-8: 0.210 ± 0.011
  • Latex: $ 0.210 \pm 0.011$

Here is the full code:

def _print_fres(names, vals, uncs, sigds = 2, rfmt = 'pm', ws = False): # sigds are the significance digits     try:         if all([str(u).lower() not in 'inf' for u in uncs]):                 sigs = [                     (re.search('[1-9]', str(u)).start()-2 \                         if re.match('0\.', str(u)) \                     else -re.search('\.', str(float(u))).start())+sigds \                     for u in uncs                     ]                 # significant digits rule in uncertainty         else:             print('Warning: infinity in uncertainty values')             sigs = [sigds] * len(uncs)     except TypeError: #NaN or None         raise TypeError('Error: odd uncertainty values')      rfmt = rfmt.lower()     # this can be done better/prettier I think     if rfmt in ['fancy', 'pms']: # pms stands for pmsign         res_str = '{{0}} = {{1:{ws}{nfmt}}} ± {{2:{ws}{nfmt}}}'     elif rfmt in ['basic', 'pm', 'ascii']:         res_str = '{{0}} = {{1:{ws}{nfmt}}}+/-{{2:{ws}{nfmt}}}'     elif rfmt in ['tex', 'latex']:         res_str = '$  {{0}} = {{1:{ws}{nfmt}}} \pm {{2:{ws}{nfmt}}}$  '     elif rfmt in ['s1', 'short1']:         res_str = '{{0}} = {{1:{ws}{nfmt}}} ± {{2:{ws}{nfmt}}}'         # not yet supported. to do: shorthand notation     elif rfmt in ['s2', 'short2']:         res_str = '{{0}} = {{1:{ws}{nfmt}}}({{2:{ws}{nfmt}}})'     else:         raise KeyError('rfmt value is invalid')      for i in range(len(vals)):         try:             print((res_str.format(                     nfmt = '1e' if uncs[i] >= 1000 \                     # 1 decimal exponent notation                         else (                             'd' if sigs[i] <= 0 \                             # integer if uncertainty >= 10                             else '.{}f'.format(sigs[i])),                     ws = ' ' if ws in [True, ' '] else ''                     )                  ).format(                     names[i],                     round(vals[i], sigs[i]),                     round(uncs[i], sigs[i])                     # round to allow non-decimal significances                  )              )          except (TypeError, ValueError, OverflowError) as e:             print('{} value is invalid'.format(uncs[i]))             print(e)             continue     # to do: a repr method to get numbers well represented     # instead of this whole mess 

Roadmap to formal verification

I would like to learn about different approaches to formal verification of software programs that goes beyond what Wikipedia has to offer. Ideally one would not only get an overview but also recommendations which books and articles to read next.

My goal is to be able to judge what parts of formal verification I could apply to my job as a software/network engineer. So if the overview had a practical slant it would be a plus. But I also don’t mind theory at all since I have a rather solid background in pure maths.