So I have a definition of a sort of grammar for a programming language. Some aspects are recursive (like nesting function definitions), other parts of it are just simple non-recursive trees.
Originally I just treat this sort of like a Parsing Expression Grammar (PEG), and parse it using recursive descent. But this is kind of inefficient, as it sometimes has to build up objects as it’s parsing, only to find 10 calls down the road that it isn’t a good match. So then it backs up and has to try the next pattern.
I’m wondering what the general solution to this is. What can be done to create the most optimal algorithm starting from nothing but a tree-like data structure encoding the grammar?
Can you process the tree-like, recursive-ish BNF PEG data structure in a while loop, rather than using recursive descent? Better still, can you convert the data structure into a sort of automaton which you can (in some simple way) transition through, to parse the string without having to go down a bunch of wrong paths and generate stuff you only have to soon throw away if you find it’s a mismatch? Or if not, is there anything that can be done here with automata?
Sorry for my terminology (BNF vs. PEG), I am not using it exactly. I just mean I have a grammar, which is context-sensitive which falls outside the scope of either of these. So I am picking one or the other to simplify this question for the purpose of this site, even though I’m not 100% sure what the difference between BNF and PEG are at this point.