How to convert recursive language grammar tree into automaton for optimal parsing?

So I have a definition of a sort of grammar for a programming language. Some aspects are recursive (like nesting function definitions), other parts of it are just simple non-recursive trees.

Originally I just treated this sort of like a Parsing Expression Grammar (PEG) and parsed it using recursive descent. But that is kind of inefficient: the parser sometimes has to build up objects as it goes, only to find 10 calls down the road that the pattern isn’t a match, so it backs up and has to try the next alternative.

I’m wondering what the general solution to this is. What can be done to create the most optimal algorithm starting from nothing but a tree-like data structure encoding the grammar?

Can you process the tree-like, recursive-ish BNF/PEG data structure in a while loop, rather than using recursive descent? Better still, can you convert the data structure into a sort of automaton which you can (in some simple way) transition through, parsing the string without going down a bunch of wrong paths and generating objects you only have to throw away on a mismatch? Or if not, is there anything that can be done here with automata?
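For reference, the backtracking cost I describe can be attacked without a full automaton by memoisation: packrat parsing caches each (rule, position) result, so no rule is ever re-tried at the same input position and PEG parsing becomes linear-time (at a memory cost). A minimal sketch, assuming a hypothetical grammar encoding (a dict mapping rule names to ordered lists of alternatives, each a sequence of rule names or literal strings), not my actual data structure:

```python
# Packrat-style PEG parsing sketch. The grammar format (rule name ->
# ordered list of alternatives; each alternative is a sequence of rule
# names or literal strings) is an assumption for illustration.
def parse(grammar, start, text):
    memo = {}  # (rule, pos) -> (children, new_pos) or None

    def apply(rule, pos):
        key = (rule, pos)
        if key in memo:                        # never re-parse here
            return memo[key]
        result = None
        for alternative in grammar[rule]:      # PEG ordered choice
            p = pos
            children = []
            for sym in alternative:
                if sym in grammar:             # nonterminal
                    sub = apply(sym, p)
                    if sub is None:
                        break
                    children.append(sub[0])
                    p = sub[1]
                elif text.startswith(sym, p):  # terminal literal
                    children.append(sym)
                    p += len(sym)
                else:
                    break
            else:                              # every symbol matched
                result = (children, p)
                break
        memo[key] = result
        return result

    r = apply(start, 0)
    return r if r is not None and r[1] == len(text) else None
```

The memo table is what turns exponential backtracking into linear work; it doesn’t avoid building the throwaway objects, but it guarantees each is built at most once per position.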

Sorry if my terminology (BNF vs. PEG) is imprecise. I just mean that I have a grammar, which is context-sensitive and so falls outside the scope of either of those. I am picking one or the other to simplify this question for the purposes of this site, even though I’m not 100% sure what the difference between BNF and PEG is at this point.

Optimal way to split an array

Consider an array $a$ with $n$ elements. The goal is to divide the array into contiguous segments: formally, from $1$ to $i$, then $i+1$ to $j$, then $j+1$ to $k$, and so on. The cost of making a new segment is $x$, and if a segment has an element that occurs $i$ times, $i>1$, then the final cost increases by $i$. Find the minimum possible cost.

My idea is this: at first I need a segment, so I make one for $a[0]$; then I proceed through the array, adding the current element greedily. Adding $a[i]$ may not increase the cost at all if it has never occurred in the current segment. If it has occurred, I compare the cost increase from adding it to the current segment against the cost $x$ of a new segment, and take whichever is cheaper (extend the segment, or start a new one at $a[i]$). But my greedy algorithm is wrong. Can anyone help me see why?
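For concreteness, here is the greedy heuristic above sketched in Python, under one reading of the cost model (a value occurring $c>1$ times in a segment contributes $c$ to the cost, so a duplicate changes that value’s contribution from $c$ to $c+1$). This reproduces the heuristic itself; it is not an optimal algorithm:

```python
def greedy_cost(a, x):
    """The greedy heuristic as described, under an assumed cost model:
    a value occurring c > 1 times in a segment contributes c to the
    cost. NOT optimal; shown only to make the heuristic concrete."""
    total = x                     # open the first segment at a[0]
    counts = {a[0]: 1}
    for v in a[1:]:
        c = counts.get(v, 0)
        # marginal cost of keeping v in the current segment:
        # count goes c -> c+1; contribution goes (c if c > 1 else 0) -> c+1
        delta = 0 if c == 0 else (2 if c == 1 else 1)
        if delta <= x:            # cheaper (or equal) to extend
            counts[v] = c + 1
            total += delta
        else:                     # cheaper to open a new segment at v
            total += x
            counts = {v: 1}
    return total
```

Because the decision at position $i$ ignores how it constrains later elements, local choices can be globally wrong, which is the usual failure mode a dynamic program over split points avoids.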

Optimal placement of radio towers

Imagine there is an area on which you have to put radio towers. Each radio tower sends a signal over a circular area of fixed size.

How do I determine the optimal number and placement of these towers, so that the overlapping areas are as small as possible, the whole area is in range of the towers, and the number of towers is minimal?
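For the unbounded plane, the classical low-overlap covering arrangement is a hexagonal lattice: circles of radius r with centres spaced r·√3 apart within a row and rows 1.5·r apart cover every point. A purely illustrative sketch for a rectangular area (boundary handling here is crude, and a minimal layout for a specific bounded region can differ):

```python
import math

def hex_cover(width, height, r):
    """Centres of radius-r circles on a hexagonal lattice covering a
    width x height rectangle. A sketch: full coverage, near-minimal
    overlap in the plane, not proven minimal for a bounded region."""
    dx = r * math.sqrt(3)   # centre spacing within a row
    dy = 1.5 * r            # spacing between rows
    centres = []
    y, row = 0.0, 0
    while y < height + dy:
        # odd rows are shifted by half the horizontal spacing
        x = 0.0 if row % 2 == 0 else dx / 2
        while x < width + dx:
            centres.append((x, y))
            x += dx
        y += dy
        row += 1
    return centres
```

With this spacing the nearest-neighbour distance between centres is exactly r·√3, which is the largest spacing that still leaves no gaps.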

What is the optimal algorithm for playing the hangman word game?

Suppose we are playing the game hangman. My opponent picks a word at random from the dictionary, which we both have access to during the game. Once my opponent has picked a word, I am given only the length of the word. I can guess a letter which I think will be in the word, and if it is in the word my opponent identifies all of the positions of that letter in that word. If I guess a letter which isn’t in the word, that’s counted as one mistake. If I can guess the word before too many mistakes I win.

My opponent’s goal should be to pick a word which maximizes the number of mistakes (incorrect guesses) I make before I can guess the word. My goal is to minimize them. (Traditionally, if the number of mistakes is > some number then I lose the game, but we can relax that constraint.)

Consider three algorithms for letter guessing. These are the main ones I’ve thought of, but there are probably more, and I welcome alternate ideas. In all three algorithms, the first step will be to gather the list of words which are the same length as the secret word. Then, for each letter in the alphabet, count the number of words in the dictionary which contain that letter. For example, maybe 5000 contain "a", 300 contain "b", and so on. Here it is in python:

    alphabet = list('abcdefghijklmnopqrstuvwxyz')

    while not found:
        probabilities = dict.fromkeys(alphabet, 0)

        for word in dictionary:
            for letter in word:
                if letter in alphabet:
                    probabilities[letter] += 1

        # now we have the letter frequencies

After that is where the three algorithms diverge.

  1. In the first algorithm, we guess the letter contained in the greatest number of remaining words. So if 5000 remaining words contain "a" and no other letter appears in that many words, we will pick "a" every time. If "a" is correct, this gives us positional information with which we can filter the list further. For example, we might filter the list down to all words matching ".a..". (Dots are unknown letters.) If "a" is incorrect, we filter out all words which contain "a". In the case of a tie, where two letters are found in an equal number of words, letters are chosen alphabetically. So if we know the word matches ".ays", we’ll just guess words in alphabetical order.

  2. This is only slightly different from the first algorithm. Instead of always choosing alphabetical ordering, in the case of a tie we choose letters randomly. This has the benefit that our opponent doesn’t know what we will pick. In the first strategy, "rays" is always better than "days". This avoids that issue.

  3. In this case, we pick letters with probability proportional to the number of words which contain that letter. At the beginning, when we tallied the number of words containing "a", the number containing "b", and so on, "a" happened to be found in the most words, so in strategies 1 and 2 we picked "a" 100% of the time. Here, we will still choose "a" a plurality of the time, but occasionally we’ll pick "z", even though "a" might be found in 10x more words. I have my doubts about this strategy being optimal, but it was used in research in 2010.
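Strategy 1 (most-frequent letter, alphabetical tie-break) and its correct-guess filtering step can be sketched like this; the function names and the `(candidates, guessed, pattern)` interface are mine, not from the question:

```python
from collections import Counter

def best_guess(candidates, guessed):
    """Strategy 1: the letter contained in the most remaining candidate
    words, breaking ties alphabetically."""
    counts = Counter()
    for word in candidates:
        for letter in set(word):      # count each word once per letter
            if letter not in guessed:
                counts[letter] += 1
    # highest count first, then alphabetical tie-break
    return min(counts, key=lambda c: (-counts[c], c))

def filter_candidates(candidates, letter, pattern):
    """After a correct guess, keep words consistent with the revealed
    pattern ('.' = unknown position)."""
    def ok(w):
        if len(w) != len(pattern):
            return False
        for wc, pc in zip(w, pattern):
            if pc != '.' and wc != pc:
                return False
            if pc == '.' and wc == letter:
                return False          # letter would have been revealed there
        return True
    return [w for w in candidates if ok(w)]
```

Strategy 2 only changes the tie-break (random among the tied letters), and strategy 3 replaces the `min` with sampling proportional to `counts`.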

So I have two questions:

  1. What is the optimal letter-guessing strategy which I should use assuming that my opponent knows I will use this strategy?
  2. For a given word, say "pays", what is the average number of mistakes M I should expect to make? Is there a closed-form way to calculate M, as opposed to running the simulation many times and averaging the results?


  • Any English dictionary can be used. For example, this English dictionary with 84k words. Subsets of dictionaries carefully chosen for ambiguity could also be interesting but are outside the scope of this question.
  • There is no constraint on the word size as long as the word is in the dictionary. The guesser will know the word size before he begins guessing.
  • Only the total number of mistakes matters, which is independent of the word size. You don’t get more chances for longer words, or fewer chances for shorter ones.

Is there a way to solve the optimal branching / arborescence problem with path-dependent weights?

The optimal branching problem (solved by Edmonds’ algorithm or Tarjan’s algorithm) finds the optimal spanning arborescence of a directed graph.

I’m looking for a formulation of the problem that allows for path-dependent weights.

i.e. if I have a path from ROOT through A and B to C, with weight 2 on each of the edges:

ROOT =[2]> A =[2]> B =[2]> C

The overall cost of the path is 8, not 6.

Is such a formulation possible?


Optimal Selection of Non-Overlapping Jobs

I’m trying to find the family of problems this belongs to, as well as an approach, for the following:

I have a set of tasks T = [t1, …, tn] to do, each of which has a corresponding reward ri. Each task takes place during a fixed interval: task 1 runs from time 1 to 4, task 2 from 2 to 5, and task 3 from 9 to 15. This means I would have to pick either task 1 or task 2, depending on which is more valuable, and then task 3, which conflicts with neither of the others.

I’d like for this to scale to n tasks, and also to m "CPUs", where more than one task can be executed in parallel. This reminds me of the knapsack problem, but maybe an interval graph would provide a better approach?
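The single-machine case is the classic weighted interval scheduling problem, which has an O(n log n) dynamic program over tasks sorted by finish time. A sketch, with tasks as (start, end, reward) tuples (my representation; I assume a task ending at time t is compatible with one starting at t):

```python
import bisect

def max_reward(tasks):
    """Weighted interval scheduling on one machine. tasks: list of
    (start, end, reward); a task ending at t is assumed compatible
    with one starting at t."""
    tasks = sorted(tasks, key=lambda t: t[1])   # order by finish time
    ends = [t[1] for t in tasks]
    dp = [0] * (len(tasks) + 1)   # dp[i]: best reward using first i tasks
    for i, (s, e, r) in enumerate(tasks, 1):
        # p = how many of the first i-1 tasks finish by time s
        p = bisect.bisect_right(ends, s, 0, i - 1)
        dp[i] = max(dp[i - 1], dp[p] + r)       # skip task i, or take it
    return dp[-1]
```

The m-machine generalisation (job interval scheduling with m identical machines) no longer has this simple DP; it is usually handled with min-cost flow or ILP formulations.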

Any suggestions on how to approach this problem, or any relevant references?

Is there any good method to find if a grammar is optimal for a problem?

I’ve been thinking about grammatical evolution problems and how the grammar influences the algorithm’s performance. It struck me what a huge impact the grammar you use has on the time an algorithm takes to reach an optimal solution.

The simplest example would be a problem that doesn’t involve trigonometric operations. If you’re trying to find f(x) = 3x - 1/2, including sines, tangents or square roots in your grammar will almost certainly slow your algorithm down, as the population’s complexity will grow. Other not-so-evident simplifications of a grammar would be trigonometric identities:

tan(x) = sin(x) / cos(x) 

Regarding this last example, I don’t know how to measure the impact of including tan(x) among the grammar rules used to produce valid solutions. In other words, I’d like to know whether adding tan(x) gives better performance than leaving it out, thereby forcing the evolution to combine two or more operators and terminals to express that operation, at the cost of making the grammar ambiguous.

So these are my two questions:

  1. Is there any way of knowing if a grammar is optimal for finding a solution?
  2. Which evolutionary algorithm or machine learning method would you use for finding optimal or sub-optimal grammars? (Considering that I’m almost a complete novice in this discipline, some explanation is welcome.)


What would be the most optimal way for a Pact of the Chain warlock to make use of a Sprite familiar?

I am playing an Archfey Warlock with the "Pact of the Chain" Pact Boon. For thematic reasons, I have chosen for my familiar to take the form of a sprite with the "fey" creature type.

The restriction I place on this question is that the above facts must remain true; these are fixed aspects that I do not want to change. Hence any answer stating that the familiar should take another form, like an imp, is not a valid answer to this question, regardless of how much more optimal that might be. In other words, this is not "how to optimise a Pact of the Chain warlock", but "how to optimise the use of the sprite as a Pact of the Chain warlock’s familiar".

Further restrictions are that I do not plan on multiclassing, so answers that require multiclassing are also not valid answers, and that I am limiting the range of levels in play to between 3 and 7, so answers that require me to be a higher level warlock are also not valid (although if an answer includes a "here’s what you can do at high levels too" section as an added extra, I won’t complain).

Given these restrictions, I want to get the best use out of my sprite familiar. Because of its tiny HP pool, so far I’ve just had it remain invisible and hide out of sight so that it doesn’t get shot and killed; I’m still only level 3 and don’t really want to waste the resources resummoning it (later in the game, I assume this won’t be as much of a problem, but let’s assume that the familiar’s survivability remains a concern of mine, though not actually a hard restriction).

What are the best tactics to employ to make the sprite familiar as useful in combat as possible during late tier 1/early tier 2 play? I’m happy for people to suggest spells and invocations that the warlock themselves should pick in order to support the tactics that would enhance the sprite’s usefulness, but I don’t want this to turn into a question about optimising the warlock themselves; the focus should be on the sprite (in other words, assume the warlock is already optimised well enough for the purposes of this question).

Related Q&A: What can a familiar actually do? (except this question is about familiars generally, and the answers do not take into consideration the traits specific–if not unique–to a sprite, such as invisibility or potentially being able to poison an enemy or make it fall unconscious with the sprite’s shortbow attack)

Related Meta Q&A: Would this question about when it's better for a Pact of the Chain warlock to have their familiar attack be an on-topic bounded list question? (although no-one seemed to have an opinion either way, so I’ve gone ahead and asked it anyway; however, this meta Q&A can still be used as a platform to discuss the on-topic-ness of the question, should that become necessary)

I want to play a Hexblade/Pact of the Blade warlock; what’s the optimal way to play/use the character in combat? [closed]

I’m about to play my third ever session of D&D. I’ve been watching live-play for a while but haven’t played much. I’ve made a Levistus Tiefling Warlock with Pact of the Blade and a Hexblade patron starting at level 15.

character info:

Eldritch Invocations: Agonizing Blast – Lifedrinker – Thirsting Blade – Devil’s Sight – Shroud of Shadow – Visions of Distant Realms – Mask of Many Faces


cantrips: Eldritch Blast – Mage Hand – Minor Illusion – Prestidigitation – Ray of Frost

1st-level spells: Charm Person – Hex – Armor of Agathys (1/long rest) – Disguise Self (at will) – Arms of Hadar

2nd-level spells: Hold Person – Crown of Madness – Darkness (1/long rest) – Invisibility (at will)

3rd-level spells: Hypnotic Pattern – Thunder Step

4th-level spells: Banishment – Summon Greater Demon – Arcane Eye (at will)

5th-level spells: Hold Monster – Cone of Cold – Far Step – Synaptic Static

Mystic Arcanum spells: Circle of Death (6th level) – Forcecage (7th level) – Dominate Monster (8th level)

The main idea is that I’m really loving the concept and roleplay of being a warlock who made a cursed deal: the faceless, charismatic type of character, tied to a weapon in the way Pact of the Blade offers.

I prefer a ranged combat character, but I don’t want to be useless in melee. How can I optimally play this build?

Optimal encoding scheme for semi-rewritable memory?

Let’s define a “semi-rewritable” memory device as having the following properties:

  • The initial blank media is initialised with all zeroes.
  • When writing to the media, individual zeroes can be turned into ones.
  • Ones can not be turned back into zeroes.

Making a physical interpretation of this is easy. Consider for instance a punch card where new holes can easily be made, but old holes can not be filled.

What makes this different from a “write once, read many” device is that a used device can be rewritten (multiple times), at the cost of reduced capacity for each rewrite.

Implicit assumptions I would like to make explicit:

  1. The memory reader has no information about what was previously written on the device. It can therefore not be relied upon to use a mechanism such as “which symbols have been changed?” to encode data on a device rewrite. That is, the reader is stateless.
  2. On the other hand, different “generations” of the device may use different encoding schemes as the available capacity shrinks.
  3. The data stored can be assumed to be random bits.

Sample storage scheme, to demonstrate rewrite capability:

Information in this scheme is stored on the device as pairs of binary symbols, each pair encoding one of the three states of a ternary symbol, or [DISCARDED] in the case where both symbols have been written.

The first generation thus stores data at a density of $\frac{\log_2 3}{2} \approx 0.79$ times that of simple binary encoding.

When the device is rewritten, the encoder considers each pair of binary symbols in sequence. If the existing state matches the one it desires to write, the encoder considers the data written. If on the other hand the pair doesn’t match, it writes the necessary modification to that pair, or in the case where that isn’t possible, writes the symbol [DISCARDED] and considers the next pair instead until it has successfully written the ternary symbol.

As such, every rewrite would discard $ \frac{4}{9}$ of existing capacity.

For a large number of cycles, the device would in total have stored $\frac{9\log_2 3}{8} \approx 1.78$ times the data of a simple one-time binary encoding: each rewrite retains $\frac{5}{9}$ of the previous capacity, so the total is the geometric series $\frac{\log_2 3}{2}\cdot\frac{1}{1-5/9} = \frac{9\log_2 3}{8}$.

(For a variation of the above, one could also encode the first generation in binary and then apply this scheme on every subsequent generation. The loss from the first generation to the second would be larger, and the total life time capacity reduced, but the initial capacity would be larger).
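The pair-based scheme can be made concrete in a few lines; the bit-pair assignment below (00, 01, 10 for the three trit values, 11 for [DISCARDED]) is one possible mapping, since the question doesn’t fix a concrete one:

```python
SYMBOLS = {0: (0, 0), 1: (0, 1), 2: (1, 0)}  # trit -> bit pair (one possible mapping)
DISCARDED = (1, 1)

def can_reach(current, target):
    # a bit can only be flipped 0 -> 1, never back
    return all(c <= t for c, t in zip(current, target))

def write_trit(device, pos, trit):
    """Write one ternary symbol at pair index pos (or later);
    returns the index of the next usable pair."""
    target = SYMBOLS[trit]
    while pos < len(device):
        if device[pos] == target:           # pair already holds the value
            return pos + 1
        if can_reach(device[pos], target):  # flip the needed zeroes
            device[pos] = target
            return pos + 1
        device[pos] = DISCARDED             # unusable for this value: burn it
        pos += 1
    raise ValueError("device exhausted")
```

Note that with this mapping, a pair currently at 00 can accept any trit, while 01 and 10 can each accept only the one trit they already encode; every other request burns the pair, which is where the $\frac{4}{9}$ per-rewrite loss comes from for random data.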


  1. Is it possible to have a better life-time capacity than $\frac{9\log_2 3}{8}$? I suspect the real asymptotic capacity is 2.

  2. Can a scheme do better than having $ \frac{4}{9}$ capacity loss between rewrites?