What is the optimal algorithm for playing the hangman word game?


Suppose we are playing the game hangman. My opponent picks a word at random from the dictionary, which we both have access to during the game. Once my opponent has picked a word, I am given only the length of the word. I can guess a letter which I think will be in the word, and if it is in the word my opponent identifies all of the positions of that letter in that word. If I guess a letter which isn’t in the word, that’s counted as one mistake. If I can guess the word before too many mistakes I win.

My opponent’s goal should be to pick a word which maximizes the number of mistakes (incorrect guesses) I make before I can guess the word. My goal is to minimize them. (Traditionally, if the number of mistakes is > some number then I lose the game, but we can relax that constraint.)

Consider three algorithms for letter guessing. These are the main ones I’ve thought of, but there are probably more, and I welcome alternate ideas. In all three algorithms, the first step will be to gather the list of words which are the same length as the secret word. Then, for each letter in the alphabet, count the number of words in the dictionary which contain that letter. For example, maybe 5000 contain "a", 300 contain "b", and so on. Here it is in python:

    alphabet = list('abcdefghijklmnopqrstuvwxyz')      while not found:         probabilities = dict.fromkeys(alphabet, 0)          for word in dictionary:             for letter in word:                 if letter in alphabet:                     probabilities[letter] += 1          # now we have the letter frequencies  

After that is where the three algorithms diverge.

  1. In the first algorithm, we guess the letter which the most number of remaining words contain. So if 5000 remaining words contain "a" and no other letters are in that many words, we will pick "a" every time. If "a" is correct, this gives us positional information which we can filter the list further. For example, we might filter the list by all words that match ".a..". (Dots are unknown letters.) If "a" is incorrect, we filter out all words which contain "a". In the case where there is a tie and two letters are found in an equal number of words, letters are chosen alphabetically. So if we know the word matches ".ays", we’ll just guess words in alphabetical order.

  2. This is only slightly different from the first algorithm. Instead of always choosing alphabetical ordering, in the case of a tie we choose letters randomly. This has the benefit that our opponent doesn’t know what we will pick. In the first strategy, "rays" is always better than "days". This avoids that issue.

  3. In this case, we pick letters with a probability proportional to the number of words which contain that letter. At the beginning when we tallied the number of words which contain "a" and the number which contain "b" and so on, since "a" happened to be found in the most number of words, in strategies 1 and 2 we picked "a" 100% of the time. Instead, we will still choose "a" a plurality of the time, but a small number of times we’ll pick "z" even though a might be found in 10x more words. I have my doubts about this strategy being optimal but it was used in research in 2010.

So I have two questions:

  1. What is the optimal letter-guessing strategy which I should use assuming that my opponent knows I will use this strategy?
  2. For a given word, say "pays", what is the average number of mistakes M I should be expected to make? Is there a closed-form way to calculate M, as opposed to running this simulation many times and averaging the results?

Clarifications

  • Any English dictionary can be used. For example, this English dictionary with 84k words.. Subsets of dictionaries carefully chosen for ambiguity could also be interesting but are outside of the scope of this question.
  • There is no constraint on the word size as long as the word is in the dictionary. The guesser will know the word size before he begins guessing.
  • Only the total number of mistakes matters, which is independent of the word size. You don’t get more chances for longer words, or fewer chances for shorter ones.