This is a very broad question, and I’m not sure whether cstheory would be a better place for it.
How can I compare the conditional surprisal of words that vary in frequency?
The count $ count(w,context)$ depends on the frequency of the word $w$, because it can be broken down as $ p(context|w)count(w)$ . This means that, all else being equal, a more frequent word will have a lower surprisal.
Is there a way to compare the surprisal of words with varying count/frequency, i.e., to control for frequency? Do I just divide $ count(w,context)$ by $ count(w)$ to normalize by count?
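
To make the quantities concrete, here is a minimal sketch of what I mean, using made-up toy counts. The data and the function names (`conditional_surprisal`, `count_normalized`) are hypothetical and only for illustration. I am assuming surprisal is estimated as $ -\log_2 \frac{count(w,context)}{count(context)}$ ; the second function is the division I am asking about, which as far as I can tell is just an estimate of $ p(context|w)$ .

```python
import math
from collections import Counter

# Toy corpus of (context, word) pairs; purely illustrative data.
pairs = [
    ("the", "dog"), ("the", "dog"), ("the", "dog"), ("a", "dog"),
    ("the", "axolotl"), ("a", "cat"),
]

pair_count = Counter(pairs)                       # count(w, context)
context_count = Counter(c for c, _ in pairs)      # count(context)
word_count = Counter(w for _, w in pairs)         # count(w)


def conditional_surprisal(word, context):
    """-log2 p(word | context), with p estimated as count(w, context) / count(context)."""
    return -math.log2(pair_count[(context, word)] / context_count[context])


def count_normalized(word, context):
    """count(w, context) / count(w), i.e. an estimate of p(context | word)."""
    return pair_count[(context, word)] / word_count[word]


for w in ("dog", "axolotl"):
    print(w, conditional_surprisal(w, "the"), count_normalized(w, "the"))
```

In this toy data the frequent word ("dog") gets a lower surprisal after "the" than the rare word ("axolotl"), which is the frequency effect I would like to control for.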