# Information Theory: Comparing surprisal of words with varying count frequency

This is a fairly broad question, and I'm not sure whether cstheory would be a better place for it.

How can I compare the conditional surprisal of words that vary in frequency?

$$S(w \mid \text{context}) = -\log p(w \mid \text{context}) = -\log \frac{\text{count}(w, \text{context})}{\text{count}(\text{context})}$$
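To make the estimate concrete, here is a minimal sketch of how I compute this from raw counts. The toy `pairs` data, the function name `surprisal`, and the choice of base-2 logs (bits) are my own assumptions for illustration; the formula above does not fix a base.

```python
import math
from collections import Counter

# Toy (context, word) observations; purely illustrative data.
pairs = [
    ("the", "cat"), ("the", "cat"), ("the", "dog"), ("the", "cat"),
    ("a", "dog"), ("a", "cat"),
]

pair_counts = Counter(pairs)                    # count(w, context)
context_counts = Counter(c for c, _ in pairs)   # count(context)


def surprisal(word, context):
    """Return -log2 of the relative-frequency estimate of p(word | context)."""
    joint = pair_counts[(context, word)]
    if joint == 0:
        # Unseen pair: the unsmoothed estimate is 0, so surprisal is infinite.
        return math.inf
    return -math.log2(joint / context_counts[context])


print(surprisal("cat", "the"))  # -log2(3/4)
print(surprisal("dog", "the"))  # -log2(1/4)
```

No smoothing is applied here, so any unseen (word, context) pair gets infinite surprisal.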

The count $$\text{count}(w, \text{context})$$ depends on the frequency of the word $$w$$, because it can be decomposed as $$p(\text{context} \mid w) \cdot \text{count}(w)$$. All else being equal, this means that a more frequent word will have a lower surprisal.

Is there a way to compare the surprisal of words with varying count/frequency, i.e., to control for frequency? Do I just divide $$\text{count}(w, \text{context})$$ by $$\text{count}(w)$$ to normalize by count?
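For concreteness, here is a small sketch of the division I have in mind, using made-up counts for a frequent and a rare word (`"the"` and `"zebra"`) that occur equally often in one hypothetical context. All numbers and names are assumptions for illustration. Note that the ratio is the relative-frequency estimate of p(context | w), so I am unsure whether it is the right quantity to compare.

```python
import math

# Hypothetical counts: both words appear 10 times in the same context,
# but their overall corpus frequencies differ greatly.
count_context = 100
count_word = {"the": 1000, "zebra": 20}
count_word_in_context = {"the": 10, "zebra": 10}


def surprisal(word):
    """-log2 of count(w, context) / count(context), i.e. of p(w | context)."""
    return -math.log2(count_word_in_context[word] / count_context)


def ratio_to_word_count(word):
    """count(w, context) / count(w), i.e. an estimate of p(context | w)."""
    return count_word_in_context[word] / count_word[word]


for w in count_word:
    print(w, surprisal(w), ratio_to_word_count(w))
```

Both words get the same surprisal in this context, while the ratio differs by a factor of 50 (0.01 versus 0.5).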