How can Kneser-Ney Smoothing be integrated into a neural language model?


I found a paper titled "Multimodal representation: Kneser-Ney Smoothing/Skip-Gram based neural language model". I am curious about how the Kneser-Ney Smoothing technique can be integrated into a feed-forward neural language model with one linear hidden layer and a softmax activation. What is the purpose of Kneser-Ney smoothing in such a neural network, and how can it be used to learn the conditional probability of the next word?
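
For concreteness, this is roughly the kind of model I have in mind. The sketch below assumes PyTorch; the class name `FeedForwardLM`, the `interpolate` helper, and the final interpolation step are only my own illustration of one plausible way a Kneser-Ney estimate could be combined with the network's output, not something taken from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedForwardLM(nn.Module):
    """Feed-forward LM: word embeddings -> one linear hidden layer -> softmax."""
    def __init__(self, vocab_size, context_size=3, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)  # linear hidden layer
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context_ids):
        # context_ids: (batch, context_size) indices of the preceding words
        x = self.embed(context_ids).flatten(start_dim=1)   # (batch, context_size * embed_dim)
        h = self.hidden(x)                                  # no nonlinearity, as in the question
        return F.log_softmax(self.out(h), dim=-1)          # log P(next word | context)

# Hypothetical integration: interpolate the network's distribution with a
# precomputed Kneser-Ney distribution over the same vocabulary.
def interpolate(neural_log_probs, kn_probs, lam=0.5):
    return lam * neural_log_probs.exp() + (1 - lam) * kn_probs
```

Is interpolation like this what is meant by "integration", or does the Kneser-Ney part play a different role (e.g. in constructing the training targets or features)?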