How Joint Probability Distributions are used to solve the problem of missing inputs in Classification

With n input variables, we can now obtain all 2^n different classification functions needed for each possible set of missing inputs, but the computer program needs to learn only a single function describing the joint probability distribution.

This is page 98 of Ian Goodfellow’s Deep Learning Book. My confusion comes from how joint probability distributions are used to solve the problem of missing inputs. What are the random variables in this scenario? I don’t really understand the connection here so if someone could please elaborate that would be great.