Say there are $n$ seats $\{s_1, \dots, s_n\}$ in a theater, and the theater wants to know which seat is the most popular. They let exactly $1$ person in each night for $m$ nights in a row, and each night they record which seat is occupied.

They estimate the probability that each seat is occupied empirically: $P(s_i \text{ is occupied}) = \frac{\# \text{ of times } s_i \text{ is occupied}}{m}$. This gives an empirical distribution $\hat{\mathcal{D}}$, which maximizes the likelihood of the observed data drawn from the true distribution $\mathcal{D}$. This much I understand! But I'm totally lost trying to make this more rigorous.
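To make the setup concrete, here is a minimal Python sketch of the empirical estimate and of the total variation distance, taken as $d_{TV}(\mathcal{P}, \mathcal{Q}) = \frac{1}{2}\sum_i |\mathcal{P}(s_i) - \mathcal{Q}(s_i)|$ for a finite set of seats. The true distribution `true_dist`, the seed, and the function names are assumptions made up for illustration; in the actual problem $\mathcal{D}$ is unknown.

```python
import random
from collections import Counter

def empirical_distribution(samples, n):
    """Fraction of nights each seat 0..n-1 was occupied."""
    counts = Counter(samples)
    m = len(samples)
    return [counts[i] / m for i in range(n)]

def tv_distance(p, q):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical true distribution over n = 4 seats (illustration only).
true_dist = [0.5, 0.3, 0.15, 0.05]
n = len(true_dist)
m = 1000  # number of nights, one person per night

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = rng.choices(range(n), weights=true_dist, k=m)

d_hat = empirical_distribution(samples, n)
print("empirical distribution:", d_hat)
print("TV distance to true distribution:", tv_distance(d_hat, true_dist))
```

Seats are indexed from 0 here purely for convenience; `d_hat[i]` plays the role of $P(s_{i+1} \text{ is occupied})$.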

- What is an upper bound on $\text{E}[d_{TV}(\hat{\mathcal{D}}, \mathcal{D})]$? Why? Note: $d_{TV}(\mathcal{P}, \mathcal{Q})$ is the total variation distance between distributions $\mathcal{P}$ and $\mathcal{Q}$.
- How large does $m$ need to be for $\hat{\mathcal{D}}$ to be accurate to within some $\epsilon$ (say, $d_{TV}(\hat{\mathcal{D}}, \mathcal{D}) \le \epsilon$)? Why?
- How does this generalize if the theater allows $k$ people in each night (instead of $1$ person)?
- Is empirical estimation the best approach? If not, what is?
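
For experimenting with the first two questions numerically, a Monte Carlo sketch like the one below can estimate $\text{E}[d_{TV}(\hat{\mathcal{D}}, \mathcal{D})]$ for a few values of $m$ in the one-person-per-night setting. It is not a bound or a proof, only a way to watch how the quantity behaves. The distribution `true_dist`, the number of trials, and the helper `mean_tv` are assumptions chosen for illustration.

```python
import random
from collections import Counter

def tv_distance(p, q):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def mean_tv(true_dist, m, trials, rng):
    """Average TV distance between the empirical and true distributions over repeated runs."""
    n = len(true_dist)
    total = 0.0
    for _ in range(trials):
        # One run: m nights, one seat drawn per night from the true distribution.
        counts = Counter(rng.choices(range(n), weights=true_dist, k=m))
        d_hat = [counts[i] / m for i in range(n)]
        total += tv_distance(d_hat, true_dist)
    return total / trials

rng = random.Random(0)  # fixed seed so the run is repeatable
true_dist = [0.4, 0.3, 0.2, 0.1]  # hypothetical, for illustration only

results = {m: mean_tv(true_dist, m, trials=200, rng=rng) for m in (10, 100, 1000)}
for m, value in results.items():
    print(f"m = {m:5d}  estimated E[d_TV] ~ {value:.4f}")
```

Plotting `results` against $m$ for several values of $n$ is one way to guess the shape of the dependence before trying to prove it.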

If this is too much to ask in one question, let me know. Any reference to a textbook that would help answer these questions would be happily accepted as well.