Let $ M > 0$ , let $ k$ be a positive integer, and set $ \mathcal V:=[-M,M]^k$ . Let $ p \in \Delta_k$ (where $ \Delta_k$ is the $ (k-1)$ -dimensional probability simplex), and let $ \hat{p}_n$ be the empirical version of $ p$ based on an iid sample of size $ n$ . Given $ \delta \in (0, 1)$ , my objective is to obtain a uniform bound of the form

**[Objective]** $ \text{Proba}(\sup_{v \in \mathcal V}|p^Tv-\hat{p}_n^Tv| \le \epsilon_n) \ge 1 - \delta $ , for some $ \epsilon_n >0 $ (the smaller the better).

# Idea using covering argument

Presumably, for each $ v \in \mathcal V$ , I can use Bernstein’s inequality to control $ |p^Tv-\hat{p}_n^Tv|$ . For example,

$$ \text{Proba}\left(|p^Tv-\hat{p}_n^Tv| \le \left(2\operatorname{Var}_p(v)\frac{\log(2/\delta)}{n}\right)^{1/2} + \frac{2M\log(2/\delta)}{3n}\right) \ge 1 -\delta. $$
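As a sanity check on the pointwise bound, here is a minimal Monte Carlo sketch in plain Python. It uses the standard form of Bernstein's inequality for variables taking values in $ [-M, M]$ , with variance term $ (2\operatorname{Var}_p(v)\log(2/\delta)/n)^{1/2}$ . The names `bernstein_radius` and `coverage`, and the particular `p` and `v`, are purely illustrative assumptions.

```python
import math
import random

def bernstein_radius(var, M, n, delta):
    """Bernstein deviation radius for the mean of n iid values in [-M, M]."""
    log_term = math.log(2.0 / delta)
    return math.sqrt(2.0 * var * log_term / n) + 2.0 * M * log_term / (3.0 * n)

def coverage(p, v, M, n, delta, trials, seed=0):
    """Fraction of trials in which |p^T v - p_hat^T v| is within the radius."""
    rng = random.Random(seed)
    mean = sum(pi * vi for pi, vi in zip(p, v))               # p^T v
    var = sum(pi * (vi - mean) ** 2 for pi, vi in zip(p, v))  # Var_p(v)
    radius = bernstein_radius(var, M, n, delta)
    hits = 0
    for _ in range(trials):
        # Draw n iid indices from p; p_hat^T v is the sample mean of v[index].
        sample = rng.choices(range(len(p)), weights=p, k=n)
        emp_mean = sum(v[i] for i in sample) / n
        if abs(mean - emp_mean) <= radius:
            hits += 1
    return hits / trials

# Illustrative (assumed) inputs: k = 3, M = 1.
p = [0.5, 0.3, 0.2]
v = [1.0, -1.0, 0.5]
print(coverage(p, v, M=1.0, n=200, delta=0.1, trials=2000))
```

The printed frequency should be at least $ 1-\delta$ , and typically well above it, since the bound is conservative for a fixed $ v$ .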

On the other hand,

The mapping $ G:v \mapsto |p^Tv-\hat{p}_n^Tv|$ is $ 2$ -Lipschitz w.r.t. the $ \ell_\infty$ -norm on $ \mathbb R^k$ .

Indeed, for all $ v',v \in \mathcal V$ , one has $$ \begin{split} |G(v')-G(v)| &= \big||p^Tv'-\hat{p}_n^Tv'|-|p^Tv-\hat{p}_n^Tv|\big| \le |p^Tv'-\hat{p}_n^Tv'-(p^Tv-\hat{p}_n^Tv)|\\ &= |p^T(v'-v)-\hat{p}_n^T(v'-v)| \le |p^T(v'-v)|+|\hat{p}_n^T(v'-v)| \\ &\le (\|p\|_1+\|\hat{p}_n\|_1)\|v'-v\|_\infty = 2 \|v'-v\|_\infty, \end{split} $$ where the first inequality is the reverse triangle inequality, the second is the triangle inequality, the third is Hölder's inequality (for the dual pair $ \ell_1$ , $ \ell_\infty$ ), and the final equality holds because $ p,\hat{p}_n \in \Delta_k$ are probability distributions.
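The Lipschitz claim can also be spot-checked numerically. The sketch below is only an illustration: it draws both $ p$ and a stand-in for $ \hat{p}_n$ as random points of the simplex (an assumption, since the argument only uses that both lie in $ \Delta_k$ ), and the helper names are hypothetical.

```python
import random

def G(p, p_hat, v):
    """G(v) = |p^T v - p_hat^T v|."""
    return abs(sum((pi - qi) * vi for pi, qi, vi in zip(p, p_hat, v)))

def random_simplex_point(rng, k):
    """Random point of the probability simplex (normalized exponentials)."""
    w = [rng.expovariate(1.0) for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

def max_lipschitz_ratio(k, M, trials, seed=0):
    """Largest observed |G(w) - G(v)| / ||w - v||_inf over random draws."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        p = random_simplex_point(rng, k)
        p_hat = random_simplex_point(rng, k)  # stand-in for the empirical p_hat_n
        v = [rng.uniform(-M, M) for _ in range(k)]
        w = [rng.uniform(-M, M) for _ in range(k)]
        dist = max(abs(a - b) for a, b in zip(v, w))  # sup-norm distance
        if dist > 0:
            worst = max(worst, abs(G(p, p_hat, w) - G(p, p_hat, v)) / dist)
    return worst

print(max_lipschitz_ratio(k=5, M=2.0, trials=1000))
```

The observed ratio should never exceed $ 2$ ; in fact the same computation shows it is at most $ \|p-\hat{p}_n\|_1$ .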

Also, the sup-norm covering number of $ \mathcal V$ satisfies $ \mathcal N_\infty(\mathcal V;\varepsilon)\le(2M/\varepsilon)^k$ for $ 0 < \varepsilon \le M$ .

Using the fact that $ \|v\|_\infty \le M$ for all $ v \in \mathcal V$ , I can bound the variance term in the Bernstein bound above by $ \operatorname{Var}_p(v) \le M^2$ for all $ v \in \mathcal V$ (i.e., we would effectively use Hoeffding's inequality instead), and then use a covering argument (e.g., see https://mathoverflow.net/a/322161/78539) to get an inequality of the sought-after form **[Objective]** above. However, such an inequality is presumably “blurred”.
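To make the covering route concrete, here is a rough sketch of the $ \epsilon_n$ it would produce, under my own assumptions: Hoeffding's inequality for values in $ [-M,M]$ at each point of a fixed $ \varepsilon$ -net, a union bound over at most $ (2M/\varepsilon)^k$ net points, and the $ 2$ -Lipschitz property to pass from the net to all of $ \mathcal V$ . This gives $ \epsilon_n = M\sqrt{2\,(k\log(2M/\varepsilon)+\log(2/\delta))/n} + 2\varepsilon$ . The function name and the default choice $ \varepsilon = M/\sqrt{n}$ are illustrative, not canonical.

```python
import math

def covering_epsilon(M, k, n, delta, net_radius=None):
    """Uniform deviation bound from Hoeffding + union bound over a sup-norm net."""
    if net_radius is None:
        net_radius = M / math.sqrt(n)  # assumed default, not optimized
    if not 0.0 < net_radius <= M:
        raise ValueError("net_radius must lie in (0, M]")
    # log of the covering-number bound (2M / net_radius)^k
    log_net = k * math.log(2.0 * M / net_radius)
    # Hoeffding radius holding simultaneously at all net points w.p. >= 1 - delta
    on_net = M * math.sqrt(2.0 * (log_net + math.log(2.0 / delta)) / n)
    # Lipschitz constant 2 times the net radius covers points off the net
    return on_net + 2.0 * net_radius

for n in (100, 10_000, 1_000_000):
    print(n, covering_epsilon(M=1.0, k=3, n=n, delta=0.1))
```

With the default net radius this scales like $ M\sqrt{(k\log n + \log(1/\delta))/n}$ , so the covering step costs a $ \log n$ factor that a sharper argument might avoid.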

# Question

How can these ramblings be pieced together to obtain a strong uniform inequality of the form **[Objective]**? Of course, I’m more than happy to learn other tricks for obtaining such results, which might not use any of the ideas I’ve discussed above.