How do you empirically estimate the most popular seat and get an upper bound on total variation?

Say there are $n$ seats $\{s_1, \ldots, s_n\}$ in a theater, and the theater wants to know which seat is the most popular. They allow $1$ person in for $m$ nights in a row, and on each of the $m$ nights they record which seat is occupied.

They are able to calculate the probability that a seat will be occupied using empirical estimation: $P(s_i \text{ is occupied}) = \frac{\#\text{ of times } s_i \text{ is occupied}}{m}$. With this, we have an empirical distribution $\hat{\mathcal{D}}$ which maximizes the likelihood of our observed data drawn from the true distribution $\mathcal{D}$. This much I understand! But I'm totally lost trying to make this more rigorous.
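
To make the setup concrete, here is a tiny simulation sketch (the seat count, number of nights, and true distribution below are all made up):

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 5, 10_000                       # number of seats, number of nights
true_dist = rng.dirichlet(np.ones(n))  # stand-in for the unknown true D

# One visitor per night; record which seat they occupy.
observations = rng.choice(n, size=m, p=true_dist)

# Empirical distribution D-hat: relative frequency of each seat.
empirical = np.bincount(observations, minlength=n) / m

# Total variation distance d_TV(D-hat, D) = (1/2) * sum |D-hat - D|.
d_tv = 0.5 * np.abs(empirical - true_dist).sum()
print("estimated most popular seat:", empirical.argmax() + 1)
print("d_TV =", d_tv)
```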

  • What is the upper bound on $\mathbb{E}[d_{TV}(\hat{\mathcal{D}}, \mathcal{D})]$? Why? Note: $d_{TV}(\mathcal{P}, \mathcal{Q})$ is the total variation distance between distributions $\mathcal{P}$ and $\mathcal{Q}$.
  • How large does $m$ need to be so that $\hat{\mathcal{D}}$ is accurate to within some $\epsilon$, say $d_{TV}(\hat{\mathcal{D}}, \mathcal{D}) \le \epsilon$ with high probability? Why?
  • How does this generalize if the theater allows $ k$ people in each night (instead of $ 1$ person)?
  • Is empirical estimation the best approach? If not, what is?

If this is too much to ask in one question, let me know. A reference to a textbook that would help answer these questions would also be happily accepted.

Does a minimum spanning tree always give a lower bound for the weight of any Hamiltonian cycle of the graph?

A minimum spanning tree (MST) always has $V-1$ edges, and a Hamiltonian cycle (HC) always has $V$ edges. Because the HC has an extra edge, we could say that in general (for non-negative weights) the weight of every Hamiltonian cycle of a connected graph will be greater than the weight of the MST.

If we remove an edge from any Hamiltonian cycle we obtain a spanning tree, and from that the MST weight will always be at most the weight of the HC.

  • Is there a way to prove this where the MST does not necessarily belong to the HC?
  • Is there a more definitive proof of this?
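
As a sanity check (not a proof), here is a brute-force experiment on small random complete graphs, using networkx and assuming non-negative edge weights:

```python
import itertools
import random

import networkx as nx

# Check on small random complete graphs that the MST weight is a lower
# bound for the weight of every Hamiltonian cycle (weights >= 0 assumed).
random.seed(0)
n = 6
for trial in range(20):
    G = nx.complete_graph(n)
    for u, v in G.edges:
        G[u][v]["weight"] = random.uniform(0, 10)

    mst_weight = nx.minimum_spanning_tree(G).size(weight="weight")

    # Enumerate all Hamiltonian cycles (fix vertex 0 to avoid rotations).
    for perm in itertools.permutations(range(1, n)):
        cycle = (0, *perm, 0)
        hc_weight = sum(G[a][b]["weight"] for a, b in zip(cycle, cycle[1:]))
        assert mst_weight <= hc_weight

print("MST weight was a lower bound for every Hamiltonian cycle checked.")
```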

Upper bound on $\sum_{k=1}^T \frac{1}{k (1+a)^{T-k}}$

Is there any reasonable upper bound for the following quantity $$\sum_{k=1}^T \frac{1}{k (1+a)^{T-k}}$$

where $a>0$, in terms of $T$ and $a$ (something like $\mathcal{O}\big(\frac{\log T}{aT}\big)$)? I tried to compute the integral $$\int_{0}^T \frac{1}{x (1+a)^{T-x}}\,dx,$$ which should be an upper bound on this sum as $f(x) = \frac{1}{x (1+a)^{T-x}}$ is decreasing on $(0, T)$, but I did not manage to get a reasonable expression.
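
One direction that might work (a rough sketch I have not checked carefully): split the sum at $k = \lceil T/2 \rceil$. For $k \le T/2$ we have $(1+a)^{T-k} \ge (1+a)^{T/2}$, so that part of the sum is at most $\frac{1+\ln T}{(1+a)^{T/2}}$; for $k > T/2$ we have $\frac{1}{k} \le \frac{2}{T}$, so that part is at most $\frac{2}{T}\sum_{j \ge 0}(1+a)^{-j} = \frac{2(1+a)}{aT}$. Together this would give $$\sum_{k=1}^T \frac{1}{k (1+a)^{T-k}} \le \frac{1+\ln T}{(1+a)^{T/2}} + \frac{2(1+a)}{aT} = \mathcal{O}\left(\frac{1+a}{aT}\right)$$ for fixed $a$ and large $T$, since the first term decays exponentially.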

Lower Bound for Time Complexity of Pairing Problem

Given arrays X and Y, both of length n, the pairing algorithm will return the elements of the arrays matched so that the smallest element of X is matched with the smallest element of Y, the second smallest in X with the second smallest in Y, and so on; i.e., the algorithm will yield the pairs $(x_{1},y_{1}),\ldots,(x_{i},y_{i}),\ldots,(x_{n},y_{n})$, $x_{i}$ being the $i$-th smallest element of X.
I need to find the lower bound for the time complexity of the problem.

I think the lower bound is $\Omega(n)$: when the two arrays are already sorted, we only need to match the elements with the same index in each array, so we go over $n$ indices in $O(n)$ time. But this only shows that a time complexity of $O(n)$ is achievable in that case, and doesn't prove it's the minimum.

Is $\Omega(n)$ the tight lower bound? And if so, what is the proof?
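
For reference, the obvious sort-based solution gives an $O(n \log n)$ upper bound (a minimal sketch; the function name is my own):

```python
def pair_arrays(xs, ys):
    """Pair the i-th smallest of xs with the i-th smallest of ys.

    Sorting dominates the cost, so this runs in O(n log n); the question
    is whether any algorithm can do better in general.
    """
    return list(zip(sorted(xs), sorted(ys)))

print(pair_arrays([3, 1, 2], [9, 7, 8]))  # [(1, 7), (2, 8), (3, 9)]
```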

Hoeffding to bound Orlicz norm

I have been reading from Weak Convergence and Empirical Processes, and came across the following: Let $ a_1,\ldots,a_n$ be constants and $ \epsilon_1,\ldots,\epsilon_n\sim$ Rademacher. Then

$ \mathbb{P}\left(\left|\sum_i\epsilon_i a_i\right|>x\right)\leq 2\exp\left(-\frac{x^2}{2||a||^2_2}\right)$

Consequently, $ ||\sum_i\epsilon_ia_i||_{\Psi_2}\leq\sqrt{6}||a||_2$ .
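
For what it's worth, I suspect the relevant tool is the book's Lemma 2.2.1 (if I recall the numbering correctly), which converts tail bounds into Orlicz-norm bounds: if $\mathbb{P}(|X|>x)\leq K e^{-Cx^p}$ for all $x>0$, then $\|X\|_{\Psi_p}\leq\left(\frac{1+K}{C}\right)^{1/p}$. Plugging in $K=2$, $C=\frac{1}{2\|a\|_2^2}$ and $p=2$ would give exactly $\left(3\cdot 2\|a\|_2^2\right)^{1/2}=\sqrt{6}\,\|a\|_2$, but I don't see why the lemma itself holds.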

How does this follow (i.e., what is the relation between the Orlicz norm of a Rademacher average and the $L_2$ norm of the constants)? Thank you in advance for your time.

Pebble game lower bound?

This paper says that pebble games have a superlinear lower bound for every fixed $k$: https://dl.acm.org/citation.cfm?doid=62.322433.

Why is this not considered a constructive example of a function in $NP$ which requires superlinear runtime?

Prove that the upper bound in the Noiseless-coding theorem is strict

Given a probability distribution $ p$ across an alphabet, we define redundancy as:

expected length of the codewords minus the entropy of $p$, i.e. $E(S) - h(p)$.

Prove that for each $\epsilon$ with $0 \le \epsilon < 1$ there exists a $p$ such that the optimal encoding has redundancy $\epsilon$.

Attempts

I have tried constructing a probability distribution like $p_0 = \epsilon,\ p_1 = 1 - \epsilon$ based on a previous answer, but I can't get it to work.
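
To see what this candidate gives numerically, here is a quick check, assuming a two-symbol alphabet where the optimal prefix code spends one bit per symbol, so $E(S) = 1$:

```python
import numpy as np

# Redundancy of the optimal code for p = (eps, 1 - eps) over a two-symbol
# alphabet: both codewords have length 1, so E(S) = 1 and the redundancy
# is 1 - h(eps), with h the binary entropy.
def redundancy(eps):
    h = -eps * np.log2(eps) - (1 - eps) * np.log2(1 - eps)
    return 1.0 - h

for eps in [0.5, 0.25, 0.11, 0.01]:
    print(f"p = ({eps}, {1 - eps}): redundancy = {redundancy(eps):.4f}")
```

Varying $\epsilon$ over $(0, 1/2]$ seems to sweep the redundancy through $[0, 1)$, but I can't turn this observation into a proof.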

Any help would be much appreciated.

Markov inequality: More precise bound?

Given a random variable $X$, we know that $P[X\geq A] = 1$. By Markov's inequality, we obtain that $E[X]\geq A$; in other words, $E[X] = A + \lambda$ for some $\lambda\geq 0$. Is there any way I can characterize $\lambda$ more precisely? E.g., if I know the variance of $X$? Or by applying some other bound, less conservative than Markov's?

Naming lower and upper bound of a metric

I am creating an API that will expose a list of metrics data. Each metric's data will contain the following fields:

  • metricName
  • lowerThreshold
  • upperThreshold

The defined thresholds here are what will determine whether the current value of the metric is good or bad.


The prefixes upper and lower will not always hold true: there are metrics where the objective is to minimize a number, in which case the upperThreshold will be lower than the lowerThreshold.
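
For example (hypothetical metric names and values), a "minimize" metric ends up with its thresholds numerically reversed:

```python
# Hypothetical payload examples. For errorRate, lower values are better,
# so the "upper" threshold sits numerically below the "lower" one.
metrics = [
    {"metricName": "uptime",    "lowerThreshold": 99.0, "upperThreshold": 99.9},
    {"metricName": "errorRate", "lowerThreshold": 5.0,  "upperThreshold": 1.0},
]
```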

What would be the right way to name these bounds so that there is no confusion about which one marks the better value, without implying anything about the relationship between the two bounds?