Probability of selecting a particular set, by sampling without replacement from a categorical distribution

Suppose I have a categorical distribution on items $ 1,\dots,n$ , that assigns probability $ p_i$ to item $ i$ . I now repeatedly sample from this distribution, until I have obtained $ k$ unique objects. I’d like to compute the probability that the set of objects obtained is exactly $ \{1,\dots,k\}$ .

Is there an efficient way to compute this probability, given $ p_1,\dots,p_n$ and $ k$ ?

I can see that the probability has the form

$$ p = \sum_\sigma \prod_{i=1}^k {p_{\sigma(i)} \over 1-p_{\sigma(1)}-\dots-p_{\sigma(i-1)}}, $$ where the sum is over all permutations $ \sigma \in S_k$ on $ \{1,\dots,k\}$ . (Here $ \sigma$ represents the order in which the items $ 1,\dots,k$ are first selected, and the denominator for $ i=1$ is $ 1$ .) However, this formula for the probability involves $ k!$ terms, so computing the probability in this way would take time exponential in $ k$ . Is there a more efficient way to compute it?

Of course, without loss of generality we can assume $ n=k+1$ .
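
To sanity-check the formula on small inputs, here is a minimal sketch (it does not settle the efficiency question). It assumes the probability of one selection order is the product of $ p_{\sigma(i)}/(1-p_{\sigma(1)}-\dots-p_{\sigma(i-1)})$ , and compares the $ k!$ -term sum with a dynamic program over subsets that needs roughly $ 2^k k$ steps: still exponential, but far fewer than $ k!$ . The function names are my own, chosen for illustration.

```python
from itertools import permutations

def prob_bruteforce(p, k):
    """Sum over all k! orders in which items 0..k-1 can first appear."""
    total = 0.0
    for sigma in permutations(range(k)):
        used = 0.0   # probability mass of the items already selected
        term = 1.0
        for i in sigma:
            term *= p[i] / (1.0 - used)
            used += p[i]
        total += term
    return total

def prob_subset_dp(p, k):
    """f[S] = probability that the first |S| distinct items drawn are exactly S."""
    size = 1 << k
    f = [0.0] * size
    mass = [0.0] * size          # mass[S] = sum of p[i] for i in S
    f[0] = 1.0
    for s in range(1, size):
        low = (s & -s).bit_length() - 1       # index of the lowest set bit
        mass[s] = mass[s & (s - 1)] + p[low]
        for i in range(k):
            if (s >> i) & 1:                  # item i is the last one added to S
                prev = s ^ (1 << i)
                f[s] += f[prev] * p[i] / (1.0 - mass[prev])
    return f[size - 1]

p = [0.1, 0.2, 0.3, 0.4]   # items 0..3, so n = 4
for k in range(1, 4):
    print(k, prob_bruteforce(p, k), prob_subset_dp(p, k))
```

The two columns should agree up to floating-point error, which at least confirms that the recurrence and the permutation sum describe the same quantity.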

Probability of terminating in a state in a probabilistic algorithm

Suppose I have a circular array of $ n$ elements. At time $ t=0$ I am at position 0 of the array. At each step the algorithm moves left or right with probability $ p=1/2$ each (since the array is circular, moving left from position 0 leads to position $ n-1$ ). Once every position has been visited at least once, the algorithm returns the position at which it terminated.
Running the algorithm many times suggests that it stops with the same probability at each of the other $ n-1$ positions (never at position 0, obviously). How do I prove that the distribution of the final position is uniform?
My understanding is that this process is described by a doubly stochastic Markov chain, but is there a theorem I can cite for this specific case?
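
For reference, the experiment described above can be sketched as follows. This is only a simulation of the walk, assuming positions are numbered $ 0,\dots,n-1$ and each step is $ \pm 1$ modulo $ n$ ; it is not a proof, and the function names are illustrative.

```python
import random
from collections import Counter

def last_new_position(n, rng):
    """Walk on a circle of n positions starting at 0; return the last position visited for the first time."""
    pos = 0
    visited = {0}
    while len(visited) < n:
        pos = (pos + rng.choice((-1, 1))) % n   # step left or right with probability 1/2
        visited.add(pos)
    return pos   # the loop exits right after the final unvisited position is reached

def empirical_distribution(n, trials, seed=0):
    """Relative frequency of each terminating position over many independent runs."""
    rng = random.Random(seed)
    counts = Counter(last_new_position(n, rng) for _ in range(trials))
    return {pos: counts[pos] / trials for pos in range(n)}

for pos, freq in empirical_distribution(6, 20000, seed=1).items():
    print(pos, round(freq, 3))
```

With $ n=6$ the frequencies for positions 1 to 5 should each come out close to $ 1/5$ , and position 0 should never appear.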

What kind of bigram probability smoothing is this?

I hope it isn’t off topic, but I need to understand this example. Given the corpus 12 1 13 12 15 234 2526 and a smoothing factor of k=1, the example performs the following operations:

It considers OOV (out-of-vocabulary) words and assigns them a count of zero; after that, k=1 is added to the number of times every word appears, to avoid zero probabilities. So the smoothed bigram probabilities will be:

$ P(1|12)=(1+k)/(2+2+6*k)=0.2$
$ P(15|12)=(1+k)/(2+2+6*k)=0.2$
$ P(13|1)=(1+k)/(2+6*k)=0.25$
$ P(12|13)=(1+k)/(2+6*k)=0.25$
$ P(234|15)=(1+k)/(2+6*k)=0.25$
$ P(2526|234)=(1+k)/(2+6*k)=0.25$

My question is: what kind of smoothing is this? Shouldn’t it be, for example, $ P(1|12)=(1+k)/(2+6*k)=0.25$ ?
Besides, it also says “If OOV words appear, you need to use smoothing to return a value”: $ P(234|12)=1/((2/7)*6+6)=0.1296$
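
For comparison, here is a minimal sketch of textbook add-k smoothing on the same corpus, assuming the usual definition $ P(w|prev) = (c(prev, w) + k) / (c(prev) + k \cdot |V|)$ with $ c(prev)$ counted over bigram contexts and $ |V|=6$ . It is not the code from the webpage, and the function name is illustrative.

```python
from collections import Counter

def add_k_bigram_model(tokens, k=1.0):
    """Return a function prob(word, prev) using add-k smoothing over the corpus vocabulary."""
    vocab = sorted(set(tokens))
    bigrams = Counter(zip(tokens, tokens[1:]))   # counts of (prev, word) pairs
    context = Counter(tokens[:-1])               # how often each token starts a bigram

    def prob(word, prev):
        return (bigrams[(prev, word)] + k) / (context[prev] + k * len(vocab))

    return prob, vocab

corpus = "12 1 13 12 15 234 2526".split()
prob, vocab = add_k_bigram_model(corpus, k=1)
print(prob("1", "12"))     # (1 + 1) / (2 + 6) = 0.25
print(prob("234", "12"))   # (0 + 1) / (2 + 6) = 0.125
print(sum(prob(w, "12") for w in vocab))   # the six probabilities given "12" add up to 1
```

Under this definition $ P(1|12)$ comes out as 0.25 rather than 0.2, and the unseen bigram gets 0.125 rather than 0.1296, so the example appears to be using some other variant.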

PS: I took this example from a small section of the translated version of this Chinese webpage; it is just explaining a code implementation.

How Should Speed and Range Affect Hit Probability?

If one were to go for an increased degree of realism, and try to build a probability curve that produces most sensible results (but simplified, of course, since there is no such thing as a perfect simulation), then approximately what sort of correlation should there be between distance to the target, speed of the target, and the chance to hit the target (under otherwise similar circumstances, i.e. same aiming time, weapon, character/skill etc.)?

Examples: There are systems which reduce the chance to hit by the same percentage for each fixed range increment added to the distance to the target. There are systems which stack range penalties by a logarithmic function of range (e.g. a stacking penalty per doubling until reaching some cutoff range). There are systems which provide a constant speed penalty entirely separately from range, and systems which add speed and range when calculating the penalty. Some of these systems’ probability effects are complicated by the fact that they use non-linear dice curves. Some argue that the probability reduction should be quadratic in range, since for each doubling of range, the target’s projection becomes ¼ of its previous observed value (percentage of FoV taken up), but I don’t recall any systems that explicitly and deliberately implemented anything like that.

After asking elsewhere, I’ve been pointed to the steering law and Fitts’s law, but they seem to be meant for fixed accuracy and variable time, while in RPGs, fixed aim time and variable chance to hit are much more workable.

Note that I’m not asking about which dice mechanics to use for modelling those probability adjustments, as I’m assuming that there are multiple ways of fitting dice to a desired probability function, but first I’d like to learn what probability functions are the most fitting (simplified and generalised, of course) representations of real-life shooting situations.
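
To make the quadratic-projection argument concrete, here is a minimal sketch of one candidate curve. It is purely an assumed model, not taken from any game system: the shooter's aiming error is an isotropic Gaussian in angle with standard deviation `angular_sigma` (radians), the target is a circle of radius `target_radius`, and target speed is not modelled at all. Under those assumptions the miss distance is Rayleigh-distributed, which gives the closed form used below.

```python
import math

def hit_probability(distance, target_radius, angular_sigma):
    """P(hit) for a circular target under isotropic Gaussian angular error (assumed model)."""
    miss_sigma = angular_sigma * distance   # linear spread of shots at the target's range
    return 1.0 - math.exp(-target_radius ** 2 / (2.0 * miss_sigma ** 2))

# Hypothetical numbers: 0.25 m target radius, 0.01 rad of aiming error.
for d in (10, 25, 50, 100, 200, 400):
    print(d, round(hit_probability(d, target_radius=0.25, angular_sigma=0.01), 4))
```

In this model the chance to hit stays near 1 at short range and only falls off roughly as $ 1/\text{range}^2$ once the target is small compared to the spread, so each doubling of a long range cuts the chance to about a quarter.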

Number of executions of the algorithm with probability about graphs

Consider an undirected graph $ G = (V, E)$ representing the social network of friendship/trust between students. We would like to form teams of three students that know each other. The question is to decide whether the network allows for enough such teams, without checking all the triples of graph $ G$ . For this reason, we use random sampling to design an efficient estimator of the number of connected triples. We partition the set of node triples into four sets $ T_0, T_1, T_2$ , and $ T_3$ . A node triple $ v1, v2, v3$ belongs to

  • $ T_0$ iff no edge exists between the nodes $ v1, v2$ , and $ v3$ ,
  • $ T_1$ iff exactly one of the edges $ (v1, v2), (v2, v3)$ , and $ (v3, v1)$ exists,
  • $ T_2$ iff exactly two of the edges $ (v1, v2)$ , $ (v2, v3)$ , and $ (v3, v1)$ exist,
  • $ T_3$ iff all of the edges $ (v1, v2), (v2, v3)$ , and $ (v3, v1)$ exist.

$ |T_3|$ denotes the number of connected triples in the graph, which is the quantity we need to estimate. Consider the following algorithm:

• Sample an edge $ e = (a, b)$ uniformly chosen from $ E$

• Choose a node $ v$ uniformly from $ V \setminus \{a, b\}$

• if $ (a, v) \in E$ and $ (b, v) \in E$ then $ x = 1$ , else $ x = 0$

This exercise asks me to find a nontrivial number $ s$ of executions of the algorithm that is sufficient to obtain a $ (1 + \epsilon)$ and a $ (1 - \epsilon)$ approximation of $ |T_3|$ with probability at least $ 1 - \delta$ . I don’t know how to handle the probability part; could you give me some help solving this?
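
One possible way to set this up, as a sketch rather than the intended solution: each triangle is found through any of its three edges, so $ E[x] = 3|T_3| / (|E|(|V|-2))$ , and the mean of $ s$ independent runs can be rescaled into an estimate of $ |T_3|$ . The multiplicative Chernoff bound $ \Pr[|\bar{x} - \mu| \ge \epsilon\mu] \le 2\exp(-\epsilon^2 s \mu / 3)$ for $ 0 < \epsilon \le 1$ then gives a sufficient $ s$ , provided a lower bound on $ \mu = E[x]$ is known. That lower bound (`mu_lower`), the node numbering $ 0,\dots,|V|-1$ and the function names are assumptions of mine.

```python
import math
import random

def estimate_triangles(n, edges, s, rng):
    """Run the sampling step s times on nodes 0..n-1 and rescale the mean of x."""
    edge_set = {frozenset(e) for e in edges}
    hits = 0
    for _ in range(s):
        a, b = rng.choice(edges)             # uniform edge
        v = rng.randrange(n)
        while v == a or v == b:              # rejection sampling: uniform on V \ {a, b}
            v = rng.randrange(n)
        if frozenset((a, v)) in edge_set and frozenset((b, v)) in edge_set:
            hits += 1
    # E[x] = 3 |T_3| / (|E| (n - 2)), so invert that relation.
    return (hits / s) * len(edges) * (n - 2) / 3

def chernoff_samples(eps, delta, mu_lower):
    """Runs sufficient for a (1 +/- eps) estimate of E[x] with probability >= 1 - delta,
    assuming mu_lower <= E[x] (multiplicative Chernoff bound, 0 < eps <= 1)."""
    return math.ceil(3 * math.log(2 / delta) / (eps ** 2 * mu_lower))

# Complete graph on 5 nodes: every triple is a triangle, so |T_3| = 10.
k5 = [(i, j) for i in range(5) for j in range(i + 1, 5)]
print(estimate_triangles(5, k5, 1000, random.Random(0)))
print(chernoff_samples(eps=0.1, delta=0.05, mu_lower=0.5))
```

Since $ |T_3|$ is a fixed multiple of $ E[x]$ , a $ (1 \pm \epsilon)$ estimate of $ E[x]$ is also a $ (1 \pm \epsilon)$ estimate of $ |T_3|$ .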

Probability of detecting errors in codewords

I have been struggling with the below question for quite some time, and I don’t have a pointer to move forward.

A certain Error Control Coding scheme using block codes takes an input block (dataword) of 500 bits and appends a 50 bit code to produce a 550 bit codeword, which is then transmitted across a channel that causes individual bits to flip with a probability of 0.1 independently. The pairwise Hamming distances between all the codeword pairs are so large that the probability of an error occurring that converts one to the other can be neglected, except as follows: two codewords C1 and C2 have a Hamming distance of 10, and two codewords C3 and C4 have a Hamming distance of 6. Assuming no knowledge about what datawords may be more or less likely to be desired to transmit, what is the probability that a given block transmission will be corrupted by the channel but the error will go undetected by the receiver? You may answer with an expression, but the answer has to be completely numerical (no symbols).

My thought process about this question is that the probability needs to be calculated as follows: Pr(selecting either C1 or C2) * Pr(error in C1 or C2) + Pr(selecting either C3 or C4) * Pr(error in C3 or C4).
I feel that Pr(error) is given by a binomial distribution, $ \binom{550}{x}(0.1)^x(0.9)^{550-x}$ , where $ x=10$ or $ x=6$ .

First, am I thinking about the problem correctly? If yes, how do I derive the probability of selecting a particular codeword?
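
One possible reading, sketched numerically: if every 500-bit dataword is equally likely, each codeword is sent with probability $ 2^{-500}$ , and an undetected error needs exactly the differing bits of a close pair to flip while the rest arrive intact. Under that reading the flipped positions are fixed, so no binomial coefficient appears. These are my assumptions about the intended interpretation, not a confirmed solution.

```python
from fractions import Fraction

N_BITS = 550
FLIP = Fraction(1, 10)
N_CODEWORDS = 2 ** 500   # one codeword per 500-bit dataword, assumed equally likely

def undetected_term(distance):
    """Contribution of one pair of codewords at the given Hamming distance."""
    # Exactly the `distance` differing bits must flip; all other bits must survive.
    p_convert = FLIP ** distance * (1 - FLIP) ** (N_BITS - distance)
    # Either member of the pair may be the one that was sent.
    return Fraction(2, N_CODEWORDS) * p_convert

p_undetected = undetected_term(10) + undetected_term(6)
print(float(p_undetected))
```

Exact fractions are used only to avoid underflow worries; the distance-6 pair dominates the sum by a factor of $ (0.9/0.1)^4 = 6561$ .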

Probability of collision for classes of hash functions

I am going through some old exams in one of my courses, and I don’t have access to solutions. I found an exercise that I am not sure how to tackle. I am not looking for the answer, but for some help/guidance if possible.

Consider a class of hash functions, $ \mathcal{H}$ , from a finite universe $ U$ into $ \{0, \dots, n-1\}$ , where $ |U|, n > 1$ .

Show that for any family of hash functions: $$ \Pr[h(u_1) = h(u_2)] \geq \frac{1}{n} - \frac{1}{|U|} $$ Here, the probability is over the choice of the hash function $ h$ drawn uniformly at random from the family $ \mathcal{H}$ .
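
To get a feel for the claim, here is a small numerical check. It assumes the bound is meant for a suitably chosen pair of distinct elements $ u_1, u_2 \in U$ (for instance, it would follow if it holds on average over all pairs); that reading, the toy family and the function name are my own assumptions.

```python
from fractions import Fraction
from itertools import combinations

def pair_collision_probs(family, universe):
    """Map each unordered pair (u1, u2) to Pr_h[h(u1) == h(u2)] for h uniform in family."""
    return {
        (u1, u2): Fraction(sum(h[u1] == h[u2] for h in family), len(family))
        for u1, u2 in combinations(universe, 2)
    }

universe = range(6)
n = 3
# Hypothetical family of three hash functions, each stored as a dict u -> h(u).
family = [
    {u: u % n for u in universe},
    {u: (2 * u + 1) % n for u in universe},
    {u: u // 2 for u in universe},
]
probs = pair_collision_probs(family, universe)
bound = Fraction(1, n) - Fraction(1, len(universe))
average = sum(probs.values()) / len(probs)
print("average:", average, "max:", max(probs.values()), "bound:", bound)
```

Trying other families and sizes this way may suggest which counting argument to use: for a single fixed $ h$ , count how many pairs must land in the same bucket.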

How can I measure the cost of this probabilistic algorithm for a pseudo 3-coloring graph problem?

Problem: I have a classic graph 3-coloring problem where at least 2/3 of all the edges must be well-colored. By well-colored I mean that the two endpoints of an edge have different colors. I have to use a probabilistic algorithm with polynomial average cost.

Solution?: I assign each vertex one of the 3 possible colors at random, so the probability that a given edge is well-colored is 2/3. The cost of assigning each vertex a random color is linear, if I am not wrong. And since each edge is well-colored with probability 2/3, I take the probability of having found a solution to be 2/3. So with $ k$ being a small natural number bounding the number of attempts, and $ n$ vertices, the algorithm’s cost is $ n^{k}$

Doubts: Is this reasoning about the time cost okay? Is this solution a sort of Las Vegas algorithm? Thanks in advance.
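
For concreteness, the retry scheme can be sketched like this. It is a minimal illustration that assumes vertices are numbered $ 0,\dots,n-1$ and that attempts are repeated until one succeeds; the function names are illustrative, and the expected number of attempts is not analysed here.

```python
import random

def count_well_colored(edges, coloring):
    """Number of edges whose two endpoints received different colors."""
    return sum(coloring[u] != coloring[v] for u, v in edges)

def las_vegas_coloring(n, edges, rng):
    """Resample uniform random 3-colorings until at least 2/3 of the edges are well-colored."""
    attempts = 0
    while True:
        attempts += 1
        coloring = [rng.randrange(3) for _ in range(n)]                  # O(n) per attempt
        if 3 * count_well_colored(edges, coloring) >= 2 * len(edges):   # O(|E|) check
            return coloring, attempts

# Complete graph on 4 vertices: 6 edges, so at least 4 must be well-colored.
k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
coloring, attempts = las_vegas_coloring(4, k4, random.Random(0))
print(coloring, attempts, count_well_colored(k4, coloring))
```

Each attempt costs $ O(n + |E|)$ , and by linearity of expectation a random coloring has $ 2|E|/3$ well-colored edges on average, so a qualifying coloring always exists. Because the result is verified before it is returned, the output is always correct and only the running time is random, which is the Las Vegas pattern.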