Distributional error probability of deterministic algorithm implies error probability of randomized algorithm?

Consider some problem $ P$ and let’s assume we sample the problem instance u.a.r. from some set $ I$ . Let $ p$ be a lower bound on the distributional error of a deterministic algorithm on $ I$ , i.e., every deterministic algorithm fails on at least a $ p$ -fraction of $ I$ .

Does this also imply that every randomized algorithm $ \mathcal{R}$ must fail with probability at least $ p$ if, again, we sample the inputs u.a.r. from $ I$ ?

My reasoning is as follows: Let $ R$ be the random variable representing the random bits used by the algorithm.
\begin{align}
\Pr[ \text{$ \mathcal{R}$ fails}] &= \sum_\rho \Pr[ \text{$ \mathcal{R}$ fails and $ R=\rho$ }] \\
&= \sum_\rho \Pr[ \text{$ \mathcal{R}$ fails} \mid R=\rho] \Pr[ R=\rho ] \\
&\ge p \sum_\rho \Pr[ R=\rho ] = p.
\end{align}
For the inequality, I used the fact that once we have fixed $ R = \rho$ , we effectively have a deterministic algorithm.
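To convince myself, I wrote a tiny exact check (a toy sketch; the problem, the family of deterministic algorithms, and all names in it are made up purely for illustration) that treats a randomized algorithm as a distribution over deterministic ones, computes its error on a uniformly random instance as the average of the deterministic errors, and checks that it is at least $ p$ :

    import itertools
    from fractions import Fraction

    # Toy problem: instances are the 8 bit-strings of length 3, sampled u.a.r.,
    # and the "correct answer" is the majority bit.
    instances = list(itertools.product([0, 1], repeat=3))

    def correct(x):
        return 1 if sum(x) >= 2 else 0

    # A small family of deterministic algorithms (constant 0, constant 1, first bit).
    det_algs = [lambda x: 0, lambda x: 1, lambda x: x[0]]

    def distributional_error(alg):
        # Fraction of instances on which alg gives the wrong answer.
        return Fraction(sum(alg(x) != correct(x) for x in instances), len(instances))

    # In the question, p is a lower bound over *all* deterministic algorithms;
    # here it is only the best error within this family, but the averaging step
    # below is exactly the same.
    p = min(distributional_error(a) for a in det_algs)

    # Randomized algorithm: the random bits rho pick a uniformly random member
    # of the family.  Its failure probability is the average of the
    # deterministic failure probabilities, hence at least p.
    rand_error = sum(Fraction(1, len(det_algs)) * distributional_error(a) for a in det_algs)

    print(p, rand_error, rand_error >= p)   # 1/4 5/12 True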

I can't find the flaw in my reasoning, but I would be quite surprised if this implication were indeed true.

Find path of file on website with randomized string in it

Users can upload a sensitive personal file to a specific website. After uploading, only the user and the administrator of the website can download the file again.

All files of any user are uploaded to the following folder: https://example.com/folder/uploads/.

Before a file is uploaded, it gets renamed to <<username>>.docx.

So for Foo the path would be: https://example.com/folder/uploads/foo.docx and for Gux it’d be https://example.com/folder/uploads/gux.docx.

As you can see, this is not safe at all. Foo could simply examine the download link and replace his own name in the file path with another user's username to download their files.

So to prevent this, the web-developer did the following: Before a file is uploaded, a random string of 15 characters gets prepended to the filename. This random string is different for each upload.

For Foo this would be for example https://example.com/folder/uploads/heh38dhehe83ud37_foo.docx and for Gux https://example.com/folder/uploads/abcnjoei488383b22_gux.docx.

When Foo examines the download URL, he will know in which folder all the files are stored. But there is no way he could guess the random string that is prepended to Gux's file. The random string effectively functions as a 15-character password.
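To put a number on how strong that "password" is, assume (my assumption; the question does not say which characters are used) that the 15 characters are drawn uniformly from lowercase letters and digits:

    import math

    # Assumption: 15 characters drawn uniformly from lowercase letters + digits.
    alphabet_size = 36
    length = 15

    guesses = alphabet_size ** length
    print(guesses)              # about 2.2 * 10**23 possible strings
    print(math.log2(guesses))   # roughly 77.5 bits of entropy

So brute-forcing the string over HTTP is not realistic; the question is whether Foo can learn it some other way.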

In all directories an index.html file is placed so the directory contents do not get listed.

If Foo still wanted to download other users' files, how would he do that? I'm looking for a way Foo could do this by forming a specific URL or HTTP request. Breaching the server or database is a correct answer, but not the kind I'm looking for.

TL;DR: How do you find the path of a file on a public website when the path contains a unique, randomized string? You know the path to the upload folder, but there is an index.html file there, so you can't see its contents.

Purpose of randomization/derandomization in basic randomized algorithm for MAX SAT

In Section 5.1 of The Design of Approximation Algorithms by Williamson and Shmoys, they describe a basic randomized algorithm for MAX SAT and how to derandomize it. The algorithm simply assigns each variable 1 (true) with probability 1/2 and 0 (false) with probability 1/2; in other words, it samples uniformly at random from the space of all solutions. They show that this is a 1/2-approximation.

Then in Section 5.2, they describe how to derandomize it using the method of conditional expectations. (I won't describe the process here because it is not very complex and, I assume, widely known.)
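For concreteness, here is how I picture the two steps in code (my own sketch, not the book's; the representation of a clause as a list of (variable, sign) pairs is just an assumption for illustration):

    import random

    def expected_satisfied(clauses, assignment):
        # Expected number of satisfied clauses when every unassigned variable
        # is set uniformly at random, given the partial assignment.
        total = 0.0
        for clause in clauses:
            unfixed = 0
            satisfied = False
            for var, positive in clause:
                if var in assignment:
                    if assignment[var] == positive:
                        satisfied = True
                        break
                else:
                    unfixed += 1
            if satisfied:
                total += 1.0
            else:
                total += 1.0 - 0.5 ** unfixed   # 0 if every literal is already falsified
        return total

    def randomized_assignment(variables):
        # Section 5.1 idea: each variable is true with probability 1/2.
        return {v: random.random() < 0.5 for v in variables}

    def derandomized_assignment(variables, clauses):
        # Section 5.2 idea: method of conditional expectations.
        assignment = {}
        for v in variables:
            assignment[v] = True
            exp_true = expected_satisfied(clauses, assignment)
            assignment[v] = False
            exp_false = expected_satisfied(clauses, assignment)
            assignment[v] = exp_true >= exp_false   # keep whichever choice is better
        return assignment

    # Example: (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1)
    clauses = [[(1, True), (2, False)], [(2, True), (3, True)], [(1, False)]]
    print(expected_satisfied(clauses, derandomized_assignment([1, 2, 3], clauses)))   # 3.0

The derandomized loop fixes one variable at a time while never letting the conditional expectation drop, which is why it ends with at least half of the clauses satisfied.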

My question is, why bother derandomizing this way? Or even, why bother making the algorithm random in the first place?

It seems to me that an equally good algorithm would be the one-liner that deterministically sets all variables to 1. Given some MAX SAT instance as input, it seems to me that you would also expect this to satisfy half of the clauses (i.e., "in expectation" it would). To me, the analysis of the random algorithm really seems to say that any fixed guess is "good", rather than showing that our random algorithm is inherently good. So why go through the process of randomizing and derandomizing in the first place?

Thanks in advance!

Worst-case expected running time for Randomized Permutation Algorithm

I have an algorithm which, when given a positive integer N, generates a permutation of the first N integers (from 1 to N) using a method called randInt(x,y). The method randInt(x,y) will generate a random integer between the numbers x and y, provided they are positive integers and y >= x.

The algorithm is given by the following pseudo-code:

    if (N <= 0) {
       return null
    } else {
       A := new int[] w/ size N and all cells initialized to 0
       A[0] := randInt(1,N)
       for (i := 1 to length(A)-1) do {
          boolean rInA := True
          while (rInA) {
             rInA := False
             int r := randInt(1,N)
             // check whether r already appears among A[0..i-1]
             for (j := 0 to (i-1)) do {
                if (r = A[j]) {
                   rInA := True
                }
             }
          }
          A[i] := r
       }
    }
    return A

My understanding of the algorithm is as follows:

The outermost for-loop will run N-1 times, and for each of those iterations a random number is generated and then compared to all the cells of A that were filled in previous iterations. If any of those cells contains that randomly generated number, then that number cannot be used and a new number is randomly generated (in the next iteration of the nested while-loop). This new randomly generated number is then, like before, compared to all the previously filled cells of A to check for duplication. This continues until randInt generates a random number that is not already in the first i cells of A.

This leads me to believe that the worst-case expected running time of the algorithm is something like: $ \sum_{i=1}^{N-1}(\alpha i)$

Now, the $ \alpha$ here represents the effect the while-loop has on the running time, and it is the point of uncertainty for me. I know that in the first iteration of the outermost for-loop it is unlikely that randInt will generate the one integer that A already contains (probability 1/N, I believe), so the inner-most for-loop is likely to execute only once. However, by the last iteration (of the outermost for-loop) the probability that randInt generates one of the N-1 integers already in A is $ \frac{N-1}{N}$ , so because of the while-loop the inner-most for-loop in that iteration will likely execute more like N times.
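To sanity-check that intuition, here is a quick simulation (a Python sketch, assuming randInt(1,N) behaves like random.randint(1, N)) that counts the total number of comparisons r = A[j] performed by the inner-most for-loop:

    import random

    def permutation_comparisons(N):
        # Direct translation of the pseudo-code above; returns how many times
        # the comparison "r = A[j]" is executed.
        A = [0] * N
        A[0] = random.randint(1, N)
        comparisons = 0
        for i in range(1, N):
            r_in_A = True
            while r_in_A:
                r_in_A = False
                r = random.randint(1, N)
                for j in range(i):
                    comparisons += 1
                    if r == A[j]:
                        r_in_A = True
            A[i] = r
        return comparisons

    # Average over a few runs to see how the work grows with N.
    for N in (100, 200, 400, 800):
        avg = sum(permutation_comparisons(N) for _ in range(20)) / 20
        print(N, round(avg))

The printed counts can then be compared against candidate formulas for the expected running time.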

How can I use the probability introduced into the algorithm by randInt to calculate the algorithm's running time?

Multi-agent randomized behavior

In Artificial Intelligence: A Modern Approach (3rd edition), page 43, in the last line of the single-agent vs. multi-agent discussion, the authors write:

In some competitive environments, randomized behavior is rational because it avoids the pitfalls of predictability.

Can someone help me understand this line? Specifically, what do they mean by the "pitfalls of predictability"?

Randomized version of the class $APX$?

Is there a class which is to APX what BPP is to P?

I’m looking for a definition that is like the following:

“For $ r > 0$ , an $ r$ -RPCA (randomized polynomial-time constant-factor approximation) algorithm for a function problem $ T : \Sigma^* \to \mathbb{N}$ is a probabilistic Turing machine $ A$ with the following property: $ A$ runs in time $ poly(|x|)$ and has $ \mathbb{P}( r^{-1} T(x) \leq A(x) \leq r T(x)) \geq 2/3$ .”

I think that either a class like this exists and has a standard name, or there is something wrong with it. I’m looking for a similar definition with which to cleanly state a result.

What is the exact time complexity of randomized Kuhn’s algorithm?

Please, read the whole question before answering, the exact details of the implementation are important.

Suppose that you want to find a maximum-cardinality matching in a bipartite graph with $ V = L + R$ vertices ($ L$ is the number of vertices in the left-hand side and $ R$ is the number of vertices in the right-hand side) and $ E$ edges. You may assume that the graph is connected, therefore $ E \geqslant V - 1$ .

Vertices in the left-hand side are numbered with integers from range $ [0, L)$ . Similarly, vertices in the right-hand side are numbered with integers from range $ [0, R)$ . Then, the classic implementation of Kuhn’s bipartite matching algorithm looks like this:

    bool dfs_Kuhn (v, neigh, used, left_match, right_match):
        // try to find an augmenting path starting from left-hand side vertex v
        if used[v]
            return false
        used[v] = true

        for dest in neigh[v]
            if right_match[dest] == -1 || dfs_Kuhn(right_match[dest], neigh, used, left_match, right_match)
                left_match[v] = dest
                right_match[dest] = v
                return true

        return false

    int bipartite_matching_size (neigh):
        left_match = [-1 repeated L times]
        right_match = [-1 repeated R times]

        for i in [0, L)
            used = [false repeated L times]
            dfs_Kuhn(i, neigh, used, left_match, right_match)

        return L - (number of occurrences of -1 in left_match)

This implementation works in $ O(VE)$ time; moreover, the bound is tight more or less independently of the relation between $ V$ and $ E$ . In other words, the bound is tight for sparse graphs ($ E = O(V)$ ), for dense graphs ($ E = \Omega(V^2)$ ), and for everything in-between.

There is an implementation that works much faster in practice. The $ \texttt{dfs_Kuhn}$ function does not change, but $ \texttt{bipartite_matching_size}$ changes:

    int bipartite_matching_size_fast (neigh):
        left_match = [-1 repeated L times]
        right_match = [-1 repeated R times]

        shuffle(neigh)
        for row in neigh
            shuffle(row)

        while true
            used = [false repeated L times]
            found_path = false

            for i in [0, L)
                if left_match[i] == -1
                    found_path |= dfs_Kuhn(i, neigh, used, left_match, right_match)

            if !found_path
                break

        return L - (number of occurrences of -1 in left_match)

Of course, the upper bound of $ O(VE)$ can be proven for the faster version as well. Lower bounds are a completely different story, though.

We used two optimizations:

  1. The block of code inside $ \texttt{while true}$ works in total $ O(E)$ time, but often finds several disjoint augmenting paths, instead of at most one, as did the block inside $ \texttt{for i in [0, L)}$ in the original code.

  2. The order of vertices in the left-hand side and the order in which the for-loop $ \texttt{for dest in neigh[v]}$ considers their neighbours are now random.

If only the first of these two optimisations is used, there are some relatively well-known degenerate cases in which the code still takes $ \Omega(VE)$ time. However, almost all such cases that I know abuse a specific ordering of the neighbours of the left-hand side vertices, so the $ \texttt{dfs_Kuhn}$ function is forced to repeatedly walk along some fixed, very long path and "flip" it. Therefore, they fall apart when the second optimisation is added.

The only semi-strong test I know is a dense ($ E = \Theta(V^2)$ ) graph on which the fast version of Kuhn's algorithm takes $ \Omega(V^3 / \log V)$ time. However, all my attempts to generalise that construction to sparse graphs (the case I am most interested in) were unsuccessful.

So, I want to ask the following question. Is anything known about the runtime of this fast version of Kuhn's algorithm on sparse graphs? Any nontrivial lower bounds (better than $ \Theta(E \cdot \log V)$ )? Maybe some upper bounds (a friend of mine believes that this algorithm always runs in $ O(E \sqrt{E})$ time, which seems to be the case on random bipartite graphs)?
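In case it helps with experiments, here is a small self-contained Python sketch of the fast version (my own translation; instead of shuffling $ \texttt{neigh}$ itself I shuffle a processing order for the left-hand side vertices, which gives the same randomization as optimisation 2) together with a crude harness for random sparse graphs:

    import random
    import sys
    import time

    sys.setrecursionlimit(100000)

    def dfs_kuhn(v, neigh, used, left_match, right_match):
        # Augmenting-path DFS from left vertex v, as in the pseudocode above.
        if used[v]:
            return False
        used[v] = True
        for dest in neigh[v]:
            if right_match[dest] == -1 or dfs_kuhn(right_match[dest], neigh, used, left_match, right_match):
                left_match[v] = dest
                right_match[dest] = v
                return True
        return False

    def matching_size_fast(neigh, R):
        L = len(neigh)
        left_match = [-1] * L
        right_match = [-1] * R
        order = list(range(L))
        random.shuffle(order)          # random order of left-hand side vertices
        for row in neigh:
            random.shuffle(row)        # random order of neighbours
        while True:
            used = [False] * L
            found_path = False
            for i in order:
                if left_match[i] == -1:
                    found_path |= dfs_kuhn(i, neigh, used, left_match, right_match)
            if not found_path:
                break
        return L - left_match.count(-1)

    def random_sparse_bipartite(L, R, E):
        # E edges chosen u.a.r. (repetitions possible, connectivity not enforced);
        # good enough for a quick experiment.
        neigh = [[] for _ in range(L)]
        for _ in range(E):
            neigh[random.randrange(L)].append(random.randrange(R))
        return neigh

    for n in (1000, 2000, 4000, 8000):
        neigh = random_sparse_bipartite(n, n, 3 * n)
        start = time.perf_counter()
        size = matching_size_fast(neigh, n)
        print(n, size, round(time.perf_counter() - start, 3))

Random graphs are of course the easy case here; the lower-bound question is about adversarial sparse inputs.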