## Sampling from the uniform distribution

Is there an efficient classical algorithm that generates samples from the uniform distribution? Would such an algorithm exist for any distribution that has an analytic description?

Posted on Categories proxies

## What is Simple Uniform Hashing, and why does searching a hashtable have complexity Θ(n) in the worst case?

Can anyone explain clearly what Simple Uniform Hashing is, and why searching a hashtable has complexity Θ(n) in the worst case if we don’t have uniform hashing (where n is the number of elements in the hashtable)?


## Why is the Greedy Algorithm for Online Scheduling on uniform machines NOT competitive?

Consider the Online Scheduling Problem with m machines and different speeds.

Each instance $$\sigma$$ consists of a sequence of jobs with different sizes $$j_i\in\mathbb{R}^+$$. At each time step, the algorithm learns the one new job of $$\sigma$$, which must be assigned to one machine. The goal is to minimize the makespan (at the end of the online sequence $$\sigma$$).

I want to find an instance such that the Greedy Algorithm, which assigns each new job to the machine which finishes it first, is only $$\Theta(\log m)$$-competitive.
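For concreteness, here is a minimal sketch of the greedy rule described above. It assumes that "finishes it first" means the machine with the earliest completion time for the new job given its current load, and that ties go to the lowest machine index; the function name `greedy_schedule` and the sample instance are illustrative, not taken from any reference.

```python
def greedy_schedule(jobs, speeds):
    """Assign each arriving job to the machine that would finish it earliest.

    jobs   -- job sizes in arrival order (the online sequence)
    speeds -- speed of each machine; a job of size s takes s / speed time
    Returns (assignment, makespan), where assignment[i] is the machine
    index chosen for job i.
    """
    loads = [0.0] * len(speeds)  # total job size placed on each machine
    assignment = []
    for size in jobs:
        # Completion time of the new job on each machine, given current loads.
        finish = [(loads[k] + size) / speeds[k] for k in range(len(speeds))]
        # Assumption: ties are broken towards the lowest machine index.
        best = min(range(len(speeds)), key=finish.__getitem__)
        loads[best] += size
        assignment.append(best)
    makespan = max(loads[k] / speeds[k] for k in range(len(speeds)))
    return assignment, makespan


# Two machines with speeds 1 and 2, three jobs of size 2.
assignment, makespan = greedy_schedule([2, 2, 2], [1, 2])
```

A sketch like this is handy for testing candidate lower-bound instances: feed in a job sequence and compare the greedy makespan with the optimum computed by hand.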

Any ideas? I can’t find any articles regarding my problem.


## Make image boundaries uniform

I have the following image (fig 1) with points extracted using the Geomagic software (please see the point list at the link below).

https://pastebin.com/K51N8Kfa

I would like to know how I can remove the indented boundaries of the shape. This should be done by removing the points that cause the indentation from the list (fig 2). I need the edges to be uniform so that I can later normalize the height of the image.

## How can I generate a random sample of unique vertex pairings from an undirected graph, with uniform probability?

I’m working on a research project where I have to pair up entities together and analyze outcomes. Normally, without constraints on how the entities can be paired, I could easily select one random entity pair, remove it from the pool, then randomly select the next entity pair.

That would be like creating a random sample of vertex pairs from a complete graph.

However, this time the undirected graph is

• incomplete
• possibly disconnected.

I thought about using the above method but realized that my sample would not have a uniform probability of being chosen, as probabilities of pairings are no longer independent of each other due to uneven vertex degrees.

I’m banging my head against the wall over this. For the research it is best that I generate a sample with uniform probability. Given that my graph has around n = 5000 vertices, is there an algorithm that I could use such that the resulting sample fulfils these conditions?

1. There are no duplicates in the sample (each vertex in the graph is paired at most once).
2. The remaining vertices that are not in the sample do not have an edge with each other. (They are unpaired and cannot be paired)
3. The sample generated that meets the above two criteria should have a uniform probability of being chosen as compared to any other sample that fulfils the above two points.
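Conditions 1 and 2 together describe a maximal matching of the graph. The following brute-force sketch, intended only for tiny graphs, enumerates all maximal matchings and computes the exact output distribution of the naive method, which shows the non-uniformity on a small example. It assumes the naive method picks uniformly among the edges still available at each step; the names `maximal_matchings` and `greedy_distribution` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import combinations


def is_maximal_matching(edges, subset):
    """Check conditions 1 and 2: no vertex reused, no edge left between unpaired vertices."""
    used = set()
    for u, v in subset:
        if u in used or v in used:
            return False  # a vertex would be paired twice
        used.update((u, v))
    # Maximal: every edge of the graph touches at least one paired vertex.
    return all(u in used or v in used for u, v in edges)


def maximal_matchings(edges):
    """Enumerate all maximal matchings by brute force (exponential time)."""
    found = []
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            if is_maximal_matching(edges, subset):
                found.append(frozenset(subset))
    return found


def greedy_distribution(edges):
    """Exact output distribution of the naive sampler.

    Assumption: at each step one of the remaining edges is picked uniformly
    at random, and every edge touching its endpoints is then removed.
    """
    dist = {}

    def recurse(remaining, chosen, prob):
        if not remaining:
            key = frozenset(chosen)
            dist[key] = dist.get(key, Fraction(0)) + prob
            return
        step = prob / len(remaining)
        for u, v in remaining:
            rest = [e for e in remaining if u not in e and v not in e]
            recurse(rest, chosen + [(u, v)], step)

    recurse(list(edges), [], Fraction(1))
    return dist


# Path a - b - c - d: two maximal matchings, {ab, cd} and {bc}.
path = [("a", "b"), ("b", "c"), ("c", "d")]
dist = greedy_distribution(path)
```

On this path, uniform sampling would give each of the two maximal matchings probability 1/2, while the naive sampler returns {ab, cd} with probability 2/3 and {bc} with probability 1/3. The enumeration is exponential, so it serves only as a correctness check on small inputs, not as a method for n = 5000.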

There appears to be some work done for bipartite graphs, as seen in this stackoverflow discussion here. The algorithms described obtain a near-uniform sample but don’t seem to apply to this case.


## Uniform Hashing. Understanding space occupancy and choice of functions

I’m having trouble understanding two things from some notes about Uniform Hashing. Here’s the copy-pasted part of the notes:

Let us first argue by a counting argument why the uniformity property, we required to good hash functions, is computationally hard to guarantee. Recall that we are interested in hash functions which map keys in $$U$$ to integers in $$\{0, 1, \ldots, m-1\}$$. The total number of such hash functions is $$m^{|U|}$$, given that each key among the $$|U|$$ ones can be mapped into $$m$$ slots of the hash table. In order to guarantee uniform distribution of the keys and independence among them, **our hash function should be anyone of those ones**. But, in this case, its representation would need $$\Omega(\log_2 m^{|U|}) = \Omega(|U| \log_2 m)$$ bits, which is really too much in terms of space occupancy and in the terms of computing time (i.e. it would take at least $$\Omega(\frac{|U|\log_2 m}{\log_2 |U|})$$ time to just read the hash encoding).

The part I put in bold is the first thing that confuses me.

Why should the function be any one of those? Shouldn’t you avoid a good part of them, like the ones sending every element of the universe $$U$$ to the same number and thus not distributing the elements?

The second thing is the last "$$\Omega$$". Why would it take $$\Omega(\frac{|U|\log_2 m}{\log_2 |U|})$$ time just to read the hash encoding?

The numerator is the number of bits needed to index every hash function in the space of such functions; the denominator is the size in bits of a key. Why does this ratio give a lower bound on the time needed to read the encoding? And what hash encoding?
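The counting in the quoted notes can be checked numerically with a small sketch. It enumerates every function from a tiny universe into $$m$$ slots, then evaluates the two bounds for larger parameters. The reading-time bound relies on an assumption the notes do not spell out, namely that the machine reads one word of $$\log_2 |U|$$ bits (the size of a key) per step; the function names are illustrative.

```python
import math
from itertools import product


def all_hash_functions(universe, m):
    """Every function from `universe` to {0, ..., m-1}, as dicts (tiny inputs only)."""
    return [dict(zip(universe, slots))
            for slots in product(range(m), repeat=len(universe))]


def encoding_bits(universe_size, m):
    """Bits needed to index one of the m**|U| functions: log2(m**|U|) = |U| * log2(m)."""
    return universe_size * math.log2(m)


def read_time_lower_bound(universe_size, m):
    """Steps needed to read the encoding.

    Assumption: one word of log2|U| bits is read per step.
    """
    return encoding_bits(universe_size, m) / math.log2(universe_size)


# A universe of 3 keys and m = 2 slots gives 2**3 = 8 functions.
functions = all_hash_functions("abc", 2)

# 32-bit keys hashed into 2**20 slots.
bits = encoding_bits(2**32, 2**20)
steps = read_time_lower_bound(2**32, 2**20)
```

With 32-bit keys and $$2^{20}$$ slots, the encoding needs $$2^{32} \cdot 20$$ bits, roughly 10 GiB, which illustrates why an arbitrary function from this family is impractical to store.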


## Asymmetric Transition Probability Matrix with uniform stationary distribution

I am solving a discrete Markov chain problem. For this I need a Markov chain whose stationary distribution is uniform (or close to uniform) and whose transition probability matrix is asymmetric.

[ Markov chains such as Metropolis-Hastings have a uniform stationary distribution, but their transition probability matrix is symmetric. ]
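One family that may fit is the doubly stochastic matrices: if every row and every column sums to 1, the uniform distribution is stationary. The sketch below builds one such matrix, a biased lazy walk on a cycle, and checks both properties with exact fractions. The function names and the parameters `p` and `q` are illustrative choices, not part of the original problem.

```python
from fractions import Fraction


def biased_cycle(n, p=Fraction(1, 2), q=Fraction(1, 5)):
    """Transition matrix of a lazy walk on a cycle of n >= 3 states.

    From state i: move to i+1 with probability p, to i-1 with probability q,
    stay with probability 1 - p - q. Rows and columns all sum to 1, and the
    matrix is asymmetric whenever p != q.
    """
    P = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        P[i][(i + 1) % n] += p
        P[i][(i - 1) % n] += q
        P[i][i] += 1 - p - q
    return P


def step(pi, P):
    """One step of the chain: the row vector pi multiplied by P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


def is_symmetric(P):
    n = len(P)
    return all(P[i][j] == P[j][i] for i in range(n) for j in range(n))


P = biased_cycle(4)
uniform = [Fraction(1, 4)] * 4
unchanged = step(uniform, P) == uniform  # the uniform distribution is stationary
asymmetric = not is_symmetric(P)
```

Because the self-loop probability is positive, this chain is also aperiodic, so the uniform distribution is its unique limiting distribution on the cycle.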


## PAC learning vs. learning on uniform distribution

The class of function $$\mathcal{F}$$ is PAC-learnable if there exists an algorithm $$A$$ such that for any distribution $$D$$ it holds that on an input of $$m$$ i.i.d samples $$(x, f(x))$$ where $$x\sim D$$ and $$f$$ is unknown, $$A$$ returns, with probability larger than $$1-\delta$$, a function which is $$\epsilon$$-close to $$f$$ (with respect to $$D$$). The class $$\mathcal{F}$$ is efficiently PAC learnable if it is PAC learnable, and $$A$$ runs in time $$\text{poly}(1/\epsilon, 1/\delta)$$.

Is there a case where a class $$\mathcal{F}$$ is not efficiently PAC learnable, yet it is efficiently learnable on the uniform distribution?


## Proof that uniform circuit families can efficiently simulate a Turing Machine

Can someone explain (or provide a reference for) how to show that uniform circuit families can efficiently simulate Turing machines? I have only seen them discussed in terms of specific complexity classes (e.g., $$\mathbf{P}$$ or $$\mathbf{NC}$$). I would like to see why uniform circuit families are a strong enough model for universal, efficient computation.


## PTAS for Multiple Knapsack with Uniform Capacities, fixed number of Knapsacks

Consider the following problem:

We are given a collection of $$n$$ items $$I = \{1,\ldots,n\}$$; each item has a size $$0 < s_i \le 1$$ and a profit $$p_i > 0$$. There are $$m$$ unit-size knapsacks, where $$m$$ is a fixed number. A feasible solution is an $$m$$-tuple $$U=(U_1,\ldots,U_m)$$ such that the total size of the items in each knapsack doesn’t exceed its capacity, and each item is packed in no more than one knapsack. More formally:

• for every $$j, 1 \le j \le m, U_j \subseteq I$$ and $$\sum_{i \in U_j}s_i \le 1$$
• for every $$j,l, 1 \le j < l \le m, U_j \cap U_l = \emptyset$$

The profit of the feasible solution is $$\sum_{j=1}^m\sum_{i \in U_j}p_i$$. The goal is to find a feasible solution of maximal profit.
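To make the definition concrete, here is a small brute-force sketch that checks feasibility and finds the optimum by trying every assignment. It runs in exponential time and is only a reference for testing on tiny instances, not a PTAS. The function names and the sample instance are made up for illustration.

```python
from itertools import product


def profit_if_feasible(assignment, sizes, profits, m):
    """Return the profit of an assignment, or None if a knapsack overflows.

    assignment[i] is the knapsack index (0..m-1) of item i, or None if the
    item is left out. Each item appears in at most one knapsack by construction.
    """
    load = [0.0] * m
    total = 0
    for item, knapsack in enumerate(assignment):
        if knapsack is None:
            continue
        load[knapsack] += sizes[item]
        total += profits[item]
    # Every knapsack has unit capacity.
    return total if all(x <= 1.0 for x in load) else None


def brute_force_optimum(sizes, profits, m):
    """Try all (m + 1) ** n assignments and keep the most profitable feasible one."""
    best_profit, best_assignment = 0, (None,) * len(sizes)
    for assignment in product([None, *range(m)], repeat=len(sizes)):
        value = profit_if_feasible(assignment, sizes, profits, m)
        if value is not None and value > best_profit:
            best_profit, best_assignment = value, assignment
    return best_profit, best_assignment


# Four items, two unit-size knapsacks.
best_profit, best_assignment = brute_force_optimum(
    [0.75, 0.75, 0.5, 0.5], [5, 4, 3, 3], 2
)
```

In this instance the optimum is 11: one knapsack holds the item of size 0.75 and profit 5, the other holds the two items of size 0.5, and the remaining item is left out.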

I’m trying to show a PTAS for the problem.

It was suggested to use linear programming. I thought about the following (basic) linear program:

maximise $$\sum_{j=1}^m\sum_{i=1}^n x_{ij}p_i$$

under the constraints:

• for every $$1 \le j \le m$$: $$\sum_{i=1}^n s_ix_{ij} \le 1$$ (the size of items in each knapsack doesn’t exceed its capacity)
• for every $$1 \le i \le n$$: $$\sum_{j=1}^m x_{ij} \le 1$$ (each item is packed in no more than one knapsack).

I don’t know how to proceed from here. I’m not sure how to develop an algorithm (choosing which knapsack to put each item in) based on this linear program. Can anyone please give me a clue?

Thanks.
