## How to reduce entropy?

This is not necessarily a research question, since I do not know whether anyone is working on this, but I hope to gain some insight by asking it here:

The idea behind this question is to attach an entropy to a natural number in a “natural” way, such that “multiplication increases entropy”. (Of course one can attach very different entropies to natural numbers such that multiplication reduces entropy, but I will try to give an argument why this choice is natural.) Let $$n \ge 2$$ be a composite number and $$\phi$$ the Euler totient function. Suppose a factorization algorithm $$A$$ outputs a number $$X$$ with $$1 \le X \le n-1$$ and $$1 < \gcd(X,n) < n$$, each such value with equal probability $$\frac{1}{n-1-\phi(n)}$$. Then we can view $$X$$ as a random variable and attach to it the entropy $$H(X_n):=H(n):= \log_2(n-1-\phi(n))$$.
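As a sanity check on this definition (a brute-force Python sketch; the helper name `H` is mine), one can count the admissible outputs directly. For $$n = pq$$ with distinct primes $$p, q$$ the count is $$n-1-\phi(n) = p+q-2$$, and small composite pairs are consistent with multiplication increasing this entropy:

```python
from math import gcd, log2

def H(n):
    # H(n) = log2(n - 1 - phi(n)): count the X in [1, n-1] with 1 < gcd(X, n) < n
    count = sum(1 for x in range(1, n) if 1 < gcd(x, n) < n)
    return log2(count)

p, q = 3, 5
count = sum(1 for x in range(1, p * q) if 1 < gcd(x, p * q) < p * q)
print(count)  # 6, i.e. p + q - 2: the multiples of p and of q below pq

# spot-check H(m*n) > H(m) + H(n) on small composite pairs
composites = [4, 6, 8, 9, 10, 12, 14, 15]
print(all(H(m * n) > H(m) + H(n) for m in composites for n in composites))  # True
```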

The motivation behind this definition comes from an analogy to physics: I think that “one-way functions” correspond to the “arrow of time” in physics. Since the arrow of time increases entropy, so should one-way functions, if they exist. Since integer factorization is a known candidate for a one-way function, my idea was to attach an entropy which increases when multiplying two numbers.

It is proved here ( https://math.stackexchange.com/questions/3275096/does-entropy-increase-when-multiplying-two-numbers ) that:

$$H(mn) > H(n) + H(m)$$ for all composite numbers $$n \ge 2, m \ge 2$$

The entropy of $$n=pq$$ with primes $$p < q$$ will be $$H(pq) = \log_2(p+q-2) > \log_2(2p-2) \approx \log_2(2p) = 1+\log_2(p)$$, since $$n-1-\phi(pq) = pq-1-(p-1)(q-1) = p+q-2$$.

At the beginning of the factorization, the entropy of $$X_n$$, $$n=pq$$, is $$\log_2(p+q-2)$$, since it is “unclear” which value $$X=x$$ will be printed. The algorithm must output some $$X=x$$ as described above. But knowing the value of $$X$$ reduces the entropy of $$X$$ to $$0$$. So the algorithm must reduce the entropy from $$\log_2(p+q-2)$$ to $$0$$. From physics it is known that reducing entropy requires work. Hence the algorithm “must do some work” to reduce the entropy.

My question is: which functions reduce entropy? (In other words, how many function calls must the algorithm at least make to reduce the entropy by the amount described above?)

## Does multiplication increase entropy?


The Shannon entropy of a number $$k$$ in binary digits is defined as $$H = -\frac{a}{l}\log\left(\frac{a}{l}\right) - \left(1-\frac{a}{l}\right)\log\left(1-\frac{a}{l}\right)$$ where $$l = \lfloor \log_2(k) \rfloor$$ is the number of binary digits of $$k$$ and $$a$$ is the number of $$1$$s in the binary expansion of $$k$$. So we view the number $$k$$ as a “random variable”.
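In code, this bit-frequency entropy can be computed as follows (a minimal Python sketch; the function name is mine, and I take $$l$$ to be the bit length of $$k$$):

```python
from math import log2

def bit_entropy(k):
    # entropy of the 0/1 frequencies in the binary expansion of k
    bits = bin(k)[2:]
    l = len(bits)        # number of binary digits of k
    a = bits.count('1')  # number of ones
    p = a / l
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

print(bit_entropy(0b101101))  # 4 ones in 6 bits: H(2/3) ≈ 0.918
```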

Suppose that $$n,m$$ are chosen uniformly at random in the interval $$1 \le n, m \le N$$.

Hypothesis 1):

$$H_{m \cdot n}$$ is “significantly” larger than $$H_n$$.

Hypothesis 2):

$$H_{m + n}$$ is not “significantly” larger than $$H_n$$.

Here is an empirical statistical test indicating that multiplication increases entropy, but addition does not:

```python
# Sage script: two-sample t-statistics comparing the bit-entropy of random n
# with that of n*m and of n+m.
from collections import Counter

def entropyOfCounter(c):
    S = 0
    for k in c.keys():
        S += c[k]
    prob = []
    for k in c.keys():
        prob.append(c[k] / S)
    H = -sum([p * log(p, 2) for p in prob]).N()
    return H

def HH(l):
    return entropyOfCounter(Counter(l))

N = 10^4
HN = []
HmXn = []
HmPn = []
for k in range(N):
    n = randint(1, 17^50)
    m = randint(1, 17^50)
    Hn = HH(Integer(n).digits(2))
    Hm = HH(Integer(m).digits(2))
    HmXn.append(HH(Integer(n*m).digits(2)))
    HmPn.append(HH(Integer(n+m).digits(2)))
    HN.append(Hn)

X = mean(HN)
Y = mean(HmPn)
Z = mean(HmXn)
n = len(HN)
m = n
SX2 = variance(HN)
SY2 = variance(HmPn)
SZ2 = variance(HmXn)
SXY2 = ((n-1)*SX2 + (m-1)*SY2) / (n+m-2)
SXZ2 = ((n-1)*SX2 + (m-1)*SZ2) / (n+m-2)
TXY = sqrt((m*n)/(n+m)).N() * (X-Y) / sqrt(SXY2).N()
TXZ = sqrt((m*n)/(n+m)).N() * (X-Z) / sqrt(SXZ2).N()
print(TXY, TXZ, n+m-2)
```

Output:

```
-1.43265218355297 -32.5323306851490 19998
```

The second case (multiplication) increases entropy significantly; the first case (addition) does not.

Is there a way to give a heuristic explanation of why this is so in general (if it is), or is this empirical observation wrong in general for $$1 \le n, m \le N$$?

Related: https://physics.stackexchange.com/questions/487780/increase-in-entropy-and-integer-factorization-how-much-work-does-one-have-to-do

## Minimising an Integrated Relative Entropy Functional

Suppose I am given

• A probability distribution on $$\mathbf R^d$$, with density $$\pi (x)$$.
• A family of transition kernels $$\{ q^0 (x \to \cdot) \}_{x \in \mathbf R^d}$$ on $$\mathbf R^d$$, with densities $$q^0 (x \to y)$$.

I now want to find a new family of transition kernels $$\{ q^1 (x \to \cdot) \}_{x \in \mathbf R^d}$$ such that

• $$q^1$$ is in detailed balance with respect to $$\pi$$, i.e.

\begin{align} \pi (x) q^1 ( x \to y ) = \pi (y) q^1 ( y \to x) \end{align}

• $$q^1$$ is as close as possible to $$q^0$$ in KL divergence, averaged over $$x \sim \pi$$.

To this end, I introduce the following functional

\begin{align} \mathbb{F} [ q^1 ] &= \int_{x \in \mathbf R^d} \pi (x) \cdot KL \left( q^1 (x \to \cdot ) \,||\, q^0 (x \to \cdot ) \right) dx \\ &= \int_{x \in \mathbf R^d} \int_{y \in \mathbf R^d} \pi (x) \cdot q^1 (x \to y) \log \frac{ q^1 (x \to y) }{ q^0 (x \to y) } \, dx \, dy \\ &= \int_{x \in \mathbf R^d} \int_{y \in \mathbf R^d} \pi (x) \cdot \left\{ q^1 (x \to y) \log \frac{ q^1 (x \to y) }{ q^0 (x \to y) } - q^1 (x \to y) + q^0 (x \to y) \right\} \, dx \, dy. \end{align}

I thus wish to minimise $$\mathbb{F}$$ over all transition kernels $$\{ q^1 (x \to \cdot) \}_{x \in \mathbf R^d}$$ satisfying the desired detailed balance condition. Implicitly, there are also the constraints that the $$q^1$$ are nonnegative and normalised.

I would like to solve this minimisation problem, or at least derive e.g. an Euler-Lagrange equation for it.

Currently, I can show that the first variation of $$\mathbb{F}$$ is given by

\begin{align} \frac{d}{dt} \Big|_{t = 0} \mathbb{F} [q^1 + t h] = \int_{x \in \mathbf R^d} \int_{y \in \mathbf R^d} \pi (x) \cdot \log \frac{ q^1 (x \to y) }{ q^0 (x \to y) } \cdot h (x, y) \, dx \, dy. \end{align}

Moreover, my constraints stipulate that any admissible variation $$h$$ must satisfy the following two conditions:

1. $$\forall x, y, \quad \pi (x) h (x, y) = \pi (y) h (y, x)$$
2. $$\forall x, \quad \int_{y \in \mathbf R^d} h (x, y) dy = 0$$

I have not been able to translate these conditions into an Euler-Lagrange equation. I acknowledge that since the functional involves no derivatives of $$q^1$$, the calculus of variations approach may be ill-suited. If readers are able to recommend alternative approaches, this would also be appreciated. Anything which would allow for a more concrete characterisation of the optimal $$q^1$$ would be ideal.
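For what it is worth, here is one route that seems to lead somewhere (a sketch under my own substitution, not a worked-out solution). Writing $$s(x,y) := \pi(x) q^1(x \to y)$$, detailed balance becomes the symmetry constraint $$s(x,y) = s(y,x)$$ and normalisation becomes $$\int_{y} s(x,y) \, dy = \pi(x)$$. Enforcing normalisation with a multiplier $$\lambda(x)$$ and varying $$\mathbb{F}$$ over symmetric $$s$$, the stationarity condition appears to be

\begin{align} \log \frac{s(x,y)}{\pi(x) q^0(x \to y)} + \log \frac{s(x,y)}{\pi(y) q^0(y \to x)} = \lambda(x) + \lambda(y), \end{align}

which, rewritten in terms of $$q^1$$, gives

\begin{align} q^1(x \to y) = \sqrt{\frac{\pi(y)}{\pi(x)} \, q^0(x \to y) \, q^0(y \to x)} \; e^{(\lambda(x) + \lambda(y))/2}, \end{align}

with $$\lambda$$ determined implicitly by the normalisation constraint. If this is right, the optimal kernel is a reweighted geometric mean of the forward kernel and the $$\pi$$-reversed backward kernel of $$q^0$$, and one can check that it satisfies detailed balance by construction.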

## Cross Entropy Function

I’ve seen two versions of the cross entropy cost function, and conflicting information about it. $$J(\theta) = -\frac{1}{N} \sum_{n=1}^N\sum_{i=1}^C y_{ni}\log \hat{y}_{ni} (\theta)$$ $$C(\theta) = - \frac{1}{N}\sum_{n=1}^N \sum_{i=1}^{C}\left[ y_{ni}\log (\hat{y}_{ni})+ (1-y_{ni}) \log(1-\hat{y}_{ni})\right]$$

Some are saying that the second one is equivalent to the first for the case where there are only two classes, which makes sense. But couldn’t the second one also be used for more than two classes? For example, say we have three-class classification, with the ground truth $$\vec{y} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$$ and $$\vec{\hat{y}} =\begin{bmatrix} 0.7 \\ 0.1 \\ 0.2 \end{bmatrix}$$. Then with the first equation, cross entropy would simply be $$-\log(0.7)$$. But couldn’t we also use the second equation and calculate cross entropy as $$-\log(0.7) - \log(0.9) - \log(0.8)?$$ When would we use the first equation and when would we use the second one, and why?
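To make the two candidate computations concrete, here is a small Python sketch of both (the function names are mine; the second loss treats each class as an independent one-vs-rest Bernoulli problem):

```python
import math

def categorical_ce(y, y_hat):
    # J: sum over classes of -y_i * log(y_hat_i); skip zero targets
    return -sum(yi * math.log(pi) for yi, pi in zip(y, y_hat) if yi > 0)

def binary_ce_sum(y, y_hat):
    # C: per-class binary cross entropy, summed over classes
    return -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, y_hat))

y = [1, 0, 0]
y_hat = [0.7, 0.1, 0.2]
print(categorical_ce(y, y_hat))  # -log(0.7)
print(binary_ce_sum(y, y_hat))   # -log(0.7) - log(0.9) - log(0.8)
```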

## Entropy of moments of integers

Let $$x \in \{0,1\}^n$$ be uniformly at random. What is an estimate for the entropy of moments, $$H(\sum_i x_i, \sum_i i\cdot x_i, \sum_i i^2\cdot x_i)$$ ?
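For small $$n$$ the joint entropy can be computed exactly by brute force, which may help in forming a conjecture (a sketch of mine; the entropy is trivially at most $$n$$ bits, since the moments are a function of $$x$$):

```python
from collections import Counter
from itertools import product
from math import log2

def moments_entropy(n):
    # exact entropy of (sum x_i, sum i*x_i, sum i^2*x_i) for uniform x in {0,1}^n
    counts = Counter()
    for x in product((0, 1), repeat=n):
        m0 = sum(x)
        m1 = sum(i * xi for i, xi in enumerate(x, 1))
        m2 = sum(i * i * xi for i, xi in enumerate(x, 1))
        counts[(m0, m1, m2)] += 1
    total = 2 ** n
    return -sum((c / total) * log2(c / total) for c in counts.values())

print(moments_entropy(10))
```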

## Determining the entropy of a string if each character has a slight bias

Let’s say I need to generate a 32-character secret comprised of ASCII characters from the set '0'..'9'. Here’s one way of doing it:

```
VALID_CHARS = '0123456789'

generate_secret_string() {
    random = get_crypto_random_bytes(32)
    secret = ''
    for (i = 0; i < 32; i++) {
        secret += VALID_CHARS[random[i] % 10]
    }
    return secret
}
```

My concern is that my character selection is biased. Because 10 doesn’t divide evenly into 256, the first 6 VALID_CHARS are slightly more likely to occur.

The secret space is $$10^{32}$$, but my generated secrets have less entropy than that. How can I calculate precisely how much entropy I actually have?
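One way to answer this precisely is via the Shannon entropy of the per-character distribution; since the characters are generated independently, the per-character entropies add (a small Python sketch of the calculation):

```python
import math

# Shannon entropy of one character produced by byte % 10:
# digits 0..5 each correspond to 26 of the 256 byte values, digits 6..9 to 25.
probs = [26 / 256] * 6 + [25 / 256] * 4

h_char = -sum(p * math.log2(p) for p in probs)
h_total = 32 * h_char

print(h_char)   # ~3.3217 bits per character, vs log2(10) ~ 3.3219 unbiased
print(h_total)  # ~106.29 bits for the whole 32-character secret
```

If the concern is a guessing adversary, the more conservative min-entropy is $$32 \cdot (-\log_2(26/256)) \approx 105.6$$ bits; either way the loss versus the unbiased $$32\log_2 10 \approx 106.30$$ bits is tiny.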

## How can ECDSA offer stronger security with the same key length (same amount of entropy)?

It is often said that ECDSA is more secure “per bit of key size”, such that it offers the same security with a shorter key, or stronger security with the same key length.

However, per my understanding, if the length of the key is fixed, then the amount of entropy is also fixed (e.g. a 160-bit ECDSA key contains no more than 160 bits of entropy, and a 192-bit AES key contains no more than 192 bits). So why is ECDSA considered “more secure” at the same level of entropy?

## entropy of base map of skew product

Let $$(M, \mathcal{B}, \mu)$$ be a probability space and $$f:M\rightarrow M$$ be continuous. Let $$T:M\rightarrow S^{1}$$ be a measurable function. The *skew product* defined by $$T$$ over $$f$$ is the transformation

$$F:M\times S^{1}\rightarrow M\times S^{1}.$$ Let $$\nu$$ be a stationary measure for $$F$$; that is, $$\nu(D)=\int\nu(T^{-1}(x)(D))\,d\mu(x)$$ for $$D\subset S^{1}$$.

We assume $$m=\mu \times \nu$$ is an $$F$$-invariant measure.

Question:

Suppose $$\nu$$ is non-atomic (i.e. $$\nu(\{t\})=0$$ for every $$t\in S^{1}$$). Can we prove that the measure-theoretic entropy satisfies $$h_{\mu}(f)=0$$ for every invariant measure $$\mu$$?

By the disintegration theorem, we can easily see that $$m(\{(x, t_{x}): t_{x}\in D\subset S^{1}\})=0$$. So the question is: can we prove $$h_{\mu}(f)=0$$ for any invariant measure $$\mu$$, if the graph of the function has zero measure?

## Topological entropy of logistic map $f(x) = \mu x (1-x)$, $f:[0,1] \to [0,1]$ for $\mu \in (1,3)$

As stated in the question, I want to find the topological entropy of the logistic map on the interval $$[0,1]$$ for a “nice” range of the parameter $$\mu$$, namely $$\mu \in (1,3)$$. I think the fact that $$f:[0,1] \to [0,1]$$ is a very important additional condition here which simplifies things.

I’ve tried something, which I’m not sure is the right way to approach the problem, but I’ll outline it here anyway.

I know a theorem that states $$h_{top}(f) = h_{top}(f|_{NW(f)})$$, where $$NW(f)$$ is the set of non-wandering points of $$f$$, so I wanted to find that set. By drawing a lot of little pictures, I concluded that for $$x \notin \{0,1\}$$, we should have $$\lim_{n\to \infty} f^{n}(x) = 1-\frac{1}{\mu}$$, which is the other fixed point of $$f$$. Also, the convergence seems fairly straightforward (i.e. the iterates get closer with every step), so I somehow got it into my head that we should have $$NW(f) = \{0, 1- \frac{1}{\mu}\}$$.

To confirm this, Wikipedia says:

> By varying the parameter r, the following behavior is observed:
>
> With r between 0 and 1, the population will eventually die, independent of the initial population.
>
> With r between 1 and 2, the population will quickly approach the value (r − 1)/r, independent of the initial population.
>
> With r between 2 and 3, the population will also eventually approach the same value (r − 1)/r, but will first fluctuate around that value for some time. The rate of convergence is linear, except for r = 3, when it is dramatically slow, less than linear (see Bifurcation memory).

However, I haven’t been able to find a proof of these claims. Can anyone show me how to prove this, or give me a reference where the proof is clearly written out?
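This is of course not a proof, but a quick numerical check (my own sketch) is consistent with those claims for several values of $$\mu \in (1,3)$$:

```python
def logistic_iterate(mu, x0, n):
    # iterate f(x) = mu * x * (1 - x) starting from x0, n times
    x = x0
    for _ in range(n):
        x = mu * x * (1 - x)
    return x

for mu in [1.5, 2.0, 2.5, 2.9]:
    fixed_point = 1 - 1 / mu
    x = logistic_iterate(mu, 0.123, 2000)
    print(mu, x, fixed_point)  # x ends up very close to 1 - 1/mu
```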

Also, if there is an easier way of finding the topological entropy (again, I emphasize that $$f:[0,1] \to [0,1]$$; I’ve lost a lot of time reading about Mandelbrot sets, conjugating $$f$$ to $$g(z) = z^2 + c$$ and looking at the existing formulas for the entropy of $$g$$, but with domain $$\mathbb{C}$$ or some variant), I’d be very happy to hear it.

## Entropy of endpoints of a random walk in a dense graph

Let $$p\in[0,1]$$ be a constant and let $$G$$ be a graph with $$n$$ vertices and $$\approx p\binom{n}{2}$$ edges. If you’d like, consider $$p=1/2$$.

Let $$X$$ be a random vertex of $$G$$ chosen proportional to its degree (i.e. the probability that $$X=v$$ is $$d(v)/2|E(G)|$$). That is, $$X$$ is chosen according to the stationary distribution of a random walk. Now, generate two standard random walks starting at $$X$$ denoted $$A_0,A_1,\dots$$ and $$B_0,B_1,\dots$$ where $$A_0=B_0=X$$.

For $$k\geq1$$, define $$\alpha_k$$ so that $$H(A_k,B_k) =\log_2(\alpha_kn^2)$$, where $$H$$ denotes the base-2 (Shannon) entropy. That is, $$\alpha_k:=2^{H(A_k,B_k)}/n^2$$. The way that entropy works, $$\alpha_k\approx1$$ if and only if the distribution of $$(A_k,B_k)$$ is roughly uniform on $$V(G)^2$$.

Question: Given the value of $$\alpha_i$$, what kind of lower bound can one prove on $$\alpha_j$$ for $$j>i$$?

I am mainly interested in constant values of $$i,j$$. For example, I would be quite interested to know whether all such graphs satisfy any (non-trivial) bound of the form $$\alpha_2\geq f(\alpha_1)$$ for some function $$f$$. I’d be very happy to know if there are any existing results related to this; please excuse my lack of knowledge on random walks…
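In case it helps anyone experimenting with this, here is a rough simulation I would use to get a feel for $$\alpha_k$$ on one random graph (my own sketch; the function name is made up, and the plug-in entropy estimate is biased downward when the number of samples is not much larger than $$n^2$$):

```python
import random
from collections import Counter
from math import log2

def empirical_alpha(n, p, k, trials=20000, seed=0):
    # sample a G(n, p) graph, then estimate alpha_k = 2^H(A_k, B_k) / n^2
    # from the plug-in entropy of sampled endpoint pairs
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    # stationary start: pick a vertex proportional to its degree
    weighted = [u for u in range(n) for _ in adj[u]]
    counts = Counter()
    for _ in range(trials):
        a = b = rng.choice(weighted)
        for _ in range(k):
            a = rng.choice(adj[a])
            b = rng.choice(adj[b])
        counts[(a, b)] += 1
    H = -sum((c / trials) * log2(c / trials) for c in counts.values())
    return 2 ** H / n ** 2

print(empirical_alpha(30, 0.5, 2))
```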