information theory, find entropy given Markov chain

There is an information source on the information source alphabet $ A = \{a, b, c\}$ represented by the state transition diagram below:

Markov chain

a) The random variable representing the $ i$ -th output from this information source is represented by $ X_i$ . It is known that the user is now in state $ S_1$ . In this state, let $ H (X_i|s_1)$ denote the entropy when observing the next symbol $ X_i$ , find the value of $ H (X_i|s_1)$ , entropy of this information source, Calculate $ H (X_i|X_{i-1}) $ and $ H (X_i)$ respectively. Assume $ i$ is quite large

How can I find $ H(X_i|s_1)?$ I know that $ $ H(X_i|s_1) = -\sum_{i,s_1} p\left(x_i, s_1\right)\cdot\log_b\!\left(p\left(x_i|s_1\right)\right) = -\sum_{i,j} p\left(x_i, s_1\right)\cdot\log_b\!\left(\frac{p\left(x_i, s_1\right)}{p\left(s_1\right)}\right)$ $ but I don’t know $ p(s_1)$ .

$ $ A=\begin{pmatrix}0.25 & 0.75 & 0\0.5 & 0 & 0.5 \0 & 0.7 & 0.3 \end{pmatrix}.$ $

From matrix I can know that $ p(s_1|s_1)=0.25$ , etc.

But what is the probability of $ s_1$ ? And how can I calculate $ H (X_i|X_{i-1})$ ?

Calculating spatial “orderedness” at each position in a binary image: entropy? lacunarity?

Let’s say I have image where all pixel values are either 0 or 1. What I’d like to do is to be able to generate a new image with the same dimensions where each pixel represents how “ordered” the area around the corresponding pixel in the original image is. In particular, I’m looking for “spatial” order: whether or not there is some regularity or pattern in that local area. This could then be used to segment in image into regions of relative order and regions of relative disorder.

For example:

ordered1 and ordered2

are both highly ordered. On the other hand, disordered

probably has varying levels of order within the image but is overall disorded. Finally, an image like


has areas of order (bottom left and to some extent top right) and disorder (rest of the image).

I’ve considered taking some general measure of entropy (like Shannon’s image entropy) and applying it with a moving window across the image, but my understanding is that most measures of entropy do not capture much about the spatial aspects of the image. I’ve also come across the concept of “lacunarity” which looks promising (it’s been used to segment e.g., anthropogenic structures from natural landscapes on the basis of homogeneity) but I’m having a hard time understanding how it works and thus if it’s truly appropriate. Could either of these concepts be made to work for what I’m asking, or is there something else I haven’t considered?

Is sha256 a good function to derive keys from a secret of sufficient length and entropy?

Assuming I have a secret key of sufficient length and entropy (I get to decide the length and have a good random source).

I would like to generate 256 length keys by hashing the root key with the name of each key, ex:

key1 = sha256(rootKey +"key1")  key2 = sha256(rootKey +"key2")  ... keyN = sha256(rootKey +"keyN")  

Is the sha256 hash a good choice ?

If yes, what length should the root secret be ? I’m thinking 256 bit is pretty good, but it wouldn’t cost much to make it bigger…

How to reduce entropy?

This is not necessarily a research question, since I do not know, if someone is working on this or not, but I hope to gain some insight by asking it here:

The idea behind this question is to attach to a natural number in a “natural” way an entropy, such that “multiplication increases entropy”. (Of course one can attach very different entropies to natural numbers such that multiplication reduces entropy, but I will try to give an argument, why this choice is natural.) Let $ n$ be a composite number $ \ge 2$ , $ \phi$ the Euler totient function. Suppose a factorization algorithm $ A$ outputs a number $ X$ , $ 1 \le X \le n-1$ with equal probability $ \frac{1}{n-1-\phi(n)}$ such that $ 1 < \gcd(X,n) < n$ . Then we can view $ X$ as a random variable and attach the entropy to it $ H(X_n):=H(n):= \log_2(n-1-\phi(n))$ .

The motivation behind this definition comes from an analogy to physics: I think that “one-way-functions” correspond to the “arrow of time” in physics. Since “the arrow of time” increases entropy, so should “one-way-functions”, if they exist. Since integer factorization is known to be a candidate for owf, my idea was to attach an entropy which would increase when multiplying two numbers.

It is proved here ( ) that:

$ H(mn) > H(n) + H(m)$ for all composite numbers $ n \ge 2, m \ge 2$

The entropy of $ n=pq$ , $ p<q<2p$ will be $ H(pq) = \log_2(p+q-1) > \log_2(2p-1) \approx \log_2(2p) = 1+\log_2(p)$ .

At the beginning of the factorization, the entropy of $ X_n$ , $ n=pq$ will be $ \log_2(p+q-1)$ since it is “unclear” which $ X=x$ will be printed. The algorithm must output $ X=x$ as described above. But knowing the value of $ X$ , this will reduce the entropy of $ X$ to $ 0$ . So the algorithm must reduce the entropy from $ \log_2(p+q-1)$ to $ 0$ . From physics it is known, that reducing entropy can be done with work. Hence the algorithm “must do some work” to reduce the entropy.

My question is, which functions do reduce entropy? (So I am thinking about how many function calls will the algorithm at least make to reduce the entropy by the amount described above?)

Thanks for your help!

Does multiplication increase entropy?

Does multiplication increase entropy?

The Shannon entropy of a number $ k$ in binary digits is defined as $ $ H = -\log(\frac{a}{l})\cdot\frac{a}{l} – \log(1-\frac{a}{l})\cdot (1-\frac{a}{l})$ $ where $ l = \text{ floor }(\frac{\log(k)}{\log(2)})$ is the number of binary digits of $ k$ and $ a$ is the number of $ 1$ -s in the binary expansion of $ k$ . So we view the number $ k$ as a “random variable”.

Suppose that $ n,m$ are uniformly randomly chosen in the interval $ 1 \le N$ .

Hypothesis 1):

$ H_{m \cdot n}$ is “significantly” larger then $ H_n$ .

Hypothesis 2):

$ H_{m + n}$ is not “significantly” larger then $ H_n$ .

Here is some empirical statistical test indicating that multiplication increases entropy, but addition does not:

def entropyOfCounter(c):     S = 0     for k in c.keys():         S += c[k]     prob = []     for k in c.keys():         prob.append(c[k]/S)     H = -sum([ p*log(p,2) for p in prob]).N()     return H  def HH(l):     return entropyOfCounter(Counter(l))  N  = 10^4 HN = [] HmXn = [] HmPn = [] for k in range(N):     n = randint(1,17^50)     m = randint(1,17^50)     Hn = HH(Integer(n).digits(2))     Hm = HH(Integer(m).digits(2))     HmXn.append(HH(Integer(n*m).digits(2)))     HmPn.append(HH(Integer(n+m).digits(2)))     HN.append(Hn)  X = mean(HN) Y = mean(HmPn) Z = mean(HmXn) n = len(HN) m = n SX2 = variance(HN) SY2 = variance(HmPn) SZ2 = variance(HmXn) SXY2 = ((n-1)*SX2 + (m-1)*SY2)/(n+m-2) SXZ2 = ((n-1)*SX2 + (m-1)*SZ2)/(n+m-2) TXY = sqrt((m*n)/(n+m)).N()*(X-Y)/sqrt(SXY2).N() TXZ = sqrt((m*n)/(n+m)).N()*(X-Z)/sqrt(SXZ2).N() print TXY,TXZ,n+m-2  Output: -1.43265218355297 -32.5323306851490 19998 

The second case (multiplication) increases entropy significantly. The first case ( addition) does not.

Is there a way to give a heuristic explanation why this is so in general (if it is), or is this empirical obervation in general $ 1 \le N$ wrong?


Minimising an Integrated Relative Entropy Functional

Suppose I am given

  • A probability distribution on $ \mathbf R^d$ , with density $ \pi (x)$ .
  • A family of transition kernels $ \{ q^0 (x \to \cdot) \}_{x \in \mathbf R^d}$ on $ \mathbf R^d$ , with densities $ q^0 (x \to y)$ .

I now want to find a new family of transition kernels $ \{ q^1 (x \to \cdot) \}_{x \in \mathbf R^d}$ such that

  • $ q^1$ is in detailed balance with respect to $ \pi$ , i.e.

\begin{align} \pi (x) q^1 ( x \to y ) = \pi (y) q^1 ( y \to x) \end{align}

  • $ q^1$ are as close as possible to $ q^0$ in KL divergence, averaged across $ x \sim \pi$ .

To this end, I introduce the following functional

\begin{align} \mathbb{F} [ q^1 ] &= \int_{x \in \mathbf R^d} \pi (x) \cdot KL \left( q^1 (x \to \cdot ) || q^0 (x \to \cdot ) \right) dx\ &= \int_{x \in \mathbf R^d} \int_{y \in \mathbf R^d} \pi (x) \cdot q^1 (x \to y) \log \frac{ q^1 (x \to y) }{ q^0 (x \to y) } \, dx dy \ &= \int_{x \in \mathbf R^d} \int_{y \in \mathbf R^d} \pi (x) \cdot \left\{ q^1 (x \to y) \log \frac{ q^1 (x \to y) }{ q^0 (x \to y) } – q^1 (x \to y) + q^0 (x \to y) \right \} \, dx dy. \end{align}

I thus wish to minimise $ \mathbb{F}$ over all transition kernels $ \{ q^1 (x \to \cdot) \}_{x \in \mathbf R^d}$ satisfying the desired detailed balance condition. Implicitly, there are also the constraints that the $ q^1$ are nonnegative and normalised.

I would like to solve this minimisation problem, or at least derive e.g. an Euler-Lagrange equation for it.

Currently, I can show that the first variation of $ \mathbb{F}$ is given by

\begin{align} \left( \frac{d}{dt} \vert_{t = 0} \right) \mathbb{F} [q^1 + t h] = \int_{x \in \mathbf R^d} \int_{y \in \mathbf R^d} \pi (x) \cdot \log \frac{ q^1 (x \to y) }{ q^0 (x \to y) } \cdot h (x, y) \, dx dy. \end{align}

Moreover, my constraints stipulate that any admissible variation $ h$ must satisfy the following two conditions:

  1. $ \forall x, y, \quad \pi (x) h (x, y) = \pi (y) h (y, x)$
  2. $ \forall x, \quad \int_{y \in \mathbf R^d} h (x, y) dy = 0$

I have not been able to translate these conditions into an Euler-Lagrange equation. I acknowledge that since the functional involves no derivatives of $ q^1$ , the calculus of variations approach may be ill-suited. If readers are able to recommend alternative approaches, this would also be appreciated. Anything which would allow for a more concrete characterisation of the optimal $ q^1$ would be ideal.

Cross Entropy Function

I’ve seen two versions of the cross entropy cost function, and conflicting information about it. \begin{equation}J(\theta) = -\frac{1}{N} \sum_{n=1}^N\sum_{i=1}^C y_{ni}\log \hat{y}_{n_i} (\theta)\end{equation} $ $ C(\theta) = – \frac{1}{N}\sum_{n=1}^N \sum_{i=1}^{C}[ y_{ni}\log (\hat{y}_{ni})+ (1-y_{ni}) \log(1-\hat{y}_{ni})] $ $

Some are saying that the second one is equivalent to the first for the case where there are only two classes, which makes sense. But couldn’t the second one also be used for more than two classes? For example, say we have three class classification, with the ground truth $ \vec{y} = \begin{bmatrix} 1 \ 0 \ 0 \end{bmatrix}$ and $ \vec{\hat{y}} =\begin{bmatrix} 0.7 \ 0.1 \ 0.2 \end{bmatrix} $ . Then with the first equation, cross entropy would simply be $ -\log(0.7)$ . But couldn’t we also use the second equation and calculate cross entropy as $ -\log(0.7) – \log(0.9)-\log(0.8)?$ When would we use the first equation and when would we use the second one, and why?

Determining the entropy of a string if each character has a slight bias

Let’s say I need to generate a 32-character secret comprised of ASCII characters from the set ‘0’..’9′. Here’s one way of doing it:

VALID_CHARS = '0123456789'  generate_secret_string() {     random = get_crypto_random_bytes(32)     secret = ''     for (i = 0; i < 32; i++) {         secret += VALID_CHARS[random[i] % 10]     }     return secret } 

My concern is that my character selection is biased. Because 10 doesn’t divide evenly into 256, the first 6 VALID_CHARS are slightly more likely to occur.

The secret space is 1032, but my generated secrets have less entropy than that. How can I calculate precisely how much entropy I actually have?