Word factorization in $O(n^2 \log n)$ time

Given two strings $S_1, S_2$, we write $S_1S_2$ for their concatenation. Given a string $S$ and an integer $k\geq 1$, we write $(S)^k = SS\cdots S$ for the concatenation of $k$ copies of $S$. Given a string, we can use this notation to ‘compress’ it; e.g. $AABAAB$ may be written as $((A)^2 B)^2$. Let’s call the weight of a compression the number of characters appearing in it, so the weight of $((A)^2 B)^2$ is two, and the weight of $(AB)^2 A$ (a compression of $ABABA$) is three (separate $A$s are counted separately).

Now consider the problem of computing the ‘lightest’ compression of a given string $S$ with $|S|=n$. After some thinking there is an obvious dynamic programming approach which runs in $O(n^3 \log n)$ or $O(n^3)$ depending on the exact approach.
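For concreteness, here is a minimal sketch of one such interval dynamic program. It is only an illustration of the slow approach, not the contest's reference solution; the function name `lightest_weight` and the naive periodicity check are my own assumptions. It tries every split point and every period for each substring, so it costs on the order of $n^3$ operations plus the repetition checks.

```python
from functools import lru_cache


def lightest_weight(s: str) -> int:
    """Weight of a lightest compression of s (hypothetical helper name)."""
    n = len(s)
    if n == 0:
        return 0

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        # best(i, j) is the minimum weight over compressions of s[i:j].
        length = j - i
        if length == 1:
            return 1
        # Option 1: concatenate compressions of two non-empty parts.
        result = min(best(i, k) + best(k, j) for k in range(i + 1, j))
        # Option 2: s[i:j] is (s[i:i+p])^k for some k >= 2.
        for p in range(1, length // 2 + 1):
            if length % p == 0 and s[i:i + p] * (length // p) == s[i:j]:
                result = min(result, best(i, i + p))
        return result

    return best(0, n)


print(lightest_weight("AABAAB"))  # ((A)^2 B)^2
print(lightest_weight("ABABA"))   # (AB)^2 A
```

The recursion is memoised over all $O(n^2)$ substrings, so it is only practical for short inputs; it is meant as a baseline to compare a faster method against.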

However, I have been told this problem can be solved in $O(n^2 \log n)$ time, though I cannot find any sources on how to do this. Specifically, this problem was given in a recent programming contest (problem K here, last two pages). During the analysis an $O(n^3 \log n)$ algorithm was presented, and at the end the pseudo-quadratic bound was mentioned (here at the four minute mark). Sadly the presenter only referred to ‘a complicated word combinatorics lemma’, so now I have come here to ask for the solution 🙂

Differentiability of $\int \log F$ when $\int \log f$ is differentiable?

For a specific probability density function $f$, which is not differentiable everywhere, I have proven that the Hessian of $$g(\theta) = \int \log f(x;\theta)\,dH(x),$$ exists for all $\theta \in {\mathbb R}^p$, where $H$ is another distribution function. Let $F(x;\theta) = \int_{-\infty}^x f(t;\theta)\,dt$ be the distribution function. I want to check if the Hessian of $$G(\theta) = \int \log F(x;\theta)\,dH(x),$$ also exists.

Is there a direct method of showing this? That is, is there some general result I can appeal to? If it weren’t for the logarithm, I could exchange the integral and derivative symbols, for instance.

Bin packing first-fit problem in $O(n \log n)$ time

Suppose we have $n$ objects with weights $w_i \in (0,1]$ and we must insert them into bins with the constraint that the total weight of the objects in any bin must not exceed $1 \, kg$.

The first-fit algorithm must examine the objects in the order they’re given $(w_1, w_2, \dots , w_n)$ and insert them into the bins, satisfying the above constraint. It must run in $O(n \log n)$ time, returning the number of bins which were used and which objects were inserted in each bin.

I’ve constructed an algorithm, using Hash Tables, where the objects’ numbers are the keys and the bins’ numbers are the values and it basically does this:

    for (int i = 0; i < n; i++) {
        sum = sum + w[i];
        if (sum < noOfBins) {
            // running total still fits in the bins opened so far
            BinContent.put(i + 1, noOfBins);
        } else {
            // open a new bin for object i + 1
            noOfBins++;
            BinContent.put(i + 1, noOfBins);
        }
    }

and then it prints noOfBins and BinContent, but I believe this algorithm is $O(n)$ and the proper way would be to use Binary Trees instead of Hash Tables.

Is the complexity of the given algorithm $O(n)$? If so, could you give me a hint on how to achieve $O(n \log n)$ time complexity?
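To make the binary-tree idea concrete, here is a rough sketch of first-fit on top of a max segment tree over the remaining capacity of each bin. It is one possible approach rather than a definitive answer: the function name `first_fit`, the `capacity` parameter, and the reading of the constraint as "total weight at most the capacity" are all assumptions. For each object it walks down the tree to the leftmost bin with enough room, then updates the path back to the root, so each object costs $O(\log n)$.

```python
def first_fit(weights, capacity=1.0):
    """Return (number of bins used, list of 1-based object numbers per bin).

    Assumes every weight is positive and at most `capacity`.
    """
    n = len(weights)
    if n == 0:
        return 0, []

    # Round the number of leaves up to a power of two; at most n bins are needed.
    size = 1
    while size < n:
        size *= 2

    # tree[node] is the largest remaining capacity among the bins below node.
    tree = [capacity] * (2 * size)
    bins = []

    for obj, w in enumerate(weights, start=1):
        # Walk down to the leftmost bin whose remaining capacity is >= w.
        node = 1
        while node < size:
            node = 2 * node if tree[2 * node] >= w else 2 * node + 1
        b = node - size
        if b == len(bins):
            bins.append([])  # first object placed in a fresh bin
        bins[b].append(obj)

        # Update the leaf, then recompute the maxima on the path to the root.
        tree[node] -= w
        node //= 2
        while node >= 1:
            tree[node] = max(tree[2 * node], tree[2 * node + 1])
            node //= 2

    return len(bins), bins


print(first_fit([0.5, 0.75, 0.25, 0.5, 0.25]))
```

With real-valued weights, floating-point rounding can affect the `>=` comparison near the capacity boundary; scaling the weights to integers is one way around that.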

Are the digits of $\log 2$ and $\log 3$, or $\sqrt{2}$ and $\sqrt{3}$, or $e$ and $\pi$, cross-correlated?

I am trying to find sequences of digits (in base $b$, with $b$ not necessarily an integer) that are not cross-correlated. While the digits in base $b$ of (say) $\pi$ and $e$ do not exhibit auto-correlations when taken separately (assuming $b$ is an integer) since these numbers are believed to be normal numbers, what about cross-correlations between these two sequences of digits?

The context is a business application: a generic number guessing game played with real money. If I use sequences that are cross-correlated, the player can leverage this fact (if she discovers the cross-correlations) to increase her odds of winning, making the game unfair to the operator. In short, I could lose money. For details, see section 4 in my article Some Fun with Gentle Chaos, the Golden Ratio, and Stochastic Number Theory.

Why does $O(n \log n)$ seem so linear?

I’ve implemented an algorithm that, when analyzed, should run with a time complexity of $O(n \log n)$.

However, when I plot the computation time against the cardinality of the input set, it looks roughly linear, and computing $R^2$ somewhat confirms this. As a sanity check, I plotted $n$ on the $x$-axis against $n \log_2 n$ on the $y$-axis with Python, and this also looked linear. Computing $R^2$ (scipy.stats.linregress) confuses me further, as I get $R^2=0.9995811978450471$ when my $x$ and $y$ data are created as follows:

    import math

    x, y = [], []
    for n in range(2, 10000000):
        x.append(n)
        y.append(n * math.log2(n))

Am I missing something fundamental? Am I using too few iterations for it to matter? When looking at the graph at http://bigocheatsheet.com/ it does not seem linear at all.
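One way to probe this, offered only as a sketch, is to look at the ratio $(n \log_2 n)/n = \log_2 n$ instead of the raw curve. The helper name `per_element_cost` is made up for illustration. If the curve were truly linear this ratio would be constant; for $n \log_2 n$ it keeps growing, but only very slowly over the range being plotted.

```python
import math


def per_element_cost(n: int) -> float:
    """Ratio of n * log2(n) to n, i.e. the local 'slope' relative to a linear curve."""
    return (n * math.log2(n)) / n


# The ratio changes by a factor of about 2.3 while n grows by a factor of 8192.
for n in (2**10, 2**20, 2**23):
    print(n, per_element_cost(n))
```

Because the logarithmic factor varies so little across most of the range, a straight-line fit can still achieve a very high $R^2$ without the function being linear.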

Real exponential field with restricted analytic functions: $\mathbb R_{an, exp, log}$ has quantifier elimination, but $\mathbb R_{an, exp}$ does not.

At a talk some time ago a result was presented, which I believe originates from:

van den Dries, Lou; Miller, Chris, On the real exponential field with restricted analytic functions, Isr. J. Math. 85, No. 1-3, 19-56 (1994). ZBL0823.03017.

At some point it was mentioned that $\mathbb R_{an,exp,log}$ admits quantifier elimination while $\mathbb R_{an,exp}$ does not. Here $\mathbb R_{an,exp}$ is the theory of the (ordered) real exponential field with function symbols for all restricted analytic functions. Then of course $\mathbb R_{an,exp,log}$ is just adding a function symbol for logarithms.

Someone in the audience remarked that $\log(x)$ (or more precisely, its graph) is quantifier-free definable by $x = \exp(y)$. Then a fairly simple formula was presented to show why you really need $\log$ as a function symbol for quantifier elimination, and here is my question: I just cannot remember or reconstruct that formula. So what would be a simple example of some formula in this setting that is not equivalent to a quantifier-free formula in $\mathbb R_{an,exp}$?

I am probably missing something obvious here, but now it’s haunting me.