Question about the lower bound for $k \times k$-clique (under ETH) shown in “Slightly Superexponential Parameterized Problems”

I am reading the paper Slightly Superexponential Parameterized Problems at the moment and have two questions about it:

First question: The paper gives a proof of the following statement

Theorem 2.1: Assuming ETH, there is no $ 2^{o(klogk)}$ time algorithm for $ k \times k$ -Clique.

They prove this statement by sophisticated reduction from $ 3$ -coloring. They then state that this construction runs in time polynomial in $ k$ . They state:

The graph $ G$ has $ k^2$ vertices and the time required to construct G is polynomial in $ k$ . […] Therefore, the total running time is $ 2^{o(k \log(k)} \cdot k^{O(1)}$

Why is this not $ 2^{o(k \log(k)}) + k^{O(1)}$ instead? As far as I can see, we have only have to construct the graph once, and then we can run the presumed $ 2^{o(k \log(k)}$ algorithm.

Second question: From this theorem, it follows that $ k \times k$ -Clique can not have a $ k^{o(k)}$ algorithm under the Exponential Time Hypothesis. This follows from the abstract, which states that $ k^{O(k)} = 2^{O(k \log(k))}$ . What is a good way to proof this statement?

Multithreaded parameterized programs with superexponential shared memory size in the number of threads?

In this question, a program means a parameterized multithreaded program with the interleaving semantics, a finite number of per-thread states (which may depend on the number of threads), and finite number of shared states (which may depend on the number of threads).

A shared state is a valuation of the shared memory, and a per-thread (in other terminology, local) state is the valuation of thread-local memory (we assume no stack). Interleaving semantics means that the actions of the threads are interleaved on a single processor and a thread has rw-access to the shared memory and its own local memory and no access to the local memories of the other threads. Parameterized means that we conside a family of programs generated from a finite-desciption template such that the $ n$ th member of the family has $ n$ threads (which typically coincide up to the thread identifier).

To the best of my knowledge, for such a program, the size of the shared state-space is anywhere between constant (e.g., for a single boolean lock variable) and exponential (e.g., Peterson mutual exclusion protocol) in the number of the threads $ n$ .

Is there any well-known academic program in which the size of the shared state-space grows superexponentially in $ n$ ?