Difficulty understanding the use of an arbitrary function for the worst-case running time of an algorithm

In CLRS the authors say:

"Technically, it is an abuse to say that the running time of insertion sort is $ O(n^2)$ , since for a given $ n$ , the actual running time varies, depending on the particular input of size $ n$ . When we say “the running time is $ O(n^2)$ ,” we mean that there is a function $ f(n)$ that is $ O(n^2)$ such that for any value of $ n$ , no matter what particular input of size $ n$ is chosen, the running time on that input is bounded from above by the value $ f(n)$ . Equivalently, we mean that the worst-case running time is $ O(n^2)$ . "

What I have difficulty understanding is why the authors talk about an arbitrary function $ f(n)$ instead of $ n^2$ directly.

I mean, why didn’t the authors write:

"When we say “the running time is $ O(n^2)$ ,” we mean that for any value of $ n$ , no matter what particular input of size $ n$ is chosen, the running time on that input is bounded from above by the value $ cn^2$ for some +ve $ c$ and sufficiently large n. Equivalently, we mean that the worst-case running time is $ O(n^2)$ ".

I have a very limited understanding of this subject, so please forgive me if my question is too basic.
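To make my reading concrete (my own formalization, not the book's): if $ T(I)$ denotes the running time on a particular input $ I$ , let $ f$ be the exact worst-case running time,

$$f(n) = \max\{T(I) : I \text{ is an input of size } n\},$$

so that “the running time is $ O(n^2)$ ” would mean precisely $ f(n) = O(n^2)$ , i.e., there exist $ c > 0$ and $ n_0$ such that $ f(n) \leq cn^2$ for all $ n \geq n_0$ .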

Difficulty in understanding a portion of the proof of the $\text{“white path”}$ theorem in the CLRS text

I was going through the $ \text{DFS}$ section of the Introduction to Algorithms by Cormen et al. and I faced difficulty in understanding the $ \Leftarrow$ direction of the proof of the white-path theorem. The theorem which is the subject of this question depends on two other results, so I present the dependencies before presenting the actual theorem and the difficulty which I face with it.


Dependencies:

Theorem 22.7 (Parenthesis theorem) In any depth-first search of a (directed or undirected) graph $ G = (V, E)$ , for any two vertices $ u$ and $ v$ , exactly one of the following three conditions holds:

  • the intervals $ [d[u], f[u]]$ and $ [d[v], f[v]]$ are entirely disjoint, and neither $ u$ nor $ v$ is a descendant of the other in the depth-first forest,

  • the interval $ [d[u], f[u]]$ is contained entirely within the interval $ [d[v], f[v]]$ , and $ u$ is a descendant of $ v$ in a depth-first tree,

  • the interval $ [d[v], f[v]]$ is contained entirely within the interval $ [d[u], f[u]]$ , and $ v$ is a descendant of $ u$ in a depth-first tree.

Corollary 22.8 (Nesting of descendants’ intervals) Vertex $ v$ is a proper descendant of vertex $ u$ in the depth-first forest for a (directed or undirected) graph $ G$ if and only if $ d[u] < d[v] < f[v] < f[u]$ .


Theorem 22.9 (White-path theorem)

In a depth-first forest of a (directed or undirected) graph $ G = (V, E)$ , vertex $ v$ is a descendant of vertex $ u$ if and only if at the time $ d[u]$ that the search discovers $ u$ , vertex $ v$ can be reached from $ u$ along a path consisting entirely of white vertices.

Proof

$ \Rightarrow$ : Assume that $ v$ is a descendant of $ u$ . Let $ w$ be any vertex on the path between $ u$ and $ v$ in the depth-first tree, so that $ w$ is a descendant of $ u$ . By Corollary 22.8, $ d[u] < d[w]$ , and so $ w$ is white at time $ d[u]$ .

$ \Leftarrow$ :

  1. Suppose that vertex $ v$ is reachable from $ u$ along a path of white vertices at time $ d[u]$ , but $ v$ does not become a descendant of $ u$ in the depth-first tree.
  2. Without loss of generality, assume that every other vertex along the path becomes a descendant of $ u$ . (Otherwise, let $ v$ be the closest vertex to $ u$ along the path that doesn’t become a descendant of $ u$ .)
  3. Let $ w$ be the predecessor of $ v$ in the path, so that $ w$ is a descendant of $ u$ ($ w$ and $ u$ may in fact be the same vertex) and, by Corollary 22.8, $ f[w] \leq f[u]$ .
  4. Note that $ v$ must be discovered after $ u$ is discovered, but before $ w$ is finished.$ ^\dagger$ Therefore, $ d[u] < d[v] < f[w] \leq f[u]$ .
  5. Theorem 22.7 then implies that the interval $ [d[v], f[v]]$ is contained entirely within the interval $ [d[u], f[u]]$ .$ ^{\dagger\dagger}$
  6. By Corollary 22.8, $ v$ must after all be a descendant of $ u$ . $ ^\ddagger$

$ \dagger$ : It is clear that since $ u$ is the first vertex on the path to be discovered, every other vertex (including $ v$ ) is discovered after it. In point $ 1$ we assume that $ v$ does not become a descendant of $ u$ , but the phrase “but before $ w$ is finished” suggests to me that this is a result of exploring the edge $ (w,v)$ (and this exploration ultimately makes $ v$ a descendant of $ u$ , so the proof should have ended here $ ^\star$ ).

$ \dagger\dagger$ : Considering the exact statement of Theorem 22.7, I do not see which fact leads to the implication in step $ 5$ .

$ \ddagger$ : The proof should have ended at $ \star$ , so why the stretch to line $ 6$ ?

In short, I am unable to grasp the meaning of the proof of the $ \Leftarrow$ direction. I assume the authors are using a proof by contradiction.

(I thought of an alternative inductive proof. Suppose vertex $ v$ is reachable from $ u$ along a path of white vertices at time $ d[u]$ . We apply induction on the vertices in the white path. As a base case, $ u$ is an (improper) descendant of itself. As the inductive hypothesis, let all vertices from $ u$ to $ w$ be descendants of $ u$ , where $ w$ is the predecessor of $ v$ in the white path. The inductive step then follows from the exploration of the edge $ (w,v)$ . But I want to understand the proof in the text.)
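For reference, here is a minimal DFS sketch (my own illustration in Python, not the book's code) that records the discovery and finish times $ d[u]$ and $ f[u]$ used above, so the nesting of intervals from Theorem 22.7 and Corollary 22.8 can be checked on small graphs:

```python
def dfs(graph):
    """Depth-first search over an adjacency-list dict, recording timestamps."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {u: WHITE for u in graph}
    d, f, parent = {}, {}, {u: None for u in graph}
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time                 # u is discovered: it turns gray
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == WHITE:   # only white neighbors become tree children
                parent[v] = u
                visit(v)
        time += 1
        f[u] = time                 # u is finished: it turns black
        color[u] = BLACK

    for u in graph:
        if color[u] == WHITE:
            visit(u)
    return d, f, parent

# For graph = {'a': ['b'], 'b': ['c'], 'c': []} this yields
# d[a] < d[b] < d[c] < f[c] < f[b] < f[a], matching Corollary 22.8.
```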

What target numbers would correspond to each level of difficulty under this system?

I’m writing a homebrew game system, and I found that I have an action resolution mechanic but not a good system for target numbers (I call them Success Thresholds, or STs, in this game, and from now on I’ll use that term to refer to the minimum roll that succeeds).

To resolve an action, most of the time players roll 2d6 and add a modifier ranging from +0 to +3, depending on the stat. With Advantage, it is (3d6 drop lowest)+mod, and Disadvantage is (3d6 drop highest)+mod.

There are also 4 (well, 5, but one auto-succeeds) levels of difficulty. Trivial tasks are automatically successful. Easy tasks should succeed about 75-80% of the time, Moderate tasks 50-60% of the time, and Hard tasks 25-40% of the time. Impossible tasks shouldn’t succeed more than 25% of the time, and often more like below 10-15% of the time, even with Advantage and a +3 mod.

This is an anydice program with the base probabilities for a +0 mod. What should the ST be for each level of difficulty? I had initially considered 7 as the base difficulty for Moderate tasks, before I added modifiers to rolls.
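For anyone who prefers not to follow the anydice link, here is a quick Python sketch of the same base probabilities (my own; `success_prob` and its arguments are just names I chose):

```python
from itertools import product

def success_prob(st, mod=0, mode="normal"):
    """P(roll + mod >= st) for 2d6, or 3d6 drop lowest/highest."""
    if mode == "normal":          # 2d6 + mod
        rolls = [a + b for a, b in product(range(1, 7), repeat=2)]
    elif mode == "advantage":     # (3d6 drop lowest) + mod
        rolls = [sum(sorted(t)[1:]) for t in product(range(1, 7), repeat=3)]
    else:                         # (3d6 drop highest) + mod
        rolls = [sum(sorted(t)[:2]) for t in product(range(1, 7), repeat=3)]
    return sum(r + mod >= st for r in rolls) / len(rolls)

# e.g. success_prob(7) ~= 0.583, so ST 7 at +0 already sits in my Moderate band.
for st in range(6, 13):
    print(st, round(success_prob(st), 3))
```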

Difficulty with a few steps in the proof of “Amortized cost of the $\text{Find-Set}$ operation is $\Theta(\alpha(n))$”, assuming union by rank and path compression

I was reading the section on data structures for disjoint sets in the text CLRS and faced difficulty understanding a few steps in the proof of the lemma given in the question title. Here we assume the union-by-rank and path-compression heuristics. Before we move to the target lemma, a few definitions and a lemma are required as prerequisites.


The prerequisites:

$$level(x)=\max\{k:rank[p[x]]\geq A_k(rank[x])\}$$

$$iter(x)=\max\{i:rank[p[x]]\geq A_{level(x)}^{(i)}(rank[x])\}$$

$$\phi_q(x) = \begin{cases} \alpha(n)\cdot rank[x] &\quad\text{if $x$ is a root or $rank[x]=0$}\\ (\alpha(n)-level(x))\cdot rank[x]-iter(x) &\quad\text{if $x$ is not a root and $rank[x]\geq1$} \end{cases}$$
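(For reference, and unless I am misremembering the same section, $ A_k(j)$ above is the fast-growing function family of Section 21.4,

$$A_k(j) = \begin{cases} j+1 &\quad\text{if } k=0\\ A_{k-1}^{(j+1)}(j) &\quad\text{if } k\geq1 \end{cases}$$

where $ A_k^{(i)}$ denotes $ i$ -fold iterated application, and $ \alpha(n)=\min\{k:A_k(1)\geq n\}$ .)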

Lemma 21.9: Let $ x$ be a node that is not a root, and suppose that the $ q$ th operation is either a $ \text{Link}$ or a $ \text{Find-Set}$ . Then after the $ q$ th operation, $ \phi_q(x) \leq \phi_{q-1}(x)$ . Moreover, if $ rank[x] \geq 1$ and either $ level(x)$ or $ iter(x)$ changes due to the $ q$ th operation, then $ \phi_q(x) \leq \phi_{q-1}(x) - 1$ . That is, $ x$ ’s potential cannot increase, and if it has positive rank and either $ level(x)$ or $ iter(x)$ changes, then $ x$ ’s potential drops by at least $ 1$ .


Now, in the proof below, I mark the steps where I face problems.

Lemma 21.12: The amortized cost of each $ \text{Find-Set}$ operation is $ \Theta(\alpha(n))$ .

Proof: Suppose that the $ q$ th operation is a $ \text{Find-Set}$ and that the find path contains $ s$ nodes. The actual cost of the $ \text{Find-Set}$ operation is $ O(s)$ . We shall show that no node’s potential increases due to the $ \text{Find-Set}$ and that at least $ \max\{0, s - (\alpha(n) + 2)\}$ nodes on the find path have their potential decrease by at least $ 1$ .

To see that no node’s potential increases, we first appeal to Lemma 21.9 for all nodes other than the root. If $ x$ is the root, then its potential is $ \alpha(n)\cdot rank[x]$ , which does not change.

Now we show that at least $ \max\{0, s - (\alpha(n) + 2)\}$ nodes have their potential decrease by at least $ 1$ . Let $ x$ be a node on the find path such that $ rank[x] > 0$ and $ x$ is followed somewhere on the find path by another node $ y$ that is not a root, where $ level(y) = level(x)$ just before the $ \text{Find-Set}$ operation. (Node $ y$ need not immediately follow $ x$ on the find path.) $ \require{color}\colorbox{yellow}{All but at most $ \alpha(n) + 2$ nodes on the find path satisfy these constraints on $ x$ .}$ $ \require{color}\colorbox{yellow}{Those that do not satisfy them are the first node on the find path (if it has rank $ 0$ ),}$ $ \require{color}\colorbox{yellow}{the last node on the path (i.e., the root), and the last node $ w$ on the path for which}$ $ \require{color}\colorbox{yellow}{$ level(w) = k$ , for each $ k = 0, 1, 2, \ldots, \alpha(n) - 1$ .}$

Let us fix such a node $ x$ , and we shall show that $ x$ ’s potential decreases by at least $ 1$ . Let $ k = level(x) = level(y)$ . Just prior to the path compression caused by the $ \text{Find-Set}$ , we have

$ rank[p[x]] \geq A_k^{(iter(x))}(rank[x])$ (by definition of $ iter(x)$ ),

$ rank[p[y]] \geq A_k(rank[y])$ (by definition of $ level(y)$ ),

$ rank[y] \geq rank[p[x]]$ (by Corollary 21.5 and because $ y$ follows $ x$ on the find path).

Putting these inequalities together and letting $ i$ be the value of $ iter(x)$ before path compression, we have

$ rank[p[y]] \geq A_k(rank[y]) \geq A_k(rank[p[x]])$ (because $ A_k(j)$ is strictly increasing) $ \geq A_k(A_k^{(iter(x))}(rank[x])) = A_k^{(i+1)}(rank[x])$ .

Because path compression will make $ x$ and $ y$ have the same parent, we know that after path compression, $ rank[p[x]] = rank[p[y]]$ and that the path compression does not decrease $ rank[p[y]]$ . Since $ rank[x]$ does not change, after path compression we have that $ \require{color}\colorbox{pink}{$ rank[p[x]]\geq A_k^{(i+1)}(rank[x])$ . Thus, path compression will cause either $ iter(x)$ to}$ $ \require{color}\colorbox{pink}{increase (to at least $ i + 1$ ) or $ level(x)$ to increase (which occurs if $ iter(x)$ increases}$ $ \require{color}\colorbox{pink}{to at least $ rank[x] + 1$ ). In either case, by Lemma 21.9, we have $ \phi_q(x) \leq \phi_{q-1}(x) - 1$ .}$ $ \require{color}\colorbox{pink}{Hence, $ x$ ’s potential decreases by at least $ 1$ .}$

The amortized cost of the $ \text{Find-Set}$ operation is the actual cost plus the change in potential. The actual cost is $ O(s)$ , and we have shown that the total potential decreases by at least $ \max\{0, s - (\alpha(n) + 2)\}$ . The amortized cost, therefore, is at most $ O(s) - (s - (\alpha(n) + 2)) = O(s) - s + O(\alpha(n)) = O(\alpha(n))$ , since we can scale up the units of potential to dominate the constant hidden in $ O(s)$ . ■


In the proof above, I could not follow the mathematics behind the statements highlighted in yellow and pink. Can anyone help me out?
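For context, here is a minimal union-by-rank plus path-compression implementation (my own Python rendering of the standard CLRS pseudocode); these are the exact operations to which the potential $ \phi_q$ is applied:

```python
class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))   # p[x]
        self.rank = [0] * n            # rank[x]

    def find_set(self, x):
        # Path compression: every node on the find path is re-parented to
        # the root, which is exactly what makes level(x)/iter(x) change.
        if self.parent[x] != x:
            self.parent[x] = self.find_set(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        self.link(self.find_set(x), self.find_set(y))

    def link(self, x, y):
        # Union by rank: attach the root of smaller rank to the root of
        # larger rank; ranks only grow on ties, so they stay O(log n).
        if self.rank[x] > self.rank[y]:
            self.parent[y] = x
        else:
            self.parent[x] = y
            if self.rank[x] == self.rank[y]:
                self.rank[y] += 1
```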

Difficulty in understanding the proof of the lemma: “Matroids exhibit the optimal-substructure property”

I was going through the text "Introduction to Algorithms" by Cormen et al. where I came across a lemma in which I could not understand a vital step of the proof. Before going into the lemma, I give a brief description of the prerequisites for it.


Let $ M=(S,\ell)$ be a matroid where $ S$ is the ground set and $ \ell$ is the family of subsets of $ S$ called the independent subsets of $ S$ .

Let us have an algorithm which finds an optimal subset of $ M$ using the greedy method:

$ GREEDY(M,w):$

$ 1\quad A\leftarrow\emptyset$

$ 2\quad \text{sort $ S[M]$ into monotonically decreasing order by weight $ w$ }$

$ 3\quad \text{for each $ x\in S[M]$ , taken in monotonically decreasing order by weight $ w(x)$ }$

$ 4\quad\quad \text{do if $ A\cup\{x\} \in \ell[M]$ }$

$ 5\quad\quad\quad\text{then $ A\leftarrow A\cup \{x\}$ }$

$ 6\quad \text{return $ A$ }$
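To keep the algorithm concrete, here is a direct Python rendering of the pseudocode above (my own sketch; `is_independent` stands in for an oracle deciding membership in $ \ell[M]$ , and `weight` is the function $ w$ ):

```python
def greedy(ground_set, weight, is_independent):
    A = set()
    # Lines 2-3: consider elements in monotonically decreasing order of weight.
    for x in sorted(ground_set, key=weight, reverse=True):
        if is_independent(A | {x}):   # line 4: does the extension stay independent?
            A.add(x)                  # line 5
    return A                          # line 6
```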


I had a problem understanding one step in the proof of the lemma below.

Lemma: (Matroids exhibit the optimal-substructure property)

Let $ x$ be the first element of $ S$ chosen by $ GREEDY$ for the weighted matroid $ M = (S, \ell)$ . The remaining problem of finding a maximum-weight independent subset containing $ x$ reduces to finding a maximum-weight independent subset of the weighted matroid $ M' = (S', \ell')$ , where

$ S' = \{y\in S:\{x,y\}\in \ell\}$ ,

$ \ell' = \{B \subseteq S - \{x\} : B \cup \{x\} \in \ell\}$ ,

and the weight function for $ M'$ is the weight function for $ M$ , restricted to $ S'$ . (We call $ M'$ the contraction of $ M$ by the element $ x$ .)

Proof:

  1. If $ A$ is any maximum-weight independent subset of $ M$ containing $ x$ , then $ A' = A - \{x\}$ is an independent subset of $ M'$ .

  2. Conversely, any independent subset $ A'$ of $ M'$ yields an independent subset $ A = A'\cup\{x\}$ of $ M$ .

  3. We have in both cases $ w(A) = w(A') + w(x)$ .

  4. Since we have in both cases that $ w(A) = w(A') + w(x)$ , a maximum-weight solution in $ M$ containing $ x$ yields a maximum-weight solution in $ M'$ , and vice versa.


I can understand $ (1), (2), (3)$ , but I cannot see how line $ (4)$ is arrived at from $ (1), (2), (3)$ , especially the part in bold italics. Could anyone please make it clear to me?

Applying overlapping skills to reduce difficulty in Numenera

I know that in Numenera you can apply a maximum of 2 steps of difficulty reduction from skills on a check. Is there any guidance on applying overlapping skills? Some examples:

  • A character tries to identify plant people. They’re trained in plants and in animals (two separate skills), thus reducing difficulty by 2.
  • They try to dash and jump. They’re trained in athletics and in jumping (again, separate skills): -2 difficulty.

Converting difficulty class (DC) of checks from D&D 3E to 5E

I’m looking at converting the 5th-level D&D 3E adventure The Speaker in Dreams to the fifth edition, and I’m noticing a lot of DCs from 20 to 30. Were difficulty classes higher in 3E, or were players able to consistently roll higher checks because of some specific mechanic compared to 5E? Should I be reducing the DCs of checks when converting? If so, is there a good fixed value or formula to decrease the DCs by?

Difficulty Optimizing Postgres Write Throughput

I’m trying to index a collection of roughly 127K files using a PostgreSQL 9.6 server on RDS. I had expected the process of writing these documents to the DB to take about 8 hours, but over time I observe that the write rate decays to 0 so that the process doesn’t complete (at some point, inserts begin timing out). Unfortunately, I don’t have much DBA/PostgreSQL background, so I’m struggling to debug this.

On average, indexing one file means inserting 125 rows and 0.5 MB of data into a table (some files are significant outliers, yielding ~40K rows). I have several indices on the table (I don’t think I can avoid these, due to other requirements).

Initially, the max WAL size on the database was set to 2 GB. My automation could process roughly 25K files before insert performance became unusable, with acceptable write throughput during the first hour. Increasing the max WAL size to 30 GB helped, but didn’t completely solve the problem; with this configuration the system was able to index 85K files before the insert rate degraded. Looking at the database logs, I saw primarily checkpoints that started due to the configured timeout (15 minutes). On the RDS console, I see a fairly consistent average write throughput of 5-15 MB/sec.

Eventually I may need to index a larger corpus of ~635K files, so I’d like to find settings where I get consistent write throughput.

Database Specifications:

  • PostgreSQL 9.6.15 (on RDS)
  • 6 TB disk
  • 4 CPU
  • 16 GB RAM
  • Max WAL size: 30 GB
  • Checkpoint Timeout: 15 min.
  • Checkpoint Completion Target: 0.9

Questions:

  • Do I need to increase the configured max WAL size again? Is there a rule of thumb for how large this should be?
  • Why did a 15x increase in max WAL size only increase the number of files I could index by 3-4x?
  • Are there other places I should look for diagnostic information?
  • If I stop the write process temporarily and restart it hours later, write performance improves temporarily. Why is this?
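For what it's worth, here is the kind of sampling I can run while the load is going (a Python sketch using psycopg2; the connection string and interval are placeholders, not my real setup), since requested-vs-timed checkpoint counts in `pg_stat_bgwriter` seem like relevant diagnostics:

```python
import time

import psycopg2  # assumes psycopg2 is installed; the DSN below is a placeholder

# If checkpoints_req grows much faster than checkpoints_timed, checkpoints are
# being forced by WAL volume rather than the 15-minute timeout, which would
# suggest max_wal_size is still too small for the write rate.
QUERY = """
    SELECT checkpoints_timed, checkpoints_req,
           buffers_checkpoint, buffers_backend, maxwritten_clean
    FROM pg_stat_bgwriter
"""

conn = psycopg2.connect("host=my-rds-endpoint dbname=mydb user=me")
conn.autocommit = True   # read-only sampling; don't hold a transaction open
cur = conn.cursor()
for _ in range(10):      # sample once a minute for 10 minutes
    cur.execute(QUERY)
    cols = [desc[0] for desc in cur.description]
    print(dict(zip(cols, cur.fetchone())))
    time.sleep(60)
```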