Average Case Analysis of Insertion Sort as dealt with in Kenneth Rosen’s “Discrete Mathematics and Its Applications”

I was going through “Discrete Mathematics and Its Applications” by Kenneth Rosen when I came across the following insertion sort algorithm and its analysis. The algorithm is quite different from the one dealt with in CLRS, so I have shared the entire algorithm below. Note that they consider a machine where only comparisons are significant and have proceeded accordingly. The problem I face is with the analysis portion given here in bold. Moreover, the specific doubts I have are pointed out at the very end of this question.

ALGORITHM The Insertion Sort.


procedure insertion sort($a_1, a_2, \ldots, a_n$: real numbers with $n \geqslant 2$)
for $j := 2$ to $n$
begin
    $i := 1$
    while $a_j > a_i$
        $i := i + 1$
    $m := a_j$
    for $k := 0$ to $j - i - 1$
        $a_{j-k} := a_{j-k-1}$
    $a_i := m$
end {$a_1, a_2, \ldots, a_n$ is sorted}

THE INSERTION SORT: The insertion sort is a simple sorting algorithm, but it is usually not the most efficient. To sort a list with $ n$ elements, the insertion sort begins with the second element. The insertion sort compares this second element with the first element and inserts it before the first element if it does not exceed the first element and after the first element if it exceeds the first element. At this point, the first two elements are in the correct order. The third element is then compared with the first element, and if it is larger than the first element, it is compared with the second element; it is inserted into the correct position among the first three elements.

In general, in the $j$th step of the insertion sort, the $j$th element of the list is inserted into the correct position in the list of the previously sorted $j - 1$ elements. To insert the $j$th element in the list, a linear search technique is used; the $j$th element is successively compared with the already sorted $j - 1$ elements at the start of the list until the first element that is not less than this element is found or until it has been compared with all $j - 1$ elements; the $j$th element is inserted in the correct position so that the first $j$ elements are sorted. The algorithm continues until the last element is placed in the correct position relative to the already sorted list of the first $n - 1$ elements. The insertion sort is described in pseudocode in the algorithm above.
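For concreteness (this is my own transcription, not taken from Rosen), here is a short Python sketch of the pseudocode above that also counts the comparisons made by the while loop; the name insertion_sort_comparisons is just an illustrative choice:

```python
def insertion_sort_comparisons(a):
    """Rosen-style insertion sort, a 0-indexed translation of the
    1-indexed pseudocode above. Sorts a in place and returns how many
    times the while-loop comparison a_j > a_i was evaluated."""
    n = len(a)
    comparisons = 0
    for j in range(1, n):                  # the book's j = 2, ..., n
        i = 0
        # linear search: stop at the first a[i] that is >= a[j];
        # if a[j] is the largest so far, the search stops when it is
        # compared with itself, exactly as in the pseudocode
        while True:
            comparisons += 1
            if not a[j] > a[i]:
                break
            i += 1
        m = a[j]
        for k in range(j - i):             # shift a[i..j-1] one slot right
            a[j - k] = a[j - k - 1]
        a[i] = m                           # drop the saved element into place
    return comparisons
```

For example, insertion_sort_comparisons([3, 1, 2]) leaves the list as [1, 2, 3] and returns 3, since inserting the second element takes one comparison and inserting the third takes two.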

Average-Case Complexity of the Insertion Sort: What is the average number of comparisons used by the insertion sort to sort $ n$ distinct elements?

Solution: We first suppose that $ X$ is the random variable equal to the number of comparisons used by the insertion sort to sort a list $ a_1 ,a_2 ,…,a_n$ of $ n$ distinct elements. Then $ E(X)$ is the average number of comparisons used. (Recall that at step $ i$ for $ i = 2,…,n$ , the insertion sort inserts the $ i$ th element in the original list into the correct position in the sorted list of the first $ i − 1$ elements of the original list.)

We let $ X_i$ be the random variable equal to the number of comparisons used to insert $ a_i$ into the proper position after the first $ i − 1$ elements $ a_1 ,a_2,…,a_{i−1}$ have been sorted. Because

$ X=X_2+X_3+···+X_n$ ,

we can use the linearity of expectations to conclude that

$ E(X) = E(X_2 + X_3 +···+X_n) = E(X_2) + E(X_3) +···+E(X_n).$

To find $E(X_i)$ for $i = 2, 3, \ldots, n$, let $p_j(k)$ denote the probability that the largest of the first $j$ elements in the list occurs at the $k$th position, that is, that $\max(a_1, a_2, \ldots, a_j) = a_k$, where $1 \leq k \leq j$. Because the elements of the list are randomly distributed, it is equally likely for the largest element among the first $j$ elements to occur at any position. Consequently, $p_j(k) = \frac{1}{j}$. If $X_i(k)$ equals the number of comparisons used by the insertion sort if $a_i$ is inserted into the $k$th position in the list once $a_1, a_2, \ldots, a_{i-1}$ have been sorted, it follows that $X_i(k) = k$. Because it is possible that $a_i$ is inserted in any of the first $i$ positions, we find that

$$E(X_i) = \sum_{k=1}^{i} p_i(k) \cdot X_i(k) = \sum_{k=1}^{i} \frac{1}{i} \cdot k = \frac{1}{i}\sum_{k=1}^{i} k = \frac{1}{i} \cdot \frac{i(i+1)}{2} = \frac{i+1}{2}$$

It follows that

$$E(X) = \sum_{i=2}^{n} E(X_i) = \sum_{i=2}^{n} \frac{i+1}{2} = \frac{n^{2} + 3n - 4}{4}$$
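As a quick sanity check (not part of Rosen's text), the closed form can be compared against the brute-force average over all $n!$ permutations of a small list, reusing the insertion_sort_comparisons sketch from above:

```python
import itertools
import math

# assumes the insertion_sort_comparisons sketch from above is in scope
n = 5
total = sum(insertion_sort_comparisons(list(p))
            for p in itertools.permutations(range(n)))
print(total / math.factorial(n))    # average over all permutations: 9.0
print((n * n + 3 * n - 4) / 4)      # closed form (n^2 + 3n - 4)/4:   9.0
```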

My doubts


Now, while calculating $E(X_i)$, we first consider the probability that the maximum element among $a_1, a_2, \ldots, a_i$ is at position $k$. Then they say that the number of comparisons when $a_i$ is placed into the $k$th position in the (already sorted) list $a_1, a_2, \ldots, a_{i-1}$ is $k$. Why are they considering the insertion of $a_i$ into the position of the maximum of the elements $a_1, a_2, \ldots, a_i$? As per the algorithm, $a_i$ should be placed at the first position (while scanning the array from the left) where we find an element $\geqslant a_i$, not at the position of the maximum element of the sublist $a_1, a_2, \ldots, a_i$.

Moreover, they say that the max element of the sublist $a_1, a_2, \ldots, a_i$ can be at any arbitrary position $k$, each with probability $\frac{1}{i}$. But if $a_1, a_2, \ldots, a_{i-1}$ is already sorted, then the max of $a_1, a_2, \ldots, a_i$ is either $a_{i-1}$ or $a_i$.

Computing average access time

A computer has a cache memory and a main memory with the following features: 

– Memory cache access time: 4 ns
– Main memory access time: 80 ns
– The time needed to load a line in cache: 120 ns
– Write-through policy

If the hit ratio in this computer is 95% and 90% of the memory accesses are read operations, what is the average access time?

It seems a bit weird to me: I have read the question “calculate the effective (average) access time (EAT) of this system” on this site, but I still do not know how to do it. I would appreciate any help. Thanks!
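This is not an authoritative answer, just a sketch of one common way to set the calculation up, under my own assumptions that a read hit costs only the cache access (4 ns), a read miss costs the line load (120 ns), and, because of write-through, every write pays the main-memory access (80 ns); other textbooks account for misses and writes slightly differently, which changes the result:

```python
# Assumptions (mine, not stated in the problem):
#   read hit  -> cache access only:            4 ns
#   read miss -> load the line from memory:  120 ns
#   write     -> write-through to memory:     80 ns
hit_ratio = 0.95
read_fraction = 0.90

t_read = hit_ratio * 4 + (1 - hit_ratio) * 120          # = 9.8 ns
t_avg = read_fraction * t_read + (1 - read_fraction) * 80
print(f"average access time = {t_avg:.2f} ns")           # 16.82 ns under these assumptions
```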

Average access time in two level cache system

In a two-level cache system, the level one cache has a hit time of 1 ns (inside the CPU), hit rate of 90%, and a miss penalty of 20 ns. The level two cache has a hit rate of 95% and a miss penalty of 220 ns. What is the average memory access time?

What is a two-level cache system, and how do I calculate the time required? The hit time of the level-two cache seems to be missing…
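Again only a hedged sketch of the usual two-level AMAT formula, under my own interpretation that the stated level-one miss penalty of 20 ns is the time to access the level-two cache (its effective hit time) and the level-two miss penalty of 220 ns is the time to go to main memory:

```python
# Assumed interpretation: the L1 "miss penalty" (20 ns) plays the role of
# the L2 access time, and the L2 "miss penalty" (220 ns) is main memory.
l1_hit_time, l1_hit_rate = 1, 0.90      # ns, fraction
l2_access, l2_hit_rate = 20, 0.95       # ns, fraction
l2_miss_penalty = 220                   # ns

amat = l1_hit_time + (1 - l1_hit_rate) * (l2_access + (1 - l2_hit_rate) * l2_miss_penalty)
print(f"AMAT = {amat:.1f} ns")          # 1 + 0.1 * (20 + 0.05 * 220) = 4.1 ns
```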

Why should an average programmer learn automata theory?

Good programming relies heavily on choosing an efficient algorithm for the task at hand, and yet an average programmer hardly uses 50 pages’ worth of algorithms from the Cormen book in his/her career.

Recently I started reading some CS books, long after completing my bachelor’s degree. One of the books is the Theory of Computation by Michael Sipser. Although I love the content and am still in the beginning chapters, I cannot imagine where I would use the information provided in this book in my job.

What is the use of Automata Theory in the industry?

Does Sacred Flame or Firebolt do more damage on average to a Shadow over an arbitrary but finite number of rounds?

Assume the following character build:

Level 2 Variant Human Wild Magic Sorcerer with the Magic Initiate feat; relevant spells are Sacred Flame and Fire Bolt. While I don’t think character level is relevant, it is included in case an ability plays a larger role than anticipated.

Ability Scores:
STR: 8 (-1)
DEX: 14 (+2)
CON: 14 (+2)
INT: 10 (+0)
WIS: 12 (+1)
CHA: 14 (+2)

Spell Save DC: 8 + 2 + 2 = 12
Spell Attack Modifier: 2 + 2 = 4

Spell Save DC for Sacred Flame: 8 + 2 + 1 = 11

Magic Initiate (5e PHB page 168):

Choose a class{cleric}: You learn two cantrips of your choice from that class’s spell list. … Your spellcasting ability for these spells depends on the class you chose: … Wisdom for cleric

Sacred Flame (5e PHB page 272):

Cantrip.
Casting Time: 1 action.
Flame-like radiance descends on a creature that you can see within range. The target must succeed on a Dexterity saving throw or take 1d8 radiant damage. The target gains no benefit from cover for this saving throw.

Firebolt (5e PHB page 242):

Cantrip.
Casting Time: 1 action
You hurl a mote of fire at a creature or object within range. Make a ranged spell attack against the target. On a hit, the target takes 1d10 fire damage. A flammable object hit by this spell ignites if it isn’t being worn or carried.

Shadow (5e MM page 269):

Armor Class 12
Hit Points 16 (3d8 + 3)
DEX: 14 (+2)
Damage Vulnerabilities radiant
Damage Resistances … fire …

Damage Resistance and Vulnerability (5e PHB page 197):

If a creature or an object has resistance to a damage type, damage of that type is halved against it. If a creature or an object has vulnerability to a damage type, damage of that type is doubled against it.

Assuming that we don’t have to worry about the Sorcerer taking any damage (just for the sake of the question; otherwise it boils down to the Sorcerer dying after a few rounds), which spell would be the better choice to deal the maximum amount of damage on average within an arbitrary but finite number of rounds?
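Not part of the question itself, but the per-round comparison it asks about can be sketched as follows, using only the numbers posted above plus a few labelled assumptions of mine (crit on a natural 20 for 2d10, halved damage rounded down, no other modifiers). Since neither cantrip has effects that carry across rounds, whichever has the higher per-round expectation also wins over any fixed finite number of rounds:

```python
from fractions import Fraction
from itertools import product

# Assumptions (mine): Sacred Flame is DC 11 vs the Shadow's +2 DEX save and
# deals 1d8 radiant, doubled by radiant vulnerability; Fire Bolt is +4 to hit
# vs AC 12 and deals 1d10 fire (2d10 on a natural-20 crit), halved and
# rounded down by fire resistance.

def avg(values):
    return Fraction(sum(values), len(values))

# Sacred Flame: the Shadow fails on d20 + 2 < 11, i.e. on a roll of 1-8.
p_fail = Fraction(8, 20)
sacred_flame = p_fail * avg([2 * d for d in range(1, 9)])            # 2 x 1d8

# Fire Bolt: normal hit on 8-19, crit on 20, miss on 1-7.
p_hit, p_crit = Fraction(12, 20), Fraction(1, 20)
normal = avg([d // 2 for d in range(1, 11)])                         # 1d10 halved
crit = avg([(d1 + d2) // 2 for d1, d2 in product(range(1, 11), repeat=2)])
fire_bolt = p_hit * normal + p_crit * crit

print(f"Sacred Flame = {float(sacred_flame):.2f} damage per round")  # about 3.60
print(f"Fire Bolt    = {float(fire_bolt):.2f} damage per round")     # about 1.76
```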

Basing average case hardness from worst case hardness

I am trying to understand the current knowledge about questions of the form: “If a language $L$ is difficult in the worst case, then there exists a distribution $D$ such that it is difficult to solve $L$ on instances drawn from $D$.”

  1. Random self-reducibility captures this concept: it asks whether it is possible to reduce the problem of solving an arbitrary instance $x \in L$ to solving a set of random instances. In AGGM it is shown that, unless the polynomial-time hierarchy collapses, there is no $L \in NPC$ which is randomly self-reducible. This means it is unlikely that we can prove that some $L$ is hard on average assuming only that $L$ is hard in the worst case.
  2. For circuits, the situation seems to be a bit different: Impagliazzo’s hardcore lemma (informally) states that if $L$ cannot be solved on a $1-\delta$ fraction of the inputs of size $n$ by a circuit of size $S$, then there exists a distribution such that no circuit of roughly the same size can solve more than a $0.5 + \epsilon$ fraction of the inputs. Thus, if we had a worst-case hard language $L$, we could build a “difficult distribution”.

First, I don’t understand how these results fit together: unless the polynomial hierarchy collapses, $SAT \notin P/poly$, which means that every polynomial-size circuit $C_n$ can solve at most a $1-\delta$ fraction of the inputs, so by Impagliazzo’s hardcore lemma it is possible to construct a distribution over $SAT$ which is hard for all polynomial-size circuits – assuming only worst-case assumptions.

Second, are there known results about hardcore sets for uniform computation (i.e. Turing machines)?