If we compare the binary search algorithm (leftmost or insertion point) on Wikipedia:

Algorithm 1:

`function binary_search_leftmost(A, n, T): L := 0 R := n while L < R: m := floor((L + R) / 2) if A[m] < T: L := m + 1 else: R := m return L `

with the one on Rosetta Code:

Algorithm 2:

`BinarySearch_Left(A[0..N-1], value) { low = 0 high = N - 1 while (low <= high) { // invariants: value > A[i] for all i < low value <= A[i] for all i > high mid = (low + high) / 2 if (A[mid] >= value) high = mid - 1 else low = mid + 1 } return low } `

This binary search is the somewhat more complicated type of binary search, which is to find the leftmost index, or it can be the insertion point (so it is not the “exact match” binary search), as follows:

If the numbers in the array are:

`11 22 22 33 44 55 66 `

then if the target is `3`

, then the result should be `0`

(`3`

inserted at index `0`

).

If the target is `59`

, then the result should be `6`

(inserted at where `66`

is), and if the target is `77`

, then the result should be `7`

(inserted to the right of the maximum index, which is adding it as the new, final element).

I found it very easy to establish the invariants and correctness for Algorithm 1:

`function binary_search_leftmost(A, n, T): // Invariant: // [L, R] inclusive is where the answer could be. // so note R is not n - 1 like the "exact match" case, but n, // because we can go "one step to the right" to insert // as index n L := 0 R := n // Invariant: // note that we keep on looping when L < R, meaning the range // [L, R] is not "closed". When L == R, then we have reached // the answer. Note that unlike the exact match case, here we // always have an answer, so when L == R, then we already have // an answer. (because [L, R] is where the answer is, and if // L === R, we already have the answer either L or R and it cannot // be anything else) while L < R: // Invariant: // It might be better to use m := L + floor((R - L) / 2) // here, because of the possible overflow bug. Here we consider // if L and R differ by 1, such as L == 35 and R == 36, then // m becomes min(L, R), and down below, since there can be // only two cases: L := m + 1 which is 36, or R = m which is 35, // that means the range [L, R] always shrinks. If L and R // differ by 2 or more, then it is obvious that [L, R] will // always shrink. When L and R differ by 0, the loop will not // continue. So here we established that there will be no // infinite loop m := floor((L + R) / 2) // if target is, say 55, and A[m] is 25 or 54, now we are // disqualifying 25...A[m], so we set the lower bound L // to m + 1 (so we disqualify m as well) if A[m] < T: L := m + 1 // Here it is T <= A[m] // if target is, say 55, and A[m] is 55, 56, 100, or 1000 // now our answer could still be m, because m could be the // leftmost (we don't know yet), so we don't disqualify m // and so we disqualify m + 1 and all the way to the end // of array, so we set R = m else: R := m // Finally, when L == R, we could return either L or R // In fact, it may be good to assert L == R here return L `

However, for Algorithm 2, I found it quite difficult to establish invariants. It will be kind of like [low, high + 1] could be where the answer falls into. And if I imagine a `high2 === high + 1`

, then maybe everything can fall into place, with

`high = N - 1 high2 = high + 1 = N - 1 + 1 = N (same as Algorithm 1) while (low <= high) while (low < high + 1) (same as Algorithm 1, see below) while (low < high2) (same as Algorithm 1) high = mid - 1 high2 = high + 1 = mid - 1 + 1 high2 = mid (same as Algorithm 1) `

but the line

`mid = (low + high) / 2 `

becomes

`mid = (low + high2 - 1) / 2 `

(I think it is assuming integer arithmetics) so then it is shifting the answer to 0.5 to the left sometimes (and therefore `1`

less if taking the floor). But if we look at [low, high] === [35, 37] or [35, 36] or [35, 35], they still work out well. Overall, the invariants seem somewhat awkward and unclear. Are there actually some ways to establish good invariants and correctness for Algorithm 2?