Refer back to this video
Imagine you have a ball that bounces. Each time it bounces, it reaches 1/2 the height of the previous bounce.
If you add up all these distances, you’d get:
1 + 1/2 + 1/4 + 1/8 + …
This is a geometric series with a = 1 and r = 1/2.
Using the formula: S∞ = 1 / (1 - 1/2) = 1 / (1/2) = 2
So the total distance the ball travels is 2 meters, even though it bounces infinitely many times! What this means is that 2 is the limit: no matter how many bounces you add up, the total never goes past 2.
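To see the limit numerically, here is a tiny sketch (my own, not from the video) that prints the partial sums creeping up on 2:

```python
total = 0.0
term = 1.0               # a = 1
for i in range(10):
    total += term
    term *= 0.5          # r = 1/2
    print(i + 1, total)  # 1.0, 1.5, 1.75, ... the partial sums never pass 2
```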
If r were 1 or greater, the terms would not get smaller, and the series would not converge to a finite sum. The formula only works when the series eventually approaches a limit.
When r is greater than 1, the geometric series behaves very differently:
1. Each term gets larger than the previous one.
2. The series keeps growing without limit.
3. There is no finite sum for an infinite series in this case.
Another example is to imagine a rumor spreading:
- On day 1, 1 person knows the rumor.
- On day 2, 3 people know (it tripled).
- On day 3, 9 people know (tripled again).
- On day 4, 27 people know (tripled again).
This forms a geometric series with a = 1 and r = 3: 1 + 3 + 9 + 27 + …
Key differences from r < 1:
1. No convergence: the series doesn't approach a fixed value.
2. No finite sum: for an infinite series, there's no formula to calculate a finite sum.
3. Rapid growth: the terms get bigger very quickly.
While we can’t sum an infinite series when r > 1, we can sum a finite number of terms:
\[S_n = \frac{a(1 - r^n)}{1 - r}\]
Where:
- a = the first term
- r = the common ratio
- n = the number of terms
When r > 1, the series grows without bound. It’s useful for modeling exponential growth scenarios, but you can’t calculate a finite sum for an infinite number of terms.
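A quick check (my own snippet) that the finite-sum formula matches adding the terms directly, using the rumor example above:

```python
a, r, n = 1, 3, 4                          # rumor example: 1 + 3 + 9 + 27
direct = sum(a * r**j for j in range(n))
formula = a * (1 - r**n) / (1 - r)
print(direct, formula)                     # 40 40.0
```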
Let's break down why the closed-form formula \((r^{k+1} - 1) / (r - 1)\) works for the sum of a geometric series:
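Spelling out the subtract-and-shift steps (for \(r \neq 1\)), writing S for the sum: \[S = 1 + r + r^2 + \dots + r^k\] \[rS = r + r^2 + \dots + r^k + r^{k+1}\] Subtracting the first line from the second, all the middle terms cancel: \[rS - S = r^{k+1} - 1 \implies S = \frac{r^{k+1} - 1}{r - 1}\]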
And that’s the formula. It works because of the clever way the terms cancel out when you subtract. This method is called the “subtract and shift” technique.
x = 5678, y = 1234 (so n = 4). The partial products are 5678 * 4 = 22712, 5678 * 3 = 17034, 5678 * 2, etc. Each successive partial product is shifted one place to the left (one more trailing 0 than the row before), and then we add them all up. This calculation has n columns and n rows.
In general, computing a single partial product involves n multiplications (one per digit) and at most n additions (sum and carry), for a total of at most 2n primitive operations. Since there are n partial products, computing all of them requires at most n * 2n = 2n^2 operations.
\[\text{total number of operations} \leq \text{constant} \cdot n^2\]
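As a rough cross-check (my own sketch, not from the notes), here is the grade-school procedure with a counter for primitive operations; for the n = 4 example above it performs exactly 2n² = 32 of them:

```python
def grade_school_multiply(x, y):
    """Multiply x and y digit by digit, counting the primitive operations used."""
    xs = [int(ch) for ch in str(x)][::-1]   # digits, least significant first
    ys = [int(ch) for ch in str(y)][::-1]
    total, ops = 0, 0
    for i, yd in enumerate(ys):             # one partial product per digit of y
        partial, carry = 0, 0
        for j, xd in enumerate(xs):
            prod = xd * yd + carry          # one multiplication + one addition
            ops += 2
            partial += (prod % 10) * 10 ** j
            carry = prod // 10
        partial += carry * 10 ** len(xs)
        total += partial * 10 ** i          # the shift: i trailing zeros
    return total, ops

print(grade_school_multiply(5678, 1234))    # (7006652, 32)
```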
x = 5678, y = 1234 (so n = 4); a = 56 and b = 78, c = 12 and d = 34.
Step 1: compute ac, which is 672.
Step 2: compute bd, which is 2652.
Step 3: compute (a+b)(c+d), which is 6164.
Step 4: subtract the results of steps 1 and 2 from step 3: e = 6164 - 2652 - 672 = 2840.
Step 5: add up steps 1, 2 and 4, after adding four trailing zeroes to step 1 and two trailing zeroes to step 4. You add the trailing zeros like in normal multiplication.
\[10^4ac + 10^2e + bd = 7006652\] becomes: \[10^4 * 672 + 10^2 * 2840 + 2652 = 7006652\]
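A one-off check of the arithmetic above (my own snippet):

```python
a, b = 56, 78                        # halves of x = 5678
c, d = 12, 34                        # halves of y = 1234
ac, bd = a * c, b * d                # steps 1 and 2
e = (a + b) * (c + d) - ac - bd      # steps 3 and 4
print(ac, bd, e)                     # 672 2652 2840
print(10**4 * ac + 10**2 * e + bd)   # 7006652, which equals 5678 * 1234
```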
A recursive algorithm involves multiplications with fewer digits. In general, a number x with an even number n of digits can be expressed in terms of two n/2-digit numbers, its first half a and second half b. For example: 56 * 100 = 5600, and 5600 + 78 = 5678. There are two zeros, so we multiply by 10^2.
\[x = 10^{n/2}a\,+\,b\] Similarly, we can write
\[y = 10^{n/2}c\,+\,d\]
If we multiply these two together:
\[ xy\,=\,(10^{n/2}a\,+\,b)(10^{n/2}c\,+\,d)\] \[=\,10^nac\,+\,10^{n/2}(ad\,+\,bc)\,+\,bd\]
Input: two n-digit positive integers x and y
Output: the product x * y
Assumption: n is a power of 2

if n == 1 then
    compute x*y in one step and return the result
else
    a, b := first and second halves of x
    c, d := first and second halves of y
    recursively compute ac := a*c, ad := a*d, bc := b*c, bd := b*d
    compute 10^n * ac + 10^(n/2) * (ad + bc) + bd and return the result
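A minimal runnable sketch of the same idea (my own translation of the pseudocode, assuming n is a power of 2):

```python
def rec_int_mult(x, y, n):
    """Recursive integer multiplication with four recursive calls."""
    if n == 1:
        return x * y
    half = n // 2
    a, b = divmod(x, 10 ** half)     # first and second halves of x
    c, d = divmod(y, 10 ** half)     # first and second halves of y
    ac = rec_int_mult(a, c, half)
    ad = rec_int_mult(a, d, half)
    bc = rec_int_mult(b, c, half)
    bd = rec_int_mult(b, d, half)
    return 10 ** n * ac + 10 ** half * (ad + bc) + bd

print(rec_int_mult(5678, 1234, 4))   # 7006652
```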
Karatsuba is an optimised version of the above RecIntMult. In the above algorithm, four recursive calls are needed, ac, ad, bc, bd. Can we do better?
Step 1: compute ac.
Step 2: compute bd.
Step 3: instead of recursively computing ad or bc, compute (a+b)(c+d).
Step 4: take step 3 minus steps 1 and 2: \[(a+b)(c+d)-ac-bd = ac +ad+bc+bd -ac-bd = ad + bc\]
Step 5: add up step 1, step 4 and step 2.
if n == 1 then
    compute x*y in one step and return the result
else
    a, b := first and second halves of x
    c, d := first and second halves of y
    p := a + b, q := c + d
    recursively compute ac := a*c, bd := b*d, pq := p*q
    adbc := pq - ac - bd
    compute 10^n * ac + 10^(n/2) * adbc + bd and return the result
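And the Karatsuba version with three recursive calls (my own sketch; the base case multiplies directly, which also absorbs the extra carry digit that a+b and c+d can pick up):

```python
def karatsuba(x, y, n):
    """Karatsuba multiplication with three recursive calls."""
    if n == 1:
        return x * y
    half = n // 2
    a, b = divmod(x, 10 ** half)     # first and second halves of x
    c, d = divmod(y, 10 ** half)     # first and second halves of y
    p, q = a + b, c + d
    ac = karatsuba(a, c, half)
    bd = karatsuba(b, d, half)
    pq = karatsuba(p, q, half)       # (a + b)(c + d)
    adbc = pq - ac - bd              # equals ad + bc
    return 10 ** n * ac + 10 ** half * adbc + bd

print(karatsuba(5678, 1234, 4))      # 7006652
```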
Merge sort is split into two main functions: mergesort and merge. Merge does the merging process of the arrays together when they are split apart by mergesort.
Input: array A of n ints output: sorted array
// ignoring base cases
c := recursively sort first half of A
d := recursively sort second half of A
return merge(c, d)
So if the array A contains only 0 or 1 elements, MergeSort returns it as it's already sorted! Otherwise it keeps dividing the array further and further.
Input: sorted arrays C and D (length n/2 each)
Output: sorted array B of C and D combined together (length n)
Assumption: n is even
i := 1
j := 1
for k := 1 to n do
    if C[i] < D[j] then
        B[k] := C[i]   // populate output array
        i := i + 1     // increment to next index
    else               // D[j] < C[i]
        B[k] := D[j]
        j := j + 1
B is the output array; we iterate through C and D, which are both n/2 long, so by the time both arrays are consumed we have filled n positions. If the current element of C is less than the current element of D, we add that element to B and increment the index for C; if the smaller element is in D instead, we add that element to B and increment the index for D. This pseudocode doesn't account for running off the end of one array once it is exhausted, which means an additional loop is needed at the end to copy in the rest of the other array (see the sketch below).
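A minimal Python sketch of Merge that includes the leftover-copying step (my own version, using 0-based indices):

```python
def merge(c, d):
    """Merge two sorted lists into one sorted list."""
    b = []
    i = j = 0
    while i < len(c) and j < len(d):
        if c[i] <= d[j]:
            b.append(c[i])       # smaller element comes from c
            i += 1
        else:
            b.append(d[j])       # smaller element comes from d
            j += 1
    b.extend(c[i:])              # one of these two is empty:
    b.extend(d[j:])              # copy whatever remains of the other
    return b

print(merge([1, 3, 5], [2, 4, 6]))   # [1, 2, 3, 4, 5, 6]
```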
For a positive integer n, \(log_{2}n\) just means the following: type n into a calculator and count the number of times you need to divide it by 2 until the result is less than 2 (or whatever the base number is). For example, it takes 5 divisions by two to bring 32 down to 1, so \(log_{2}32 = 5\), and \(log_{2}1024 = 10\). Notice how the log of n is much less than n itself; 10 is much smaller than 1024. This difference in size becomes even bigger as n grows larger. Another way to think about it is "the larger that n is, the more efficient the algorithm is".
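A tiny sketch of that "count the divisions" description (my own):

```python
def divisions_until_below_two(n):
    count = 0
    while n >= 2:      # keep halving until the result drops below 2
        n /= 2
        count += 1
    return count

print(divisions_until_below_two(32), divisions_until_below_two(1024))   # 5 10
```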
The running time is a counting problem that tells us the number of lines that are executed in the algorithm in total. In the above code, there are at most 4l + 2 operations. But to make things easier we can say there are at most \[4l + 2 \leq 4l + 2l = 6l\] operations, where 6l is an upper bound. The exact number doesn't matter when it comes to constant factors, because different code implementations may have different constant factors.
Every recursive call is passed an input smaller than the one we started with. There's a weird conflict when calculating the running times of these recursive calls: there is an explosion of further recursive calls being produced, but the input is also shrinking on each recursive call (the array getting smaller), like two opposing forces working against each other.
Theorem 1.2: For every input array of length n >= 1, MergeSort performs at most \[6n\log_{2}n +6n\] operations.
We start thinking about this by drawing a recursion tree with level 0 as the initial call, level 1 as the next set of recursive calls and so on, until the base case. We have to think about three main things here:
So, for j = 0, 1, 2,… there are \(2^j\) recursive calls, \(2^0 = 1\), \(2^1 = 2\), \(2^2 = 4\) each operating on a subarray of length \(\frac{n}{2^j}\) -> n, n/2, n/4 etc…
We can calculate the total work done by the level-j recursive calls as: number of recursive calls at level j * work per level-j recursive call \[ = 2^j \cdot 6l\] \[ = 2^j \cdot \frac{6n}{2^j}\] \[ = 6n\] For each subproblem (each circle) at level j, of which there are \(2^j\), there is a merge subroutine running on the children nodes of that circle. So there is a merge cost of 6l for every circle, which is why we multiply them.
This means that the work done (number of operations) at each level is somehow always the same, even though the number of subproblems keeps on growing. The reason is the competing forces mentioned above: the number of subproblems doubles at each level while the work per subproblem halves.
The recursion tree above has a total of \(\log_2 n + 1\) levels (the +1 includes level 0). This is because on each level of the tree the array is split into two halves, repeatedly, until it reaches the base case where the size of the array is a single element. This is the exact definition of \(\log_2 n\), where we keep dividing by 2 until the result drops below 2.
We can calculate the total amount of work done across all levels by doing: total number of levels * #Work done at level j (work per level) \[(\log_2 n + 1)\cdot 6n = 6n\log_2 n + 6n\] Algorithms like this, with linear or 'near linear' time complexity, are called for-free primitives because they don't take much more effort than reading in the input, so you can 'freely' use them.
You are given k sorted arrays, each with n elements, and you want to combine them into a single array of kn elements. One approach is to merge the first two arrays, then merge the result with the third array, then with the fourth array, and so on until you merge in the kth and final input array. What is the running time taken by this successive merging algorithm?
j=1: merging is O(n + n) = O(2n)
j=2: merging is O(2n + n) = O(3n)
j=3: merging is O(3n + n) = O(4n)
…
j=k-1: merging is O((k-1)n + n) = O(kn)

It is k-1 because merging the first two arrays counts as one merge, so the merges add up to k-1 and not k.

To get the total we sum them all up. Since \(jn + n \leq 2jn\) for \(j \geq 1\), each merge is O(jn), so the total is O(n) + O(2n) + O(3n) + … + O((k-1)n). Factor out n to get \(n(1 + 2 + 3 + \dots + (k-1))\), and use \(1 + \dots + m = \frac{m(m+1)}{2}\): \[n\left(\frac{(k-1)k}{2}\right) = \frac{nk^2-nk}{2}\] \[= \mathcal{O}(nk^2)\]
Consider again the problem of merging k sorted length-n arrays into a single sorted length-kn array. Consider the algorithm that first divides the k arrays into k/2 pairs of arrays (the number of arrays left gets smaller), and uses the Merge subroutine to combine each pair, resulting in k/2 sorted length-2n arrays (2n because the merged array is longer). The algorithm repeats this step until there is only one length-kn sorted array. What is the running time of this procedure? (Think of this as merge sort but going upwards rather than down.)
You have k arrays in total. On the first level you perform k/2 merges, since a merge accounts for two circles, not one. On deeper levels the number of merges keeps halving, but each merge is on arrays twice as long, so the two effects cancel out: Merge on two length-m arrays is O(m + m) = O(m) as defined above, so the total work done at level j is (number of merges at that level) * (cost per merge) = O(nk).
The number of arrays keeps halving on each level until there is one single array at the end, so there are \(\log_2 k\) levels. So the total work done is \[ O(nk\log k)\] It is log of k because k (the number of arrays) is what is being halved until it reaches 1.
As you can see, in order to solve these problems you just have to split them into these main steps:
1. work done at level-j
    1. whether you're going down a recursion tree
    2. or going through an array etc.
2. how many levels are there?
3. work done at level-j * number of levels
Big-O notation concerns the function T(n) where n = 1, 2, … T(n) is 'T' for the time complexity and almost always denotes an upper bound on the worst-case running time.
O(f(n)) is an upper bound on the growth rate. It tells you that T(n) grows no faster than f(n) when n is large. Therefore, if you multiply f(n) by a suitable constant c, T(n) will eventually never exceed c * f(n), because f(n) grows at least as fast as T(n).
\[T(n) = O(f(n))\]
if and only if T(n) is eventually bounded above by a constant multiple of f(n)
What this means is that T(n) is only equal to the big O of f(n) when: \[T(n) \leq c \cdot f(n)\]
\(T(n) = O(f(n))\) if and only if there exists positive constants c and \(n_0\) such that: \[T(n) \leq c \cdot f(n)\] for all \(n \geq n_0\). Where c and \(n_0\) do not depend on n.
In short this is just saying \(n^2\) is \(O(n^2)\). Now onto the mathematical version:
If \[T(n) = a_{k}n^k +\:...\:+ a_{1}n^1 + a_{0}n^0,\;then\;T(n) = O(f(n)),\;where\:f(n) = n^k\]
> We have to actually prove the above statement using #Definition 1
Basically, using algebra, we want to bound the RHS of T(n) by something that looks like \(n^k\).
We start by choosing a c and an \(n_0\), where \(c \cdot n^k\) will be the constant multiple of f(n). Let \(n_0 = 1\) and let c equal the sum of the absolute values of all the coefficients in T(n).
The trick is to replace the different powers of n with the common, largest power \(n^k\). For every \(n \geq 1\) we have \(n^i \leq n^k\) whenever \(i \leq k\), so each term \(a_i n^i\) is at most \(\lvert a_i \rvert n^k\): \[\lvert a_k \rvert n^k + \dots + \lvert a_1 \rvert n^k + \lvert a_0 \rvert n^k \geq a_k n^k + \dots + a_1 n^1 + a_0 n^0\] This gives us the \(n^k\) factor we are trying to bound T(n) with, and since it is now a common factor we can pull it out.
\[\underbrace{(\lvert a_k \rvert + \dots + \lvert a_1 \rvert + \lvert a_0 \rvert)}_{c}\, n^k = c \cdot n^k\] Taking the absolute values of the coefficients matters because some coefficients may be negative; the sum of absolute values is always greater than or equal to the original sum. So for every \(n \geq 1\): \[T(n) = a_k n^k + \dots + a_1 n + a_0 \leq (\lvert a_k \rvert + \dots + \lvert a_1 \rvert + \lvert a_0 \rvert)\, n^k = c \cdot n^k\] Which is what we were trying to prove at the start.
Here we are just saying that \(n^3\) would not be equal to \(O(n^2)\)
If \[T(n) = n^k\;then\;T(n) \neq O(n^{k-1})\] Proving that one function is not big-O of another is best done using a proof by contradiction. So, assume that \(n^k = O(n^{k-1})\). This would imply that \(n^k\) is eventually bounded above by a constant multiple of \(n^{k-1}\), so: \[n^k \leq c \cdot n^{k-1}\] Dividing both sides by \(n^{k-1}\) gives \[n \leq c\] This says that the constant c is greater than or equal to every sufficiently large positive integer n, which is clearly false. Therefore \(n^k \neq O(n^{k-1})\).
Big-O refers to \(\leq\): "it won't take more than 2 hours". Big-omega refers to \(\geq\): "it will take at least 30 minutes". Big-theta refers to \(=\): "it usually takes about 1 hour".
In essence, Big-O is an upper bound on the worst-case running time, Big-Omega is a lower bound on the best-case running time, and Big-Theta is a tight bound on the average-case running time or the typical behaviour of the algorithm. For example, iterating through an array of size n is usually said to be \(O(n)\), when it's more appropriate to say that it is \(\Theta (n)\).
Similarly to big-O: T(n) is big-omega of f(n) if and only if T(n) is eventually bounded below by a constant multiple of f(n). Notice how it is the other way around now and that f(n) now goes below T(n) after \(n_0\).
\(T(n) = \Omega (f(n))\) if and only if there exist positive constants c and \(n_0\) such that \[T(n) \geq c \cdot f(n)\] for all \(n \geq n_0\)
Notice how our constant typically has to be a fraction so that \(c \cdot f(n)\) can bound T(n) from below.
This section covers \(T(n) = \Theta (f(n))\), which means that both \(T(n) = O(f(n))\) and \(T(n) = \Omega(f(n))\) are true at the same time. In the official definition, T(n) is eventually sandwiched between two different constant multiples of f(n).
\(T(n) = \Theta (f(n))\) if and only if there exist positive constants \(c_1, c_2\) and \(n_0\) such that: \[c_1 \cdot f(n) \leq T(n) \leq c_2 \cdot f(n)\] for all \(n \geq n_0\).
Notice how the RHS of the inequality represents big-O notation and the LHS represents big-omega notation, so both have to be true at the same time in order for big-theta to be true. Additionally, the \(n_0\) for big-theta would simply be the larger of the \(n_0\) values for big-omega and big-O.
big-O notation is \(\leq\) little-o notation is \(<\)
In other words, when we say \(T(n) = o(f(n))\), it means that T(n) grows strictly more slowly than f(n), never at the same rate.
\(T(n) = o(f(n))\) if and only if for every positive constant c > 0, there exists a choice of \(n_0\) such that \[T(n) \leq c \cdot f(n)\] for all \(n \geq n_0\).
To prove that one function is little-o of another, we have to prove something stronger than big-O: for every constant c > 0, no matter how small, T(n) is eventually bounded above by the constant multiple \(c \cdot f(n)\). Importantly, in the definition of little-o, the constant \(n_0\) can depend on c (but not on n), with smaller constants c generally requiring bigger constants \(n_0\). For example: \[n^{k-1} = o(n^k)\]
#### Proof
Given any c > 0, choose \(n_0 = \frac{1}{c}\). Then for all \(n \geq n_0\): \[n^k = n \cdot n^{k-1} \geq n_0 \cdot n^{k-1} = \frac{1}{c}\, n^{k-1}\] so \[n^{k-1} \leq c \cdot n^k\] as required.
If \(T(n) = max\{f(n), g(n)\}\) then: \[ T(n) = \Theta (f(n) + g(n))\] First, \[max\{f(n), g(n)\} \leq f(n) + g(n)\] When you're stuck here, remember we need to have \(T(n)\) wedged in between two inequalities, so we also need a bound in the other direction. We know that if we take the maximum of f(n) and g(n) and multiply it by 2, it will be greater than or equal to their sum:
\[2 \cdot max\{f(n), g(n)\} \geq f(n) + g(n)\] This is self-explanatory because doubling the maximum value is always greater than or equal to the bigger number plus the smaller number. We can now divide both sides by 2 and put the two bounds together: \[\frac{1}{2}(f(n) + g(n)) \leq max\{f(n),g(n)\} \leq f(n) + g(n)\]
Remember that big-Theta is a tight bound, meaning that \(T(n)\) needs to be wedged in between two inequalities. Looking above, \(T(n)\) sits between two different constant multiples of \(f(n) + g(n)\). Therefore we can conclude that \(c_1 = \frac{1}{2}\), \(c_2 = 1\) and \(n_0 = 1\).
In essence, we applied an operation to both sides, then did the opposite to both sides again, so that we get the original T(n) in the middle.
If f(n) and g(n) denote functions from the positive integers to the positive real numbers, prove that \(f(n) = O(g(n))\) if and only if \(g(n) = \Omega(f(n))\).
Proof
We need to prove this in both directions:
(1) If f(n) = O(g(n)), then g(n) = Ω(f(n))
(2) If g(n) = Ω(f(n)), then f(n) = O(g(n))
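A sketch of the argument, using only the definitions above (the constant names are my own): for direction (1), if \(f(n) = O(g(n))\) then there are constants \(c, n_0 > 0\) with \(f(n) \leq c \cdot g(n)\) for all \(n \geq n_0\); dividing by c gives \[g(n) \geq \frac{1}{c} \cdot f(n) \quad \text{for all } n \geq n_0,\] which is exactly \(g(n) = \Omega(f(n))\) with constants \(c' = \frac{1}{c}\) and the same \(n_0\). Direction (2) is the same argument run in reverse: from \(g(n) \geq c' \cdot f(n)\) for all \(n \geq n_0\), divide by \(c'\) to get \(f(n) \leq \frac{1}{c'} \cdot g(n)\).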
[!NOTE] This problem is basically just 2 mergesort but with some extra counting steps, that’s it.
Think about the problem of counting how many inversions are in an array, which means how many pairs of elements are in the wrong order. For example, in [5, 1, 2, 3, 4] there are exactly 4 inversions, since 5 is out of place relative to each of the other 4 elements. A brute force implementation of this is:
numInv := 0
for i := 1 to n - 1 do
    # second loop is always ahead of first
    for j := i + 1 to n do
        if A[i] > A[j] then
            numInv += 1
return numInv
This is obviously \(O(n^2)\) in time complexity, but can we do better?
### Divide and conquer approach
The worst-case scenario is that every element on the left side is greater than every element on the right side (imagine a sorted array, but with the left and right halves swapped around). Here you have exactly \[\left(\frac{n}{2}\right)^2 = \frac{n^2}{4}\] split inversions. How do we solve this in \(O(n\log n)\) time?
We can solve this in \(O(n \log_2 n)\) by using what we learned from merge sort, splitting the inversions into three types:
1. left inversion: inversions at indices i, j which are both in the left half of the array (\(i, j \leq \frac{n}{2}\))
2. right inversion: inversions at indices i, j which are both in the right half of the array (\(i, j > \frac{n}{2}\))
3. split inversion: inversions where i is in the left half and j is in the right half (\(i \leq \frac{n}{2} < j\))
The reason why you need split inversions is because of arrays like [1, 3, 5, 2, 4, 6]. If you split this array in half, the left and right sides are already sorted, which means there are no left inversions and no right inversions. However, there are split inversions, which involve comparing the left and right halves, similar to the merge step in merge sort. The updated algorithm looks like this:
if n == 0 or n == 1 then
return (A, 0)
(C, leftInv) := countInv(left half of A)
(D, rightInv) := countInv(second half of A)
(B, splitInv) := Merge(C, D)
return (B, leftInv + rightInv + splitInv)
The worst-case scenario is that every element on the left side is greater than every element on the right side (imagine a sorted array, but with the left and right halves swapped around). Here you have exactly \[\left(\frac{n}{2}\right)^2 = \frac{n^2}{4}\] split inversions.
There is a split inversion whenever an element from the right subarray is greater than an element from the left subarray. To understand this, both arrays are in sorted order (x is in left and y is in right). So if x > y then there is a split inversion because y is copied over to the output array before x.
But the split-inversion count doesn't just go up by one each time this happens. Every time an element of D is copied to the output, the count goes up by the number of elements still left in the left subarray C, because y forms an inversion with every one of them (the product rule!).
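A compact Python sketch of CountInv (my own translation of the pseudocode, counting split inversions inside the merge):

```python
def sort_and_count(a):
    """Return (sorted copy of a, number of inversions in a)."""
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    c, left_inv = sort_and_count(a[:mid])
    d, right_inv = sort_and_count(a[mid:])
    b, split_inv = [], 0
    i = j = 0
    while i < len(c) and j < len(d):
        if c[i] <= d[j]:
            b.append(c[i])
            i += 1
        else:
            b.append(d[j])
            j += 1
            split_inv += len(c) - i      # every element left in c beats the copied d element
    b.extend(c[i:])
    b.extend(d[j:])
    return b, left_inv + right_inv + split_inv

print(sort_and_count([5, 1, 2, 3, 4]))      # ([1, 2, 3, 4, 5], 4)
print(sort_and_count([1, 3, 5, 2, 4, 6]))   # ([1, 2, 3, 4, 5, 6], 3)
```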
The naive implementation is \(O(n^3)\) to multiply matrices.
Input: n × n integer matrices X and Y.
Output: Z = X · Y.

for i := 1 to n do
    for j := 1 to n do
        Z[i][j] := 0
        for k := 1 to n do
            Z[i][j] := Z[i][j] + X[i][k] · Y[k][j]
return Z
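A direct Python translation of the pseudocode (my own sketch, for square matrices given as lists of lists):

```python
def mat_mult(x, y):
    """Naive O(n^3) matrix multiplication."""
    n = len(x)
    z = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                z[i][j] += x[i][k] * y[k][j]
    return z

print(mat_mult([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]
```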
Split each matrix into four n/2 × n/2 submatrices (its quadrants), then split those into n/4 × n/4 submatrices, and so on until you reach a base case. The code is pretty self-explanatory if you understand what is happening in Karatsuba.
// base case
if n == 1 then
    return the 1x1 matrix with X[1][1] * Y[1][1]
// recursive case
A, B, C, D := submatrices of X
E, F, G, H := submatrices of Y
// recursively compute the eight matrix products
AE, BG, AF, BH, CE, DG, CF, DH
// constant number of matrix additions
return the matrix [[AE + BG, AF + BH], [CE + DG, CF + DH]]
X = median: the middle vertical line in the cartesian plane
d = min(d1, d2): the minimum of the closest-pair distances found in the left and right regions

Step 3 is complicated because if such a pair exists, it has to be inside a strip. A strip is a region in the cartesian plane that goes from X - d to X + d, and this pair has to cross over from the left region to the right region.
The reason why a pair closer than d can't have a point outside of this strip: if one point of a split pair is in the left region and more than d to the left of X, then its partner has to be on the right side of X, so the two points are already more than d apart horizontally. This pair can therefore never be closer than d.
For each point in the strip (ordered by y-coordinate), we compute its distance to the next 7 points, compare the best of these with d and return the minimum. This does \(7 \cdot O(n)\) computation, which is still \(O(n)\), which is valid. But why 7?
The same way that we have a strip that accounts for horizontal distance, a pair closer than d would also have to be less than d apart vertically. So a smaller pair would have to exist inside a 2d by d rectangle, where 2d is the width of the horizontal strip and d is the maximum vertical distance inside it. And inside this rectangle there are 7 boxes where the other point may exist.
These boxes are d/2 by d/2, which makes up four boxes in the left region and four in the right region. If one point is in one of these 8 boxes, then the second point is in one of the other 7 boxes.
The reason why two points can't exist inside the same d/2 by d/2 box is that the maximum distance (the diagonal) inside that box is \[ \sqrt{\left(\frac{d}{2}\right)^2+\left(\frac{d}{2}\right)^2} = \frac{d}{\sqrt{2}}\] and \[\frac{d}{\sqrt{ 2 }} \lt d\] So if a pair closer than d existed entirely inside one box, we would have found it already; the fact that d is our minimum so far means no two points share one of these small boxes in either the left or right region.
If you remember, the RecIntMult algorithm creates smaller subproblems by breaking the n-digit numbers into two halves x and y. where \[x = 10^{n/2}\cdot a+b,\:y = 10^{n/2}\cdot c+d\]
and \[x\cdot y = 10^n\cdot(ac) + 10^{n/2}\cdot (ad + bc) + bd\] So far we have built our understanding of this algorithm intuitively in 1 Karatsuba, but we intentionally left out the definition of a recurrence for simplicity. T(n) tells us the maximum number of operations used by a recursive algorithm. So the recurrence for the RecIntMult algorithm is \[T(n) \leq 4\cdot T\left(\frac{n}{2} \right)+O(n)\] The reasoning is that we split the number into two n/2 halves and calculate our partial products with those two splits:
- M(n/2) for ac - M(n/2) for ad - M(n/2) for bc - M(n/2) for bd
This is where the \(4T(\frac{n}{2})\) part of the recurrence comes from.
## Karatsuba
The only change from RecIntMult is that the number of recursive calls has dropped by one. So the recurrence is: \[T(n) \leq 3\cdot T\left(\frac{n}{2} \right)+O(n)\]
here we will go over a version of the master method that handles “standard recurrences” which goes like this:
Base case: T(n) is at most a constant for all sufficiently small n. General case: for larger values of n, \[T(n) \leq a \cdot T\left(\frac{n}{b}\right) + O(n^d)\] In words this can be thought of as:
> "the number of recursive calls multiplied by the time it takes for each subproblem, added to the time of adding those subproblems back together again."
Parameters:
- a = number of recursive calls made at a time (for merge sort: 2)
- b = factor by which the input size shrinks
- \(\frac{n}{b}\) = the size of each subproblem in the recursion (original problem divided by b)
- d = exponent of the running time of the combine step
- \(n^d\) = the cost of dividing the problem and combining the results (work done outside the recursive calls), where d is some constant.
The base case simply means: when the input size is so small that no recursive calls are needed, the problem can be solved in O(1). The general case essentially just means:
1. You divide this task into 'a' smaller tasks.
2. Each smaller task is 1/b the size of the original task.
3. You spend some additional time (\(n^d\)) doing work outside of these smaller tasks.
Now what is T(n) actually doing? If T(n) is defined by a standard recurrence as above where \(a \geq 1\), \(b > 1\), \(d \geq 0\).
T(n) is:
1. \(O(n^d \log{n})\) if \(a = b^d\) (case 1)
2. \(O(n^d)\) if \(a < b^d\) (case 2) (more work outside the recursive calls)
3. \(O(n^{\log_{b}{a}})\) if \(a > b^d\) (case 3)
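As a sanity check, here is a small helper (my own sketch, not part of the course material) that plugs a, b and d into the three cases:

```python
import math

def master_method(a, b, d):
    """Asymptotic bound for T(n) <= a * T(n/b) + O(n^d)."""
    if a == b ** d:
        return f"O(n^{d} log n)"            # case 1: same work at every level
    if a < b ** d:
        return f"O(n^{d})"                  # case 2: the root level dominates
    return f"O(n^{math.log(a, b):.2f})"     # case 3: the leaves dominate

print(master_method(2, 2, 1))   # merge sort: O(n^1 log n)
print(master_method(1, 2, 0))   # binary search: O(n^0 log n) = O(log n)
print(master_method(3, 2, 1))   # Karatsuba: O(n^1.58)
```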
Note: \(b^d\) is the rate at which the work-per-subproblem is shrinking.
### Intuition
To intuitively understand the above statement, start with this video. To start off with, what is the height of a recursion tree? Well, for merge sort we know it is \[\log_2{n}\] but more generally it is \[\log_b{n}\] where b is what I am dividing the input size by. Now what is the width of a recursion tree, or the number of leaves of the tree? Think about how many new nodes are added to the tree at each level. In merge sort it is 1 -> 2 -> 4 -> 8 etc. It keeps multiplying by 2 exactly \(\log_2{n}\) (the height) times, therefore the width, or the number of leaf nodes at the end, is \[2^{\log_2{n}}\] or more generally \[a^{\log_b{n}}\] where a is the number of recursive calls. This is also logarithmically equivalent to \[n^{\log_{b}{a}}\] which would come out as \(n^1\) when you do \(\log_2{2}\), for example. So you have n leaf nodes.
Merge sort Using the definition above, in the merge sort algorithm there are two recursive calls (one per half), so a = 2. The size of the array is divided by two, so b = 2. And O(n) work is done outside the recursive calls, so d = 1 (if d = 0 then O(1) work is done outside the recursive calls). Since \(a = 2 = b^d\), this is case 1: \(O(n^d\log{n}) = O(n\log{n})\).
Binary search In binary search, we recurse on either the left half of the input array or the right half, but never both(like in merge sort), so there is only one recursive call so a = 1. This input size shrinks by 2 so b = 2. Only a single comparison is done outside the recursive call which means O(1) work is done, so d = 0. This fits under case 1: \[O(n^{0}\log{n}) = O(\log{n})\]
Recursive Integer multiplication If we look at the formula again in #Integer Multiplication revisited, we can see a = 4 because there are 4 recursive calls, b = 2 because the length of a number is being divided by 2, and d = 1 because we have O(n). This gives us 4 > \(2^1\), which is case 3: \[O(n^{\log_2{4}}) = O(n^2)\]
Karatsuba Multiplication In 1 Karatsuba we see that this algorithm has 3 recursive calls instead of 4, so a = 3, but b is still 2 and d is still 1 because nothing there is changing. In this case, we are also in case 3 because 3 > \(2^1\): \[O(n^{\log_2{3}}) = O(n^{1.59})\]
Matrix multiplication In this algorithm found in 4 divide & conquer, the input matrices are broken into four \(\frac{n}{2} \times \frac{n}{2}\) matrices (one per quadrant) and 7 recursive calls are performed on those smaller matrices (one fewer than the straightforward eight shown earlier, using the same kind of trick as Karatsuba), so a = 7. Each of these calls is on a pair of matrices whose size is divided by 2, so b = 2. The work done outside the calls involves \(n^2\) additions, so d = 2. Since 7 > \(2^2\), this puts us in case 3: \[O(n^{\log_{2}{7}}) = O(n^{2.81})\]
Base case: \(T(1) \leq c\), here \(n_0 = 1\)
General case: \[T(n) \leq a \cdot T\left(\frac{n}{b}\right) + cn^d\]
### Recursion tree reminder
From 2 mergesort you should remember that there are \(a^j\) subproblems at level j, because the number of recursive calls keeps multiplying by a factor of a at each level, and each level-j subproblem operates on a subarray of length \(\frac{n}{b^j}\), because the input keeps shrinking by the factor b at every level.
The recurrence in the general case above(on the far right) says that the amount of work done in each level-j subproblem is at most a constant multiplied by the input size raised to the power d or \(c(n/b^j)^d\).
This tells us how much work is done in one subproblem, to work out how much work is done in every subproblem in level-j we multiply that by the total number of subproblems(\(a^j\)).
\[Work\:at\:level\:j \leq a^j \cdot c\left[\frac{n}{b^j}\right]^d\] number of subproblems multiplied by work per subproblem where the bracket is the input size. We can simplify this by grouping the parts that depend on j together(basic algebra swapping out of a fraction) \[Work\:at\:level\:j \leq cn^d \cdot \left[\frac{a}{b^d}\right]^j\] The RHS bracket \(a/b^d\) introduces us to the critical ratio. This ratio is important to the relevant case of the master method.
If we sum up all the levels from level j = 0,1,2…, \(\log_b{n}\) we obtain the upper bound. \[total\:work \leq cn^d \cdot \sum_{ j=0 } ^ {\log_{b}{n}} \left[\frac{a}{b^d}\right]^j\]
So why is the critical ratio in the above formula so important? An intuitive way to understand it is as a 'tug of war between good and evil'. Evil is represented by a, more formally known as the rate of subproblem proliferation (RSP): with every level of recursion, the number of subproblems explodes by a factor of a (which is a bad thing, hence 'evil'). Good is represented by b, formally known as the rate of work shrinkage (RWS): with every level, the amount of work done per subproblem decreases by a factor of b (specifically \(b^d\)).
The key question is which side wins? There are exactly three possible cases:
- RSP = RWS (a draw) (amount of work performed is always the same)
- RSP < RWS (good wins) (amount of work performed is decreasing)
- RSP > RWS (evil wins) (amount of work performed is increasing)

How do we know which case it falls under?
- if \(a = b^d\) there is a balance (draw)
- if \(a < b^d\) RWS is faster than RSP (good wins)
- if \(a > b^d\) RSP is faster than RWS (evil wins)
Now we know why the critical ratio \(\frac{a}{b^d}\) is important because it tells us which of the three cases above we are in. We also now know where those three formulas come from in #Theorem:
Case 1: RSP = RWS The amount of work done at level 0 is the same as the amount of work done at every other level, so at each level j we are doing \(O(n^d)\) work, and there are \(O(\log{n})\) levels, so we have \(O(n^d \log{n})\) running time, which is the same as case 1 above.
Case 2: RSP < RWS The amount of work done is decreasing, so the most work is performed at level 0 and then it gets lower and lower. And this is easy because big-O notation takes the dominating running time. Therefore the running time in this case is \(O(n^d)\).
Case 3: RSP > RWS Here the amount of work done is increasing, so level 0 has the lowest running time and the leaves of the tree have the highest. So, in contrast to case 2, we know that the work done at the leaves dominates the running time instead. We already know from before that there are \(a^{\log_{b}{n}}\) leaves. And at the leaves specifically, which is the base case of the algorithm, each call only takes \(O(1)\) time, therefore the running time is \(O(a^{\log_{b}{n}})\). The only caveat is that in case 3 in #Theorem, a and n are flipped. This is mathematically correct in logarithms because: \[a^{\log_{b}{n}} = n^{\log_{b}{a}}\]
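One way to see this identity, using only the fact that \(x = b^{\log_b x}\): \[a^{\log_b n} = \left(b^{\log_b a}\right)^{\log_b n} = b^{(\log_b a)(\log_b n)} = \left(b^{\log_b n}\right)^{\log_b a} = n^{\log_b a}\]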
The RHS also makes intuitive sense: think back to merge sort, where a = 2 and b = 2. This gives us \(n^{\log_2{2}}\), which is just \(n^1\). And this makes sense because the number of recursive calls keeps exploding in merge sort until every item of the original array is its own leaf, which is equivalent to n leaves.
Case 1 is handled by the fact that the same amount of work (\(cn^d\)) is done at each level, so a separate geometric-series argument is not needed.
A geometric series is a sum of numbers where each term is found by multiplying the previous term by a fixed non-zero number called the common ratio (r):
When r = 2: 1 + 2 + 4 + 8 + … + \(2^k\). When r = \(\frac{1}{2}\): 1 + \(\frac{1}{2}\) + \(\frac{1}{4}\) + \(\frac{1}{8}\) + … + \(\frac{1}{2^k}\).
For case 2 and case 3, where the work per level is decreasing or increasing, you need to use a geometric series (1 + r + r^2 + … + r^k) for the proof. The ratio r for this geometric series is the critical ratio mentioned earlier, \(\frac{a}{b^d}\).
As a shortcut, when \(r \neq 1\), there is a useful closed-form formula for a geometric series: 1 + r + \(r^2\) + … + \(r^k\) = \[\frac{1 - r^{k+1}}{1 - r}\]
You can verify this formula by just multiplying both sides by (\(1 - r\)): \[(1-r)(1+r+r^2 + \dots + r^k) = 1 - r + r - r^2 + r^2 - r^3 + r^3 - \dots - r^{k+1}\] \[= 1 - r^{k+1}\]
Additionally, when r is positive and less than 1: \[1+r+r^2+\dots+r^k \leq \frac{1}{1-r} = \text{a constant}\]
This only makes sense if you understand how the infinite series works in Geometric series. The most important take away is that, every geometric series with \(r\in (0, 1)\) is dominated by its first term, which is 1, and the sum is only O(1). In the video for geometric series, the first square is always the largest square, so it always dominates in terms of running time. It doesn’t matter how many powers of 1/2 you add up, the resulting sum is never more than 2(Go back to the bouncing ball example).
Second when r > 1: \[1+r+r^2+\dots+r^k=\frac{r^{k+1}-1}{r-1} \leq \frac{r^{k+1}}{r-1} = r^k \cdot \frac{r}{r-1}\]
Alternatively, when r > 1, the series keeps getting larger and larger, so the series is dominated by its last term instead, which is \(r^k\). To see the intuition and proof for why the above formula works, go back to Geometric series. This first-and-last-term business is important because it's directly associated with cases 2 and 3, where either the first node (root) or the last nodes (leaves) dominate the running time.
In case 2, where \(a < b^d\) (good wins), set r = \(\frac{a}{b^d}\), the critical ratio. Because a is smaller, the critical ratio is less than 1 (the work per level decreases), so the series is at most \(\frac{1}{1-r}\) as demonstrated above, and the bound becomes \[cn^d \cdot \sum_{j=0}^{\log_{b}{n}}r^j \leq cn^d \cdot \frac{1}{1-r} = O(n^d)\] where r is the critical ratio. The factor \(\frac{c}{1-r}\) is just a constant that does not depend on n, therefore it is suppressed, leaving us with only \(O(n^d)\).
For \(a > b^d\) (evil wins), the subproblem proliferation exceeds the work shrinkage, so the last term dominates the running time. This means the critical ratio is greater than 1, so the bound from above becomes \[cn^d \cdot \sum_{j=0}^{\log_{b}{n}}r^j = O(n^d \cdot r^{\log_{b}{n}}) = O\left(n^d \cdot \left(\frac{a}{b^d}\right)^{\log_{b}{n}}\right)\]
Focus just on the \(b^d\) part of the fraction. To make sense of this, we need some algebra: \[(b^{-d})^{\log_{b}{n}} = b^{-d\log_{b}{n}} = (b^{\log_{b}{{n}}})^{-d} = n^{-d}\] So the bound contains both an \(n^d\) factor and, hidden inside the fraction, an \(n^{-d}\) factor; these cancel out and you are left with \[O(a^{\log_{b}{n}})\] \[=O(n^{\log_{b}{a}})\] QED.
The running time of an algorithm is bounded by the non-standard recurrence with \(T(1) = 1\) and \[T(n) <= T(\lfloor \sqrt{n} \rfloor) + 1\] for n > 1. What is the correct upper bound on the asymptotic running time of the algorithm?
We can't use the master method here because the recurrence is non-standard. But we can see that every recursive call reduces the problem down to the square root of n, and we're only doing +1 unit of work at each step. This process of taking the square root repeatedly until it reaches the base case 1 is similar to a logarithm (dividing n by b until it reaches 1).
Taking the square root is like halving the logarithm: \(\log_2 \sqrt{n} = \frac{1}{2}\log_2 n\). So the number of steps until we hit the base case is the number of times we can halve \(\log_2 n\) before it drops below 1, which gives a running time of \(O(\log \log n)\).
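A quick check of that claim (my own sketch): the number of square roots needed to fall below 2 tracks \(\log_2 \log_2 n\):

```python
import math

def sqrt_steps(n):
    """Count how many square roots it takes for n to drop below 2."""
    steps = 0
    while n >= 2:
        n = math.sqrt(n)
        steps += 1
    return steps

for n in (16, 256, 65536, 2 ** 32):
    print(n, sqrt_steps(n), math.log2(math.log2(n)))   # steps stays within 1 of log2(log2 n)
```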
The goal of quicksort is to recursively order an array by partitioning it into left and right sections. Unlike mergesort, which divides the array in half, quicksort selects a pivot element. It then rearranges the other elements so that all elements less than the pivot are placed on its left-hand side (LHS), and all elements greater than the pivot are on its right-hand side (RHS). This process is then recursively applied to these left and right sub-arrays, which will typically have different lengths. This partitioning phase is efficient because it operates in place, requiring O(1) memory, and is completed in a single linear scan.
Similar to other sorting algorithms such as bubble sort and insertion sort, quicksort aims to establish boundaries or partitions within the array. In this case, we conceptually have two main portions: the elements that have already been examined and those that have not. Furthermore, among the examined elements, we are concerned with those that are less than the pivot and those that are greater than the pivot. To manage these boundaries, two pointers, conventionally named i and j, are employed.
Within the main loop of the partitioning process, typically only the right pointer (j) increments with each iteration. The left pointer (i) serves to mark the next position where an element smaller than the pivot should be placed. If the element at the right pointer's current position is larger than the pivot, no immediate action is taken; the element remains in its current location, as it belongs on the RHS. However, if an element encountered by the right pointer is less than the pivot, it is swapped with the element at the left pointer's position, and then the left pointer is incremented to prepare for the next potential swap of a smaller element. Once the right pointer has traversed the entire array, the pivot element is finally swapped with the element at the left pointer's position, effectively placing the pivot in its sorted position within the partition.
function quickSort(int[] array, int left, int right) {
    if (left < right) {
        int pivot = array[right];           // last element as the pivot
        int insertIndex = left;             // next slot for an element <= pivot
        for (int l = left; l <= right; l++) {
            if (array[l] <= pivot) {
                swap(l, insertIndex++);     // move the smaller element into place
            }
        }
        // the final swap above put the pivot at insertIndex - 1,
        // so recurse on the portions either side of it
        quickSort(array, left, insertIndex - 2);
        quickSort(array, insertIndex, right);
    }
}
Always choosing the first or last element as the pivot is bad when the array is already sorted. This is because on each recursive call you are doing the same amount of work again, but on n-1 elements, then n-2 elements, so you have n + (n-1) + (n-2) + … + 1, which is \(O(n^2)\). Choosing a good pivot is about having roughly the same number of elements on the left and right of the pivot so that it is balanced; this is the median element. So you can have a choosePivot function that takes an input array and returns the median element, and the median can be calculated in linear time (7 Linear-time selection). If we use this median, then quicksort runs in \(O(n\log n)\) time, which should be obvious since a pivot at the median means the array is split in half, similar to mergesort. The recurrence is exactly: \[T(n) \leq 2\cdot T\left(\frac{n}{2} \right)+O(n)\] It is \(\frac{n}{2}\) because the pivot is the median, so the array is split in half and we recurse on those two halves. It is still +\(O(n)\) at the end since the choosePivot function and the partition function both run in linear time. So a = b = 2 and d = 1, which gives \(O(n\log n)\).
Selecting the first or last element of an array takes O(1) time, whereas calculating the median takes O(n) time. We can instead get the best of both worlds by using randomized quicksort. This simply returns a random element of the array as the pivot; alternatively, you can shuffle the array and then reuse the same algorithm as described above.
In this case it is very unlikely, in terms of probability, that it will keep choosing the smallest or largest element of the array. So the average case leads to an expected running time of \(O(n\log n)\).
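A minimal sketch of the randomized variant (my own Python translation of the partitioning scheme above, with a random pivot swapped to the end first):

```python
import random

def randomized_quicksort(a, left=0, right=None):
    """In-place quicksort with a uniformly random pivot."""
    if right is None:
        right = len(a) - 1
    if left >= right:
        return
    p = random.randint(left, right)          # random pivot, moved to the end
    a[p], a[right] = a[right], a[p]
    pivot = a[right]
    insert = left
    for i in range(left, right + 1):
        if a[i] <= pivot:
            a[i], a[insert] = a[insert], a[i]
            insert += 1
    randomized_quicksort(a, left, insert - 2)    # pivot now sits at insert - 1
    randomized_quicksort(a, insert, right)

data = [3, 8, 2, 5, 1, 4, 7, 6]
randomized_quicksort(data)
print(data)   # [1, 2, 3, 4, 5, 6, 7, 8]
```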
WIP…