Refer back to this video
Imagine you have a ball that bounces. Each time it bounces, it reaches 1/2 the height of the previous bounce.
If you add up all these distances, you’d get:
1 + 1/2 + 1/4 + 1/8 + …
This is a geometric series with a = 1 and r = 1/2.
Using the formula: S∞ = 1 / (1 - 1/2) = 1 / (1/2) = 2
So the total distance the ball travels is 2 meters, even though it bounces infinitely many times! What this means is that 2 is the limit: no matter how many bounces you add up, the total never goes past 2.
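To see the limit numerically, here is a tiny sketch (my own, not from the video) that prints the partial sums creeping up on 2:

```python
total = 0.0
term = 1.0               # a = 1
for i in range(10):
    total += term
    term *= 0.5          # r = 1/2
    print(i + 1, total)  # 1.0, 1.5, 1.75, ... the partial sums never pass 2
```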
If r were 1 or greater, the terms would not get smaller, and the series would not converge to a finite sum. The formula only works when the series eventually approaches a limit.
When r is greater than 1, the geometric series behaves very differently:
1. Each term gets larger than the previous one.
2. The series keeps growing without limit.
3. There is no finite sum for an infinite series in this case.
Another example is to imagine a rumor spreading:
- On day 1, 1 person knows the rumor.
- On day 2, 3 people know (it tripled).
- On day 3, 9 people know (tripled again).
- On day 4, 27 people know (tripled again).
This forms a geometric series with a = 1 and r = 3: 1 + 3 + 9 + 27 + …
Key differences from r < 1:
1. No convergence: the series doesn't approach a fixed value.
2. No finite sum: for an infinite series, there's no formula to calculate a finite sum.
3. Rapid growth: the terms get bigger very quickly.
While we can’t sum an infinite series when r > 1, we can sum a finite number of terms:
\[S_n = \frac{a(1 - r^n)}{1 - r}\]
Where:
- a = the first term
- r = the common ratio
- n = the number of terms
When r > 1, the series grows without bound. It’s useful for modeling exponential growth scenarios, but you can’t calculate a finite sum for an infinite number of terms.
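A quick check (my own snippet) that the finite-sum formula matches adding the terms directly, using the rumor example above:

```python
a, r, n = 1, 3, 4                          # rumor example: 1 + 3 + 9 + 27
direct = sum(a * r**j for j in range(n))
formula = a * (1 - r**n) / (1 - r)
print(direct, formula)                     # 40 40.0
```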
Let's break down why the closed-form formula \((r^{k+1} - 1) / (r - 1)\) works for the sum of a geometric series:
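Spelling out the subtract-and-shift steps (for \(r \neq 1\)), writing S for the sum: \[S = 1 + r + r^2 + \dots + r^k\] \[rS = r + r^2 + \dots + r^k + r^{k+1}\] Subtracting the first line from the second, all the middle terms cancel: \[rS - S = r^{k+1} - 1 \implies S = \frac{r^{k+1} - 1}{r - 1}\]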
And that’s the formula. It works because of the clever way the terms cancel out when you subtract. This method is called the “subtract and shift” technique.
x = 5678, y = 1234 (so n = 4). The partial products are 5678 * 4 = 22712, 5678 * 3 = 17034, 5678 * 2, etc. Each successive partial product is shifted one place to the left (one more trailing 0 than the row before), and then we add them all up. This calculation has n columns and n rows.
In general, computing a single partial product involves n multiplications (one per digit) and at most n additions (sum and carry), for a total of at most 2n primitive operations. Since there are n partial products, computing all of them requires at most n * 2n = 2n^2 operations.
\[\text{total number of operations} \leq \text{constant} \cdot n^2\]
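As a rough cross-check (my own sketch, not from the notes), here is the grade-school procedure with a counter for primitive operations; for the n = 4 example above it performs exactly 2n² = 32 of them:

```python
def grade_school_multiply(x, y):
    """Multiply x and y digit by digit, counting the primitive operations used."""
    xs = [int(ch) for ch in str(x)][::-1]   # digits, least significant first
    ys = [int(ch) for ch in str(y)][::-1]
    total, ops = 0, 0
    for i, yd in enumerate(ys):             # one partial product per digit of y
        partial, carry = 0, 0
        for j, xd in enumerate(xs):
            prod = xd * yd + carry          # one multiplication + one addition
            ops += 2
            partial += (prod % 10) * 10 ** j
            carry = prod // 10
        partial += carry * 10 ** len(xs)
        total += partial * 10 ** i          # the shift: i trailing zeros
    return total, ops

print(grade_school_multiply(5678, 1234))    # (7006652, 32)
```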
x = 5678, y = 1234 (so n = 4); a = 56 and b = 78, c = 12 and d = 34.
Step 1: compute ac, which is 672.
Step 2: compute bd, which is 2652.
Step 3: compute (a+b)(c+d), which is 6164.
Step 4: subtract the results of steps 1 and 2 from step 3: e = 6164 - 2652 - 672 = 2840.
Step 5: add up steps 1, 2 and 4, after adding four trailing zeroes to step 1 and two trailing zeroes to step 4. You add the trailing zeros like in normal multiplication.
\[10^4ac + 10^2e + bd = 7006652\] becomes: \[10^4 * 672 + 10^2 * 2840 + 2652 = 7006652\]
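A one-off check of the arithmetic above (my own snippet):

```python
a, b = 56, 78                        # halves of x = 5678
c, d = 12, 34                        # halves of y = 1234
ac, bd = a * c, b * d                # steps 1 and 2
e = (a + b) * (c + d) - ac - bd      # steps 3 and 4
print(ac, bd, e)                     # 672 2652 2840
print(10**4 * ac + 10**2 * e + bd)   # 7006652, which equals 5678 * 1234
```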
A recursive algorithm involves multiplications with fewer digits. In general, a number x with an even number n of digits can be expressed in terms of two n/2-digit numbers, its first half a and second half b. For example: 56 * 100 = 5600, and 5600 + 78 = 5678. There are two zeros, so we multiply by 10^2.
\[x = 10^{n/2}a\,+\,b\] Similarly, we can write
\[y = 10^{n/2}c\,+\,d\]
If we multiply these two together:
\[ xy\,=\,(10^{n/2}a\,+\,b)(10^{n/2}c\,+\,d)\] \[=\,10^nac\,+\,10^{n/2}(ad\,+\,bc)\,+\,bd\]
Input: two n-digit positive integers x and y
Output: the product x * y
Assumption: n is a power of 2

if n == 1 then
    compute x*y in one step and return the result
else
    a, b := first and second halves of x
    c, d := first and second halves of y
    recursively compute ac := a*c, ad := a*d, bc := b*c, bd := b*d
    compute 10^n * ac + 10^(n/2) * (ad + bc) + bd and return the result
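A minimal runnable sketch of the same idea (my own translation of the pseudocode, assuming n is a power of 2):

```python
def rec_int_mult(x, y, n):
    """Recursive integer multiplication with four recursive calls."""
    if n == 1:
        return x * y
    half = n // 2
    a, b = divmod(x, 10 ** half)     # first and second halves of x
    c, d = divmod(y, 10 ** half)     # first and second halves of y
    ac = rec_int_mult(a, c, half)
    ad = rec_int_mult(a, d, half)
    bc = rec_int_mult(b, c, half)
    bd = rec_int_mult(b, d, half)
    return 10 ** n * ac + 10 ** half * (ad + bc) + bd

print(rec_int_mult(5678, 1234, 4))   # 7006652
```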
Karatsuba is an optimised version of the above RecIntMult. In the above algorithm, four recursive calls are needed, ac, ad, bc, bd. Can we do better?
Step 1: compute ac.
Step 2: compute bd.
Step 3: instead of recursively computing ad or bc, compute (a+b)(c+d).
Step 4: take step 3 minus steps 1 and 2: \[(a+b)(c+d)-ac-bd = ac +ad+bc+bd -ac-bd = ad + bc\]
Step 5: add up step 1, step 4 and step 2.
if n == 1 then
    compute x*y in one step and return the result
else
    a, b := first and second halves of x
    c, d := first and second halves of y
    p := a + b, q := c + d
    recursively compute ac := a*c, bd := b*d, pq := p*q
    adbc := pq - ac - bd
    compute 10^n * ac + 10^(n/2) * adbc + bd and return the result
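And the Karatsuba version with three recursive calls (my own sketch; the base case multiplies directly, which also absorbs the extra carry digit that a+b and c+d can pick up):

```python
def karatsuba(x, y, n):
    """Karatsuba multiplication with three recursive calls."""
    if n == 1:
        return x * y
    half = n // 2
    a, b = divmod(x, 10 ** half)     # first and second halves of x
    c, d = divmod(y, 10 ** half)     # first and second halves of y
    p, q = a + b, c + d
    ac = karatsuba(a, c, half)
    bd = karatsuba(b, d, half)
    pq = karatsuba(p, q, half)       # (a + b)(c + d)
    adbc = pq - ac - bd              # equals ad + bc
    return 10 ** n * ac + 10 ** half * adbc + bd

print(karatsuba(5678, 1234, 4))      # 7006652
```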
Merge sort is split into two main functions: mergesort and merge. Merge does the merging process of the arrays together when they are split apart by mergesort.
Input: array A of n ints output: sorted array
// ignoring base cases
c := recursively sort first half of A
d := recursively sort second half of A
return merge(c, d)
So if the array A contains only 0 or 1 elements, MergeSort returns it as it's already sorted! Otherwise it keeps dividing the array further and further.
Input: sorted arrays C and D (length n/2 each)
Output: sorted array B of C and D combined together (length n)
Assumption: n is even
i := 1
j := 1
for k := 1 to n do
    if C[i] < D[j] then
        B[k] := C[i]   // populate output array
        i := i + 1     // increment to next index
    else               // D[j] < C[i]
        B[k] := D[j]
        j := j + 1
B is the output array; we iterate through C and D, which are both n/2 long, so by the time both arrays are consumed we have filled n positions. If the current element of C is less than the current element of D, we add that element to B and increment the index for C; if the smaller element is in D instead, we add that element to B and increment the index for D. This pseudocode doesn't account for running off the end of one array once it is exhausted, which means an additional loop is needed at the end to copy in the rest of the other array (see the sketch below).
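A minimal Python sketch of Merge that includes the leftover-copying step (my own version, using 0-based indices):

```python
def merge(c, d):
    """Merge two sorted lists into one sorted list."""
    b = []
    i = j = 0
    while i < len(c) and j < len(d):
        if c[i] <= d[j]:
            b.append(c[i])       # smaller element comes from c
            i += 1
        else:
            b.append(d[j])       # smaller element comes from d
            j += 1
    b.extend(c[i:])              # one of these two is empty:
    b.extend(d[j:])              # copy whatever remains of the other
    return b

print(merge([1, 3, 5], [2, 4, 6]))   # [1, 2, 3, 4, 5, 6]
```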
For a positive integer n, \(log_{2}n\) just means the following: type n into a calculator and count the number of times you need to divide it by 2 until the result is less than 2 (or whatever the base number is). For example, it takes 5 divisions by two to bring 32 down to 1, so \(log_{2}32 = 5\), and \(log_{2}1024 = 10\). Notice how the log of n is much less than n itself; 10 is much smaller than 1024. This difference in size becomes even bigger as n grows larger. Another way to think about it is "the larger that n is, the more efficient the algorithm is".
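A tiny sketch of that "count the divisions" description (my own):

```python
def divisions_until_below_two(n):
    count = 0
    while n >= 2:      # keep halving until the result drops below 2
        n /= 2
        count += 1
    return count

print(divisions_until_below_two(32), divisions_until_below_two(1024))   # 5 10
```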
The running time is a counting problem that tells us the number of lines that are executed in the algorithm in total. In the above code, there are at most 4l + 2 operations. But to make things easier we can say there are at most \[4l + 2 \leq 4l + 2l = 6l\] operations, where 6l is an upper bound. The exact number doesn't matter when it comes to constant factors, because different code implementations may have different constant factors.
Every recursive call is passed an input smaller than the one we started with. There's a weird conflict when calculating the running times of these recursive calls: there is an explosion of further recursive calls being produced, but the input is also shrinking on each recursive call (the array getting smaller), like two opposing forces working against each other.
Theorem 1.2: For every input array of length n >= 1, MergeSort performs at most \[6n\log_{2}n +6n\] operations.
We start thinking about this by drawing a recursion tree with level 0 as the initial call, level 1 as the next set of recursive calls and so on, until the base case. We have to think about three main things here:
So, for j = 0, 1, 2,… there are \(2^j\) recursive calls, \(2^0 = 1\), \(2^1 = 2\), \(2^2 = 4\) each operating on a subarray of length \(\frac{n}{2^j}\) -> n, n/2, n/4 etc…
We can calculate the total work done by the level-j recursive calls as: number of recursive calls at level j * work per level-j recursive call \[ = 2^j \cdot 6l\] \[ = 2^j \cdot \frac{6n}{2^j}\] \[ = 6n\] For each subproblem (each circle) at level j, of which there are \(2^j\), there is a merge subroutine running on the children nodes of that circle. So there is a merge cost of 6l for every circle, which is why we multiply them.
This means that the work done (number of operations) at each level is somehow always the same, even though the number of subproblems keeps on growing. The reason is the competing forces mentioned above: the number of subproblems doubles at each level while the work per subproblem halves.
The recursion tree above has a total of \(\log_2 n + 1\) levels (the +1 includes level 0). This is because on each level of the tree the array is split into two halves, repeatedly, until it reaches the base case where the size of the array is a single element. This is the exact definition of \(\log_2 n\), where we keep dividing by 2 until the result drops below 2.
We can calculate the total amount of work done across all levels by doing: total number of levels * #Work done at level j (work per level) \[(\log_2 n + 1)\cdot 6n = 6n\log_2 n + 6n\] Algorithms like this, with linear or 'near linear' time complexity, are called for-free primitives because they don't take much more effort than reading in the input, so you can 'freely' use them.
You are given k sorted arrays, each with n elements, and you want to combine them into a single array of kn elements. One approach is to merge the first two arrays, then merge the result with the third array, then with the fourth array, and so on until you merge in the kth and final input array. What is the running time taken by this successive merging algorithm?
j=1: merging is O(n + n) = O(2n)
j=2: merging is O(2n + n) = O(3n)
j=3: merging is O(3n + n) = O(4n)
…
j=k-1: merging is O((k-1)n + n) = O(kn)

It is k-1 because merging the first two arrays counts as one merge, so the merges add up to k-1 and not k.

To get the total we sum them all up. Since \(jn + n \leq 2jn\) for \(j \geq 1\), each merge is O(jn), so the total is O(n) + O(2n) + O(3n) + … + O((k-1)n). Factor out n to get \(n(1 + 2 + 3 + \dots + (k-1))\), and use \(1 + \dots + m = \frac{m(m+1)}{2}\): \[n\left(\frac{(k-1)k}{2}\right) = \frac{nk^2-nk}{2}\] \[= \mathcal{O}(nk^2)\]
Consider again the problem of merging k sorted length-n arrays into a single sorted length-kn array. Consider the algorithm that first divides the k arrays into k/2 pairs of arrays (the number of arrays left gets smaller), and uses the Merge subroutine to combine each pair, resulting in k/2 sorted length-2n arrays (2n because the merged array is longer). The algorithm repeats this step until there is only one length-kn sorted array. What is the running time of this procedure? (Think of this as merge sort but going upwards rather than down.)
You have k arrays in total. On the first level you perform k/2 merges, since a merge accounts for two circles, not one. On deeper levels the number of merges keeps halving, but each merge is on arrays twice as long, so the two effects cancel out: Merge on two length-m arrays is O(m + m) = O(m) as defined above, so the total work done at level j is (number of merges at that level) * (cost per merge) = O(nk).
The number of arrays keeps halving on each level until there is one single array at the end, so there are \(\log_2 k\) levels. So the total work done is \[ O(nk\log k)\] It is log of k because k (the number of arrays) is what is being halved until it reaches 1.
As you can see, in order to solve these problems you just have to split them into these main steps:
1. work done at level-j
    1. whether you're going down a recursion tree
    2. or going through an array etc.
2. how many levels are there?
3. work done at level-j * number of levels
Big-O notation concerns the function T(n) where n = 1, 2, … T(n) is 'T' for the time complexity and almost always denotes an upper bound on the worst-case running time.
O(f(n)) is an upper bound on the growth rate. It tells you that T(n) grows no faster than f(n) when n is large. Therefore, if you multiply f(n) by a suitable constant c, T(n) will eventually never exceed c * f(n), because f(n) grows at least as fast as T(n).
\[T(n) = O(f(n))\]
if and only if T(n) is eventually bounded above by a constant multiple of f(n)
What this means is that T(n) is only equal to the big O of f(n) when: \[T(n) \leq c \cdot f(n)\]
\(T(n) = O(f(n))\) if and only if there exists positive constants c and \(n_0\) such that: \[T(n) \leq c \cdot f(n)\] for all \(n \geq n_0\). Where c and \(n_0\) do not depend on n.
In short this is just saying \(n^2\) is \(O(n^2)\). Now onto the mathematical version:
If \[T(n) = a_{k}n^k +\:...\:+ a_{1}n^1 + a_{0}n^0,\;then\;T(n) = O(f(n)),\;where\:f(n) = n^k\]
> We have to actually prove the above statement using #Definition 1
Basically, using algebra, we want to bound the RHS of T(n) by something that looks like \(n^k\).
We start by choosing a c and an \(n_0\), where \(c \cdot n^k\) will be the constant multiple of f(n). Let \(n_0 = 1\) and let c equal the sum of the absolute values of all the coefficients in T(n).
The trick is to replace the different powers of n with the common, largest power \(n^k\). For every \(n \geq 1\) we have \(n^i \leq n^k\) whenever \(i \leq k\), so each term \(a_i n^i\) is at most \(\lvert a_i \rvert n^k\): \[\lvert a_k \rvert n^k + \dots + \lvert a_1 \rvert n^k + \lvert a_0 \rvert n^k \geq a_k n^k + \dots + a_1 n^1 + a_0 n^0\] This gives us the \(n^k\) factor we are trying to bound T(n) with, and since it is now a common factor we can pull it out.
\[\underbrace{(\lvert a_k \rvert + \dots + \lvert a_1 \rvert + \lvert a_0 \rvert)}_{c}\, n^k = c \cdot n^k\] Taking the absolute values of the coefficients matters because some coefficients may be negative; the sum of absolute values is always greater than or equal to the original sum. So for every \(n \geq 1\): \[T(n) = a_k n^k + \dots + a_1 n + a_0 \leq (\lvert a_k \rvert + \dots + \lvert a_1 \rvert + \lvert a_0 \rvert)\, n^k = c \cdot n^k\] Which is what we were trying to prove at the start.
Here we are just saying that \(n^3\) would not be equal to \(O(n^2)\)
If \[T(n) = n^k\;then\;T(n) \neq O(n^{k-1})\] Proving that one function is not big-O of another is best done using a proof by contradiction. So, assume that \(n^k = O(n^{k-1})\). This would imply that \(n^k\) is eventually bounded above by a constant multiple of \(n^{k-1}\), so: \[n^k \leq c \cdot n^{k-1}\] Dividing both sides by \(n^{k-1}\) gives \[n \leq c\] This says that the constant c is greater than or equal to every sufficiently large positive integer n, which is clearly false. Therefore \(n^k \neq O(n^{k-1})\).
Big-O refers to \(\leq\): "it won't take more than 2 hours". Big-omega refers to \(\geq\): "it will take at least 30 minutes". Big-theta refers to \(=\): "it usually takes about 1 hour".
In essence, Big-O is an upper bound on the worst-case running time, Big-Omega is a lower bound on the best-case running time, and Big-Theta is a tight bound on the average-case running time or the typical behaviour of the algorithm. For example, iterating through an array of size n is usually said to be \(O(n)\), when it's more appropriate to say that it is \(\Theta (n)\).
Similarly to big-O: T(n) is big-omega of f(n) if and only if T(n) is eventually bounded below by a constant multiple of f(n). Notice how it is the other way around now and that f(n) now goes below T(n) after \(n_0\).
\(T(n) = \Omega (f(n))\) if and only if there exist positive constants c and \(n_0\) such that \[T(n) \geq c \cdot f(n)\] for all \(n \geq n_0\)
Notice how our constant typically has to be a fraction so that \(c \cdot f(n)\) can bound T(n) from below.
This section covers \(T(n) = \Theta (f(n))\), which means that both \(T(n) = O(f(n))\) and \(T(n) = \Omega(f(n))\) are true at the same time. In the official definition, T(n) is eventually sandwiched between two different constant multiples of f(n).
\(T(n) = \Theta (f(n))\) if and only if there exist positive constants \(c_1, c_2\) and \(n_0\) such that: \[c_1 \cdot f(n) \leq T(n) \leq c_2 \cdot f(n)\] for all \(n \geq n_0\).
Notice how the RHS of the inequality represents big-O notation and the LHS represents big-omega notation, so both have to be true at the same time in order for big-theta to be true. Additionally, the \(n_0\) for big-theta would simply be the larger of the \(n_0\) values for big-omega and big-O.
big-O notation is \(\leq\) little-o notation is \(<\)
In other words, when we say \(T(n) = o(f(n))\), it means that T(n) grows strictly more slowly than f(n), never at the same rate.
\(T(n) = o(f(n))\) if and only if for every positive constant c > 0, there exists a choice of \(n_0\) such that \[T(n) \leq c \cdot f(n)\] for all \(n \geq n_0\).
To prove that one function is little-o of another, we have to prove something stronger than big-O: for every constant c > 0, no matter how small, T(n) is eventually bounded above by the constant multiple \(c \cdot f(n)\). Importantly, in the definition of little-o, the constant \(n_0\) can depend on c (but not on n), with smaller constants c generally requiring bigger constants \(n_0\). For example: \[n^{k-1} = o(n^k)\]
#### Proof
Given any c > 0, choose \(n_0 = \frac{1}{c}\). Then for all \(n \geq n_0\): \[n^k = n \cdot n^{k-1} \geq n_0 \cdot n^{k-1} = \frac{1}{c}\, n^{k-1}\] so \[n^{k-1} \leq c \cdot n^k\] as required.
If \(T(n) = max\{f(n), g(n)\}\) then: \[ T(n) = \Theta (f(n) + g(n))\] First, \[max\{f(n), g(n)\} \leq f(n) + g(n)\] When you're stuck here, remember we need to have \(T(n)\) wedged in between two inequalities, so we also need a bound in the other direction. We know that if we take the maximum of f(n) and g(n) and multiply it by 2, it will be greater than or equal to their sum:
\[2 \cdot max\{f(n), g(n)\} \geq f(n) + g(n)\] This is self-explanatory because doubling the maximum value is always greater than or equal to the bigger number plus the smaller number. We can now divide both sides by 2 and put the two bounds together: \[\frac{1}{2}(f(n) + g(n)) \leq max\{f(n),g(n)\} \leq f(n) + g(n)\]
Remember that big-Theta is a tight bound, meaning that \(T(n)\) needs to be wedged in between two inequalities. Looking above, \(T(n)\) sits between two different constant multiples of \(f(n) + g(n)\). Therefore we can conclude that \(c_1 = \frac{1}{2}\), \(c_2 = 1\) and \(n_0 = 1\).
In essence, we applied an operation to both sides, then did the opposite to both sides again, so that we get the original T(n) in the middle.
If f(n) and g(n) denote functions from the positive integers to the positive real numbers, prove that \(f(n) = O(g(n))\) if and only if \(g(n) = \Omega(f(n))\).
Proof
We need to prove this in both directions:
(1) If f(n) = O(g(n)), then g(n) = Ω(f(n))
(2) If g(n) = Ω(f(n)), then f(n) = O(g(n))
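A sketch of the argument, using only the definitions above (the constant names are my own): for direction (1), if \(f(n) = O(g(n))\) then there are constants \(c, n_0 > 0\) with \(f(n) \leq c \cdot g(n)\) for all \(n \geq n_0\); dividing by c gives \[g(n) \geq \frac{1}{c} \cdot f(n) \quad \text{for all } n \geq n_0,\] which is exactly \(g(n) = \Omega(f(n))\) with constants \(c' = \frac{1}{c}\) and the same \(n_0\). Direction (2) is the same argument run in reverse: from \(g(n) \geq c' \cdot f(n)\) for all \(n \geq n_0\), divide by \(c'\) to get \(f(n) \leq \frac{1}{c'} \cdot g(n)\).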
[!NOTE] This problem is basically just 2 mergesort but with some extra counting steps, that’s it.
Think about the problem of counting how many inversions are in an array, which means how many pairs of elements are in the wrong order. For example, in [5, 1, 2, 3, 4] there are exactly 4 inversions, since 5 is out of place relative to each of the other 4 elements. A brute force implementation of this is:
numInv := 0
for i := 1 to n - 1 do
    # second loop is always ahead of first
    for j := i + 1 to n do
        if A[i] > A[j] then
            numInv += 1
return numInv
This is obviously \(O(n^2)\) in time complexity, but can we do better?
### Divide and conquer approach
The worst-case scenario is that every element on the left side is greater than every element on the right side (imagine a sorted array, but with the left and right halves swapped around). Here you have exactly \[\left(\frac{n}{2}\right)^2 = \frac{n^2}{4}\] split inversions. How do we solve this in \(O(n\log n)\) time?
We can solve this in \(O(n \log_2 n)\) by using what we learned from merge sort, splitting the inversions into three types:
1. left inversion: inversions at indices i, j which are both in the left half of the array (\(i, j \leq \frac{n}{2}\))
2. right inversion: inversions at indices i, j which are both in the right half of the array (\(i, j > \frac{n}{2}\))
3. split inversion: inversions where i is in the left half and j is in the right half (\(i \leq \frac{n}{2} < j\))
The reason why you need split inversions is because of arrays like [1, 3, 5, 2, 4, 6]. If you split this array in half, the left and right sides are already sorted, which means there are no left inversions and no right inversions. However, there are split inversions, which involve comparing the left and right halves, similar to the merge step in merge sort. The updated algorithm looks like this:
if n == 0 or n == 1 then
return (A, 0)
(C, leftInv) := countInv(left half of A)
(D, rightInv) := countInv(second half of A)
(B, splitInv) := Merge(C, D)
return (B, leftInv + rightInv + splitInv)
The worst-case scenario is that every element on the left side is greater than every element on the right side (imagine a sorted array, but with the left and right halves swapped around). Here you have exactly \[\left(\frac{n}{2}\right)^2 = \frac{n^2}{4}\] split inversions.
There is a split inversion whenever an element from the right subarray is greater than an element from the left subarray. To understand this, both arrays are in sorted order (x is in left and y is in right). So if x > y then there is a split inversion because y is copied over to the output array before x.
But the split-inversion count doesn't just go up by one each time this happens. Every time an element of D is copied to the output, the count goes up by the number of elements still left in the left subarray C, because y forms an inversion with every one of them (the product rule!).
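A compact Python sketch of CountInv (my own translation of the pseudocode, counting split inversions inside the merge):

```python
def sort_and_count(a):
    """Return (sorted copy of a, number of inversions in a)."""
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    c, left_inv = sort_and_count(a[:mid])
    d, right_inv = sort_and_count(a[mid:])
    b, split_inv = [], 0
    i = j = 0
    while i < len(c) and j < len(d):
        if c[i] <= d[j]:
            b.append(c[i])
            i += 1
        else:
            b.append(d[j])
            j += 1
            split_inv += len(c) - i      # every element left in c beats the copied d element
    b.extend(c[i:])
    b.extend(d[j:])
    return b, left_inv + right_inv + split_inv

print(sort_and_count([5, 1, 2, 3, 4]))      # ([1, 2, 3, 4, 5], 4)
print(sort_and_count([1, 3, 5, 2, 4, 6]))   # ([1, 2, 3, 4, 5, 6], 3)
```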
The naive implementation is \(O(n^3)\) to multiply matrices.
Input: n × n integer matrices X and Y.
Output: Z = X · Y.

for i := 1 to n do
    for j := 1 to n do
        Z[i][j] := 0
        for k := 1 to n do
            Z[i][j] := Z[i][j] + X[i][k] · Y[k][j]
return Z
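A direct Python translation of the pseudocode (my own sketch, for square matrices given as lists of lists):

```python
def mat_mult(x, y):
    """Naive O(n^3) matrix multiplication."""
    n = len(x)
    z = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                z[i][j] += x[i][k] * y[k][j]
    return z

print(mat_mult([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]
```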
Split each matrix into four n/2 × n/2 submatrices (its quadrants), then split those into n/4 × n/4 submatrices, and so on until you reach a base case. The code is pretty self-explanatory if you understand what is happening in Karatsuba.
// base case
if n == 1 then
    return the 1x1 matrix with X[1][1] * Y[1][1]
// recursive case
A, B, C, D := submatrices of X
E, F, G, H := submatrices of Y
// recursively compute the eight matrix products
AE, BG, AF, BH, CE, DG, CF, DH
// constant number of matrix additions
return the matrix [[AE + BG, AF + BH], [CE + DG, CF + DH]]
X = median: the middle vertical line in the cartesian plane
d = min(d1, d2): the minimum of the closest-pair distances found in the left and right regions

Step 3 is complicated because if such a pair exists, it has to be inside a strip. A strip is a region in the cartesian plane that goes from X - d to X + d, and this pair has to cross over from the left region to the right region.
The reason why a pair closer than d can't have a point outside of this strip: if one point of a split pair is in the left region and more than d to the left of X, then its partner has to be on the right side of X, so the two points are already more than d apart horizontally. This pair can therefore never be closer than d.
For each point in the strip (ordered by y-coordinate), we compute its distance to the next 7 points, compare the best of these with d and return the minimum. This does \(7 \cdot O(n)\) computation, which is still \(O(n)\), which is valid. But why 7?
The same way that we have a strip that accounts for horizontal distance, a pair closer than d would also have to be less than d apart vertically. So a smaller pair would have to exist inside a 2d by d rectangle, where 2d is the width of the horizontal strip and d is the maximum vertical distance inside it. And inside this rectangle there are 7 boxes where the other point may exist.
These boxes are d/2 by d/2, which makes up four boxes in the left region and four in the right region. If one point is in one of these 8 boxes, then the second point is in one of the other 7 boxes.
The reason why two points can't exist inside the same d/2 by d/2 box is that the maximum distance (the diagonal) inside that box is \[ \sqrt{\left(\frac{d}{2}\right)^2+\left(\frac{d}{2}\right)^2} = \frac{d}{\sqrt{2}}\] and \[\frac{d}{\sqrt{ 2 }} \lt d\] So if a pair closer than d existed entirely inside one box, we would have found it already; the fact that d is our minimum so far means no two points share one of these small boxes in either the left or right region.
If you remember, the RecIntMult algorithm creates smaller subproblems by breaking the n-digit numbers into two halves x and y. where \[x = 10^{n/2}\cdot a+b,\:y = 10^{n/2}\cdot c+d\]
and \[x\cdot y = 10^n\cdot(ac) + 10^{n/2}\cdot (ad + bc) + bd\] So far we have built our understanding of this algorithm intuitively in 1 Karatsuba, but we intentionally left out the definition of a recurrence for simplicity. T(n) tells us the maximum number of operations used by a recursive algorithm. So the recurrence for the RecIntMult algorithm is \[T(n) \leq 4\cdot T\left(\frac{n}{2} \right)+O(n)\] The reasoning is that we split the number into two n/2 halves and calculate our partial products with those two splits:
- M(n/2) for ac - M(n/2) for ad - M(n/2) for bc - M(n/2) for bd
This is where the \(4T(\frac{n}{2})\) part of the recurrence comes from.
## Karatsuba
The only change from RecIntMult is that the number of recursive calls has dropped by one. So the recurrence is: \[T(n) \leq 3\cdot T\left(\frac{n}{2} \right)+O(n)\]
here we will go over a version of the master method that handles “standard recurrences” which goes like this:
Base case: T(n) is at most a constant for all sufficiently small n. General case: for larger values of n, \[T(n) \leq a \cdot T\left(\frac{n}{b}\right) + O(n^d)\] In words this can be thought of as:
> "the number of recursive calls multiplied by the time it takes for each subproblem, added to the time of adding those subproblems back together again."
Parameters:
- a = number of recursive calls made at a time (for merge sort: 2)
- b = factor by which the input size shrinks
- \(\frac{n}{b}\) = the size of each subproblem in the recursion (original problem divided by b)
- d = exponent of the running time of the combine step
- \(n^d\) = the cost of dividing the problem and combining the results (work done outside the recursive calls), where d is some constant.
The base case simply means: when the input size is so small that no recursive calls are needed, the problem can be solved in O(1). The general case essentially just means:
1. You divide this task into 'a' smaller tasks.
2. Each smaller task is 1/b the size of the original task.
3. You spend some additional time (\(n^d\)) doing work outside of these smaller tasks.
Now what is T(n) actually doing? If T(n) is defined by a standard recurrence as above where \(a \geq 1\), \(b > 1\), \(d \geq 0\).
T(n) is:
1. \(O(n^d \log{n})\) if \(a = b^d\) (case 1)
2. \(O(n^d)\) if \(a < b^d\) (case 2) (more work outside the recursive calls)
3. \(O(n^{\log_{b}{a}})\) if \(a > b^d\) (case 3)
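As a sanity check, here is a small helper (my own sketch, not part of the course material) that plugs a, b and d into the three cases:

```python
import math

def master_method(a, b, d):
    """Asymptotic bound for T(n) <= a * T(n/b) + O(n^d)."""
    if a == b ** d:
        return f"O(n^{d} log n)"            # case 1: same work at every level
    if a < b ** d:
        return f"O(n^{d})"                  # case 2: the root level dominates
    return f"O(n^{math.log(a, b):.2f})"     # case 3: the leaves dominate

print(master_method(2, 2, 1))   # merge sort: O(n^1 log n)
print(master_method(1, 2, 0))   # binary search: O(n^0 log n) = O(log n)
print(master_method(3, 2, 1))   # Karatsuba: O(n^1.58)
```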
Note: \(b^d\) is the rate at which the work-per-subproblem is shrinking.
### Intuition
To intuitively understand the above statement, start with this video. To start off with, what is the height of a recursion tree? Well, for merge sort we know it is \[\log_2{n}\] but more generally it is \[\log_b{n}\] where b is what I am dividing the input size by. Now what is the width of a recursion tree, or the number of leaves of the tree? Think about how many new nodes are added to the tree at each level. In merge sort it is 1 -> 2 -> 4 -> 8 etc. It keeps multiplying by 2 exactly \(\log_2{n}\) (the height) times, therefore the width, or the number of leaf nodes at the end, is \[2^{\log_2{n}}\] or more generally \[a^{\log_b{n}}\] where a is the number of recursive calls. This is also logarithmically equivalent to \[n^{\log_{b}{a}}\] which would come out as \(n^1\) when you do \(\log_2{2}\), for example. So you have n leaf nodes.
Merge sort Using the definition above, in the merge sort algorithm there are two recursive calls (one per half), so a = 2. The size of the array is divided by two, so b = 2. And O(n) work is done outside the recursive calls, so d = 1 (if d = 0 then O(1) work is done outside the recursive calls). Since \(a = 2 = b^d\), this is case 1: \(O(n^d\log{n}) = O(n\log{n})\).
Binary search In binary search, we recurse on either the left half of the input array or the right half, but never both(like in merge sort), so there is only one recursive call so a = 1. This input size shrinks by 2 so b = 2. Only a single comparison is done outside the recursive call which means O(1) work is done, so d = 0. This fits under case 1: \[O(n^{0}\log{n}) = O(\log{n})\]
Recursive Integer multiplication If we look at the formula again in #Integer Multiplication revisited, we can see a = 4 because there are 4 recursive calls, b = 2 because the length of a number is being divided by 2, and d = 1 because we have O(n). This gives us 4 > \(2^1\), which is case 3: \[O(n^{\log_2{4}}) = O(n^2)\]
Karatsuba Multiplication In 1 Karatsuba we see that this algorithm has 3 recursive calls instead of 4, so a = 3, but b is still 2 and d is still 1 because nothing there is changing. In this case, we are also in case 3 because 3 > \(2^1\): \[O(n^{\log_2{3}}) = O(n^{1.59})\]
Matrix multiplication In this algorithm found in 4 divide & conquer, the input matrices are broken into four \(\frac{n}{2} \times \frac{n}{2}\) matrices (one per quadrant) and 7 recursive calls are performed on those smaller matrices (one fewer than the straightforward eight shown earlier, using the same kind of trick as Karatsuba), so a = 7. Each of these calls is on a pair of matrices whose size is divided by 2, so b = 2. The work done outside the calls involves \(n^2\) additions, so d = 2. Since 7 > \(2^2\), this puts us in case 3: \[O(n^{\log_{2}{7}}) = O(n^{2.81})\]
Base case: \(T(1) \leq c\), here \(n_0 = 1\)
General case: \[T(n) \leq a \cdot T\left(\frac{n}{b}\right) + cn^d\]
### Recursion tree reminder
From 2 mergesort you should remember that there are \(a^j\) subproblems at level j, because the number of recursive calls keeps multiplying by a factor of a at each level, and each level-j subproblem operates on a subarray of length \(\frac{n}{b^j}\), because the input keeps shrinking by the factor b at every level.
The recurrence in the general case above(on the far right) says that the amount of work done in each level-j subproblem is at most a constant multiplied by the input size raised to the power d or \(c(n/b^j)^d\).
This tells us how much work is done in one subproblem, to work out how much work is done in every subproblem in level-j we multiply that by the total number of subproblems(\(a^j\)).
\[Work\:at\:level\:j \leq a^j \cdot c\left[\frac{n}{b^j}\right]^d\] number of subproblems multiplied by work per subproblem where the bracket is the input size. We can simplify this by grouping the parts that depend on j together(basic algebra swapping out of a fraction) \[Work\:at\:level\:j \leq cn^d \cdot \left[\frac{a}{b^d}\right]^j\] The RHS bracket \(a/b^d\) introduces us to the critical ratio. This ratio is important to the relevant case of the master method.
If we sum up all the levels from level j = 0,1,2…, \(\log_b{n}\) we obtain the upper bound. \[total\:work \leq cn^d \cdot \sum_{ j=0 } ^ {\log_{b}{n}} \left[\frac{a}{b^d}\right]^j\]
So why is the critical ratio in the above formula so important? An intuitive way to understand it is as a 'tug of war between good and evil'. Evil is represented by a, more formally known as the rate of subproblem proliferation (RSP): with every level of recursion, the number of subproblems explodes by a factor of a (which is a bad thing, hence 'evil'). Good is represented by b, formally known as the rate of work shrinkage (RWS): with every level, the amount of work done per subproblem decreases by a factor of b (specifically \(b^d\)).
The key question is which side wins? There are exactly three possible cases:
- RSP = RWS (a draw) (amount of work performed is always the same)
- RSP < RWS (good wins) (amount of work performed is decreasing)
- RSP > RWS (evil wins) (amount of work performed is increasing)

How do we know which case it falls under?
- if \(a = b^d\) there is a balance (draw)
- if \(a < b^d\) RWS is faster than RSP (good wins)
- if \(a > b^d\) RSP is faster than RWS (evil wins)
Now we know why the critical ratio \(\frac{a}{b^d}\) is important because it tells us which of the three cases above we are in. We also now know where those three formulas come from in #Theorem:
Case 1: RSP = RWS The amount of work done at level 0 is the same as the amount of work done at every other level, so at each level j we are doing \(O(n^d)\) work, and there are \(O(\log{n})\) levels, so we have \(O(n^d \log{n})\) running time, which is the same as case 1 above.
Case 2: RSP < RWS The amount of work done is decreasing, so the most work is performed at level 0 and then it gets lower and lower. And this is easy because big-O notation takes the dominating running time. Therefore the running time in this case is \(O(n^d)\).
Case 3: RSP > RWS Here the amount of work done is increasing, so level 0 has the lowest running time and the leaves of the tree have the highest. So, in contrast to case 2, we know that the work done at the leaves dominates the running time instead. We already know from before that there are \(a^{\log_{b}{n}}\) leaves. And at the leaves specifically, which is the base case of the algorithm, each call only takes \(O(1)\) time, therefore the running time is \(O(a^{\log_{b}{n}})\). The only caveat is that in case 3 in #Theorem, a and n are flipped. This is mathematically correct in logarithms because: \[a^{\log_{b}{n}} = n^{\log_{b}{a}}\]
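One way to see this identity, using only the fact that \(x = b^{\log_b x}\): \[a^{\log_b n} = \left(b^{\log_b a}\right)^{\log_b n} = b^{(\log_b a)(\log_b n)} = \left(b^{\log_b n}\right)^{\log_b a} = n^{\log_b a}\]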
The RHS also makes intuitive sense: think back to merge sort, where a = 2 and b = 2. This gives us \(n^{\log_2{2}}\), which is just \(n^1\). And this makes sense because the number of recursive calls keeps exploding in merge sort until every item of the original array is its own leaf, which is equivalent to n leaves.
Case 1 is handled by the fact that the same amount of work (\(cn^d\)) is done at each level, so a separate geometric-series argument is not needed.
A geometric series is a sum of numbers where each term is found by multiplying the previous term by a fixed non-zero number called the common ratio (r):
When r = 2: 1 + 2 + 4 + 8 + … + \(2^k\). When r = \(\frac{1}{2}\): 1 + \(\frac{1}{2}\) + \(\frac{1}{4}\) + \(\frac{1}{8}\) + … + \(\frac{1}{2^k}\).
For case 2 and case 3, where the work per level is decreasing or increasing, you need to use a geometric series (1 + r + r^2 + … + r^k) for the proof. The ratio r for this geometric series is the critical ratio mentioned earlier, \(\frac{a}{b^d}\).
As a shortcut, when \(r \neq 1\), there is a useful closed-form formula for a geometric series: 1 + r + \(r^2\) + … + \(r^k\) = \[\frac{1 - r^{k+1}}{1 - r}\]
You can verify this formula by just multiplying both sides by (\(1 - r\)): \[(1-r)(1+r+r^2 + \dots + r^k) = 1 - r + r - r^2 + r^2 - r^3 + r^3 - \dots - r^{k+1}\] \[= 1 - r^{k+1}\]
Additionally, when r is positive and less than 1: \[1+r+r^2+\dots+r^k \leq \frac{1}{1-r} = \text{a constant}\]
This only makes sense if you understand how the infinite series works in Geometric series. The most important take away is that, every geometric series with \(r\in (0, 1)\) is dominated by its first term, which is 1, and the sum is only O(1). In the video for geometric series, the first square is always the largest square, so it always dominates in terms of running time. It doesn’t matter how many powers of 1/2 you add up, the resulting sum is never more than 2(Go back to the bouncing ball example).
Second when r > 1: \[1+r+r^2+\dots+r^k=\frac{r^{k+1}-1}{r-1} \leq \frac{r^{k+1}}{r-1} = r^k \cdot \frac{r}{r-1}\]
Alternatively, when r > 1, the series keeps getting larger and larger, so the series is dominated by its last term instead, which is \(r^k\). To see the intuition and proof for why the above formula works, go back to Geometric series. This first-and-last-term business is important because it's directly associated with cases 2 and 3, where either the first node (root) or the last nodes (leaves) dominate the running time.
In case 2, where \(a < b^d\) (good wins), set r = \(\frac{a}{b^d}\), the critical ratio. Because a is smaller, the critical ratio is less than 1 (the work per level decreases), so the series is at most \(\frac{1}{1-r}\) as demonstrated above, and the bound becomes \[cn^d \cdot \sum_{j=0}^{\log_{b}{n}}r^j \leq cn^d \cdot \frac{1}{1-r} = O(n^d)\] where r is the critical ratio. The factor \(\frac{c}{1-r}\) is just a constant that does not depend on n, therefore it is suppressed, leaving us with only \(O(n^d)\).
For \(a > b^d\) (evil wins), the subproblem proliferation exceeds the work shrinkage, so the last term dominates the running time. This means the critical ratio is greater than 1, so the bound from above becomes \[cn^d \cdot \sum_{j=0}^{\log_{b}{n}}r^j = O(n^d \cdot r^{\log_{b}{n}}) = O\left(n^d \cdot \left(\frac{a}{b^d}\right)^{\log_{b}{n}}\right)\]
Focus just on the \(b^d\) part of the fraction. To make sense of this, we need some algebra: \[(b^{-d})^{\log_{b}{n}} = b^{-d\log_{b}{n}} = (b^{\log_{b}{{n}}})^{-d} = n^{-d}\] So the bound contains both an \(n^d\) factor and, hidden inside the fraction, an \(n^{-d}\) factor; these cancel out and you are left with \[O(a^{\log_{b}{n}})\] \[=O(n^{\log_{b}{a}})\] QED.
The running time of an algorithm is bounded by the non-standard recurrence with \(T(1) = 1\) and \[T(n) <= T(\lfloor \sqrt{n} \rfloor) + 1\] for n > 1. What is the correct upper bound on the asymptotic running time of the algorithm?
We can't use the master method here because the recurrence is non-standard. But we can see that every recursive call reduces the problem down to the square root of n, and we're only doing +1 unit of work at each step. This process of taking the square root repeatedly until it reaches the base case 1 is similar to a logarithm (dividing n by b until it reaches 1).
Taking the square root is like halving the logarithm: \(\log_2 \sqrt{n} = \frac{1}{2}\log_2 n\). So the number of steps until we hit the base case is the number of times we can halve \(\log_2 n\) before it drops below 1, which gives a running time of \(O(\log \log n)\).
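A quick check of that claim (my own sketch): the number of square roots needed to fall below 2 tracks \(\log_2 \log_2 n\):

```python
import math

def sqrt_steps(n):
    """Count how many square roots it takes for n to drop below 2."""
    steps = 0
    while n >= 2:
        n = math.sqrt(n)
        steps += 1
    return steps

for n in (16, 256, 65536, 2 ** 32):
    print(n, sqrt_steps(n), math.log2(math.log2(n)))   # steps stays within 1 of log2(log2 n)
```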
The goal of quicksort is to recursively order an array by partitioning it into left and right sections. Unlike mergesort, which divides the array in half, quicksort selects a pivot element. It then rearranges the other elements so that all elements less than the pivot are placed on its left-hand side (LHS), and all elements greater than the pivot are on its right-hand side (RHS). This process is then recursively applied to these left and right sub-arrays, which will typically have different lengths. This partitioning phase is efficient because it operates in place, requiring O(1) memory, and is completed in a single linear scan.
Similar to other sorting algorithms such as bubble sort and insertion sort, quicksort aims to establish boundaries or partitions within the array. In this case, we conceptually have two main portions: the elements that have already been examined and those that have not. Furthermore, among the examined elements, we are concerned with those that are less than the pivot and those that are greater than the pivot. To manage these boundaries, two pointers, conventionally named i and j, are employed.
Within the main loop of the partitioning process, typically only the right pointer (j) increments with each iteration. The left pointer (i) serves to mark the next position where an element smaller than the pivot should be placed. If the element at the right pointer's current position is larger than the pivot, no immediate action is taken; the element remains in its current location, as it belongs on the RHS. However, if an element encountered by the right pointer is less than the pivot, it is swapped with the element at the left pointer's position, and then the left pointer is incremented to prepare for the next potential swap of a smaller element. Once the right pointer has traversed the entire array, the pivot element is finally swapped with the element at the left pointer's position, effectively placing the pivot in its sorted position within the partition.
function quickSort(int[] array, int left, int right) {
    if (left < right) {
        int pivot = array[right];           // last element as the pivot
        int insertIndex = left;             // next slot for an element <= pivot
        for (int l = left; l <= right; l++) {
            if (array[l] <= pivot) {
                swap(l, insertIndex++);     // move the smaller element into place
            }
        }
        // the final swap above put the pivot at insertIndex - 1,
        // so recurse on the portions either side of it
        quickSort(array, left, insertIndex - 2);
        quickSort(array, insertIndex, right);
    }
}
Always choosing the first or last element as the pivot is bad when the array is already sorted. This is because on each recursive call you are doing the same amount of work again, but on n-1 elements, then n-2 elements, so you have n + (n-1) + (n-2) + … + 1, which is \(O(n^2)\). Choosing a good pivot is about having roughly the same number of elements on the left and right of the pivot so that it is balanced; this is the median element. So you can have a choosePivot function that takes an input array and returns the median element, and the median can be calculated in linear time (7 Linear-time selection). If we use this median, then quicksort runs in \(O(n\log n)\) time, which should be obvious since a pivot at the median means the array is split in half, similar to mergesort. The recurrence is exactly: \[T(n) \leq 2\cdot T\left(\frac{n}{2} \right)+O(n)\] It is \(\frac{n}{2}\) because the pivot is the median, so the array is split in half and we recurse on those two halves. It is still +\(O(n)\) at the end since the choosePivot function and the partition function both run in linear time. So a = b = 2 and d = 1, which gives \(O(n\log n)\).
Selecting the first or last element of an array takes O(1) time, whereas calculating the median takes O(n) time. We can instead get the best of both worlds by using randomized quicksort. This simply returns a random element of the array as the pivot; alternatively, you can shuffle the array and then reuse the same algorithm as described above.
In this case it is very unlikely, in terms of probability, that it will keep choosing the smallest or largest element of the array. So the average case leads to an expected running time of \(O(n\log n)\).
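A minimal sketch of the randomized variant (my own Python translation of the partitioning scheme above, with a random pivot swapped to the end first):

```python
import random

def randomized_quicksort(a, left=0, right=None):
    """In-place quicksort with a uniformly random pivot."""
    if right is None:
        right = len(a) - 1
    if left >= right:
        return
    p = random.randint(left, right)          # random pivot, moved to the end
    a[p], a[right] = a[right], a[p]
    pivot = a[right]
    insert = left
    for i in range(left, right + 1):
        if a[i] <= pivot:
            a[i], a[insert] = a[insert], a[i]
            insert += 1
    randomized_quicksort(a, left, insert - 2)    # pivot now sits at insert - 1
    randomized_quicksort(a, insert, right)

data = [3, 8, 2, 5, 1, 4, 7, 6]
randomized_quicksort(data)
print(data)   # [1, 2, 3, 4, 5, 6, 7, 8]
```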
WIP…