The minimax algorithm is a recursive tree-searching algorithm for making decisions in game theory. The core of the algorithm is to return an evaluation of the best move to make; it determines what is best by evaluating the quality of each move, usually expressed as a number. The larger the number, the better the move.
The algorithm fundamentally does two things: 1. Maximising: From the perspective of the AI (or the player trying to win), the algorithm evaluates every possible move and returns the maximum evaluation, since we want the AI to win. 2. Minimising: From the perspective of the opponent (who we want to beat), the algorithm similarly evaluates every possible move but instead returns the minimum.
This is done by an evaluation function \(E(s)\) which takes the entire game state at the current moment as input. While the Minimax algorithm is generally the same for all use cases, the evaluation is heavily dependent on the game itself. For example, determining whether a move is a good move means completely different things depending on whether we are playing Checkers or Chess, since each game has its own rules. An evaluation can be either static or dynamic. A static evaluation is one where the rules are hardcoded into the function, relying on the user to specify what they think is a good or bad move. Alternatively, a dynamic evaluation uses further search techniques like Genetic Algorithms to dynamically determine good or bad moves without explicitly being told what they are.
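As a concrete illustration, below is a minimal sketch of what a static evaluation for Tic-Tac-Toe might look like; the board representation and the score values (+10, -10, 0) are assumptions chosen for this example rather than part of the algorithm itself.

def evaluate(board):
    # board is assumed to be a list of 9 cells containing 'X' (the AI), 'O' (the opponent) or None
    win_lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
                 (0, 4, 8), (2, 4, 6)]              # diagonals
    for a, b, c in win_lines:
        line = (board[a], board[b], board[c])
        if line == ('X', 'X', 'X'):
            return 10    # the AI has won: best possible evaluation
        if line == ('O', 'O', 'O'):
            return -10   # the opponent has won: worst possible evaluation
    return 0             # no winner (yet): neutral evaluation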
A game tree in Minimax looks something like this, where each level in the tree corresponds to whose turn it is. The root node of the tree picks the minimum value of its children, indicating that this is the human player, since we want them to lose. Likewise, the next layer chooses the maximum of its children; this is the AI that is trying to win the game. Another way to think of this is: “what is the best move we can pick that forces the opponent to choose their worst move?”
Just like above, we want to start at the root of the game tree and recursively expand out.
Base case: if the current node is a terminal node (the game is finished or the maximum depth was reached), return the evaluation for this node.
Evaluation: If the current node is maximising (the AI's turn): \[value = max(Minimax(child)\:for\:each\:child\:of\:the\:node)\] And if the current node is minimising (the opponent's turn): \[value = min(Minimax(child)\:for\:each\:child\:of\:the\:node)\]
Backtracking: This step is simple; it returns the evaluated value up the tree through the call stack until the root node is reached.
def minimax(node, depth, maximizingPlayer):
    if depth == 0 or node is a terminal node:
        return evaluate(node)
    if maximizingPlayer:
        maxEval = -infinity
        for each child of node:
            eval = minimax(child, depth - 1, False)
            maxEval = max(maxEval, eval)
        return maxEval
    else:
        minEval = infinity
        for each child of node:
            eval = minimax(child, depth - 1, True)
            minEval = min(minEval, eval)
        return minEval
The function minimax takes three parameters: the current game state node, the remaining search depth, and a boolean maximizingPlayer indicating whether the current player is maximising or minimising the evaluation. If the depth is zero or the node is a terminal state, the function returns the evaluation of the node. For the maximising player, the function iterates over each child node, recursively calls minimax with reduced depth and the opponent's turn, and selects the maximum evaluation among the children. Conversely, for the minimising player, it selects the minimum evaluation. This process continues recursively until all relevant nodes are evaluated, ensuring that the algorithm chooses the best possible move for the AI.
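To make the pseudocode concrete, here is a minimal runnable sketch in which the game tree is simply a nested Python list and every leaf value is already an evaluation; this toy representation is an assumption for illustration, not part of the original pseudocode.

def minimax(node, depth, maximizing_player):
    # A plain number (or depth 0) is treated as a terminal node.
    if depth == 0 or not isinstance(node, list):
        return node
    values = [minimax(child, depth - 1, not maximizing_player) for child in node]
    return max(values) if maximizing_player else min(values)

# Two moves for the AI, each answered by two opponent replies.
tree = [[3, 5], [2, 9]]
print(minimax(tree, 2, True))  # prints 3: the best value the AI can guarantee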
The problem with the above algorithm is that it's very inefficient. It's not immediately obvious that it's inefficient at depth=1, and even depth=2, but as soon as you reach depth=3 there is an explosion of recursive calls:
From the above figure we can see that the Alpha-Beta approach prunes a drastic number of recursive calls.
This becomes extremely problematic since Minimax at depth=1 is not very advanced. Additionally, depth=2 is not helpful either; in fact it is worse. This is because even depths favor the human player whereas odd depths favor the AI (you can find more about this here).
Therefore, Minimax at depth=3 is crucial for this algorithm to work, and that makes Alpha-Beta pruning a necessary optimisation.
Alpha-Beta pruning is an optimisation over the above Minimax algorithm. It works by eliminating branches in the game tree that are redundant, meaning we can tell beforehand that a branch doesn't affect the final result.
Imagine you are playing a game and considering all possible moves. As you evaluate each move, you keep track of the best option you've found so far (alpha for the maximising player and beta for the minimising player). If, at any point, you determine that a move cannot improve upon the best option already found, you can safely ignore (or “prune”) all further possibilities stemming from that move.
For example, look at the right-hand side of the second level of the tree, \(\geq 8\). We can see that one of its children has not been evaluated yet; the important part is that it doesn't need to be evaluated either. Once the left node has been explored, this new information tells us that the parent node can only be 8 or greater than 8, since we are maximising. Now, in the context of the root node, which is minimising, if the root node had to choose between 5 and 8, it will always pick 5 regardless of what the right-most branch reveals in the end. This realisation is what's called Alpha-Beta pruning: it essentially prunes or cuts off exploring the branch where the unexplored value doesn't make a difference to the final result.
if maximizingPlayer:
    maxEval = -infinity
    for each child of node:
        eval = minimax(child, depth - 1, alpha, beta, False)
        maxEval = max(maxEval, eval)
        alpha = max(alpha, eval)
        if beta <= alpha:
            break # Beta cut-off
    return maxEval
else:
    minEval = infinity
    for each child of node:
        eval = minimax(child, depth - 1, alpha, beta, True)
        minEval = min(minEval, eval)
        beta = min(beta, eval)
        if beta <= alpha:
            break # Alpha cut-off
    return minEval
Integrating Alpha-Beta into Minimax is simply a matter of adding 3 extra lines of code to both the maximising and minimising sections. Essentially, we update maxEval as usual with the new evaluation eval, but we also update an extra variable alpha with the same evaluation when maximising. Then, if beta is less than or equal to alpha, it means the minimising player has a better option elsewhere, so further exploration of this branch is unnecessary (as explained above), and the break statement breaks out of the loop that iterates through each child node, since exploring any further child nodes is pointless. This is called a Beta cut-off. Likewise, the same thing is done in reverse for the minimising section in the else statement.
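Putting the pieces together, a complete Alpha-Beta version of the toy minimax sketch from earlier could look like the following; the nested-list game tree is again an assumption used only for illustration.

def minimax(node, depth, alpha, beta, maximizing_player):
    if depth == 0 or not isinstance(node, list):
        return node  # terminal node: return its evaluation directly
    if maximizing_player:
        max_eval = float('-inf')
        for child in node:
            max_eval = max(max_eval, minimax(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, max_eval)
            if beta <= alpha:
                break  # beta cut-off: the minimiser already has a better option
        return max_eval
    else:
        min_eval = float('inf')
        for child in node:
            min_eval = min(min_eval, minimax(child, depth - 1, alpha, beta, True))
            beta = min(beta, min_eval)
            if beta <= alpha:
                break  # alpha cut-off: the maximiser already has a better option
        return min_eval

# Mirroring the example above: the root minimises, its children maximise.
tree = [[4, 5], [8, 9]]
print(minimax(tree, 2, float('-inf'), float('inf'), False))  # prints 5; the 9 is never evaluated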
def iterative_deepening(node, max_depth, time_limit):
    best_move = None
    start_time = current_time()
    for depth in range(1, max_depth + 1):
        if current_time() - start_time > time_limit:
            break # time limit exceeded
        _, best_move = minimax(node, depth, -infinity, infinity, True, start_time, time_limit)
    return best_move
The above code is pretty self explanatory: essentially, iterative deepening combines concepts from BFS and DFS to ensure that the algorithm can find the best move within a given time limit, even if it doesn't have enough time to search the entire game tree to a fixed depth. Additionally, increasing the depth incrementally allows the algorithm to decide whether it should stop early if it keeps making the same choices.
if depth == 0 or node is a terminal node or current_time() - start_time > time_limit:
    return evaluate(node), node
The time variables are also passed into Minimax, meaning we can check in the base case whether we have reached the time limit. This lets us exit the game tree early, returning the best score found so far.
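For completeness, current_time() from the pseudocode could be realised with a monotonic clock; the helper below and the example call are assumptions about how the pieces would be wired together, not code from the original.

import time

def current_time():
    # A monotonic clock in seconds, unaffected by system clock changes.
    return time.monotonic()

# Hypothetical usage: search at most 6 plies deep, but spend no more than 2 seconds.
# best = iterative_deepening(root_node, max_depth=6, time_limit=2.0)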
MCTS is a random sampling algorithm that incrementally constructs a partial tree of the best possible moves to make from the current node state. This type of tree search is different from Minimax because it does not exhaustively traverse every permutation of moves. Rather, it iterates sequentially through four stages of move selection. These stages are repeated until a maximum number of iterations is reached. Just like when estimating the value of Pi with random sampling, the higher the number of iterations, the better the solution found.
Monte Carlo tree search is divided into four stages. The first stage is finding the leaf node with the maximum upper confidence bound. This is a value that guides the selection of nodes from the root node; essentially it performs a best-first traversal until it reaches a leaf node, where the order of traversal is determined by the UCT formula.
\[UCT(v) = \frac{Q(v)}{N(v)} + C \cdot \sqrt{\frac{\ln{N(p)}}{N(v)}}\]
This formula consists of two parts. The exploitation term: \[\frac{Q(v)}{N(v)}\] represents the average reward of the current node and encourages the selection of nodes that have high rewards, also known as exploitation.
And the exploration term: \[C \cdot \sqrt{\frac{\ln{N(p)}}{N(v)}}\] encourages the exploration of less-visited nodes. The square root of the logarithm of the parent's visit count is divided by the current node's visit count, which ensures that nodes with fewer visits receive a higher exploration bonus. The constant C can be chosen arbitrarily, where a larger constant favors more exploration.
import math

def uct(node, exploration_param=1.4):
    if node.visits == 0:
        return float('inf') # prioritise unvisited nodes
    return node.value / node.visits + exploration_param * math.sqrt(math.log(node.parent.visits) / node.visits)
This formula provides a trade-off between exploration and exploitation, allowing the algorithm to balance choosing nodes that are more promising against nodes that haven't been explored as much. It does this by splitting the calculation into two halves: exploitation is the win ratio on the left-hand side, or the number of wins divided by the number of visits, while exploration is handled on the right-hand side, which encourages the selection of nodes that have been visited less.
def select(node):
    while node.children:
        node = max(node.children, key=lambda child: uct(child))
    return node

def expand(node):
    # Generate all possible child states from current node
    child_states = get_possible_states(node.state)
    for state in child_states:
        node.children.append(Node(state, parent=node))

def simulate(state):
    # Perform a random simulation from the given state
    while not is_terminal(state):
        state = random_playout(state)
    return get_reward(state)

def backpropagate(node, reward):
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent
Once a leaf node is selected, new child nodes are generated and Monte Carlo estimation is applied using pseudo-randomness. The squiggly line in the above figure represents the simulation phase on this expanded node. The algorithm plays out a complete game from this decision and evaluates the playthrough; if the choice leads to a victory, it receives a high evaluation score. This score is used in the fourth stage, backpropagation. As can be seen in the figure, the values are updated upwards from the leaf node: the number of visits is incremented and the evaluation score is accumulated up to the root node. This entire process is repeated for a large number of iterations until a good solution is found.
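The four helper functions above can be tied together in a driver loop along the following lines; the Node class and the iteration budget are assumptions added to keep the sketch self-contained.

import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def mcts(root_state, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        leaf = select(root)                # 1. selection: follow the highest UCT values
        if not is_terminal(leaf.state):
            expand(leaf)                   # 2. expansion: add the leaf's children
            leaf = random.choice(leaf.children)
        reward = simulate(leaf.state)      # 3. simulation: random playout to the end
        backpropagate(leaf, reward)        # 4. backpropagation: update visits and values
    # Once the budget is spent, the most visited child of the root is the move to play.
    return max(root.children, key=lambda child: child.visits)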
Hill climbing is an algorithm that only moves in the direction of improving value: generate n children, calculate each of their fitness, select the best child, and if its fitness improves on the current state, move to it; otherwise stop.
Example:
def hill_climbing(initial_state, objective_function, generate_neighbors):
    current_state = initial_state # root child
    while True:
        neighbors = generate_neighbors(current_state) # generate n children
        better_neighbor = None
        for neighbor in neighbors: # iterate each child
            neighbor_score = objective_function(neighbor) # evaluate child
            current_score = objective_function(current_state) # evaluate root child
            if neighbor_score > current_score:
                better_neighbor = neighbor
                break
        if better_neighbor is None:
            return current_state
        current_state = better_neighbor # current_state = best child
The code for hill climbing is pretty self explanatory and is the basis of the searching algorithms that will be explored here. It begins at an initial location, represented by initial_state. This starting point is our first guess at the optimal solution.
In each step of this process, the algorithm examines the locations immediately surrounding its current position. This is achieved by the generate_neighbors function, which takes the current_state and produces a set of nearby states. Think of these neighbors as the spots you can reach with a single step from where you are. The algorithm then evaluates each of these neighboring locations using an objective_function. This function acts like your ability to sense the elevation of each spot: it assigns a score to each location, with a higher score indicating a better option.
The algorithm's core decision-making process lies in comparing the scores of these neighboring locations with the score of the current_state. If a neighbor has a better score (is higher) than the current location, the algorithm moves to it, updating the current_state. This step mirrors you moving to a higher spot you've discovered. This process repeats, with the algorithm continuously exploring its surroundings and moving to better locations.
However, if, after examining all the neighboring locations, none are found to be better than the current one, it signifies that the algorithm has reached a peak, at least locally. In this scenario, where no immediate improvement is possible, the algorithm concludes its search and returns the current_state as the best solution it could find.
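As a quick usage sketch, the hill_climbing function above can be applied to a toy one-dimensional problem; the objective function and neighbour generator here are arbitrary choices made for illustration.

# Maximise f(x) = -(x - 3)^2 over the integers, starting from x = 0.
def objective_function(x):
    return -(x - 3) ** 2

def generate_neighbors(x):
    return [x - 1, x + 1]  # the two states reachable with a single step

print(hill_climbing(0, objective_function, generate_neighbors))  # prints 3, the global maximum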
Simulated annealing is an extension of the Hill Climbing algorithm, combined with concepts from metallurgy. The algorithm can be better understood through the concept of blacksmithing:
In blacksmithing you want the metal to be as strong and flexible as possible, which requires a combination of high and low temperatures.
And so, in order to balance exploration and exploitation, we need to balance both high and low temperatures. In this case, the ‘temperature’ is just a virtual temperature in the code, which is gradually decreased over the course of the algorithm.
Let E denote the objective function value (E for energy).
If: \[\Delta E = E_{next} - E_{current} > 0 \] the next state has a better objective value, and the probability of accepting it is always 1 (always go up).
If: \[\Delta E = E_{next} - E_{current} < 0\] the next state has a worse objective value, and the probability of accepting it is \[P(Accept\:next) = e^{\Delta E/T}\] where T is the temperature at the current time step.
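To see how the temperature shapes this probability, here is a tiny sketch with arbitrarily chosen numbers:

import math

def acceptance_probability(delta_e, temperature):
    # Probability of accepting a worse state: e^(delta_e / T), with delta_e < 0.
    return math.exp(delta_e / temperature)

print(acceptance_probability(-2, 10.0))  # ~0.82: while hot, worse moves are accepted often
print(acceptance_probability(-2, 0.5))   # ~0.02: once cold, worse moves are almost never accepted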
def simulated_annealing(state):
    for t = 1 to inf:
        # we initially make large changes to T
        # but gradually reduce it
        T = schedule(t) # probability of accepting a change in energy is proportional to T
        if T = 0: # T decreases over time
            return state
        candidate = random_neighbour(state)
        E = eval(candidate) - eval(state)
        if E > 0:
            state = candidate
        else: # not a good move
            # If next is worse, determine if
            # we should take this worse step
            # because sometimes you need to make
            # worse moves to break out of local maxima
            prob = probability(E, T) # math.e**(E/T)
            if random() < prob:
                state = candidate
The effectiveness of simulated annealing is sensitive to the initial temperature and cooling rate. If the initial temperature is too low or the cooling rate is too fast, the algorithm might not explore the solution space adequately and converge prematurely to a suboptimal solution. Conversely, if the initial temperature is too high or the cooling rate is too slow, the algorithm might take a very long time to converge.
The “temperature” controls the probability of accepting worse solutions; this value decreases over the course of the algorithm, since we want to start with high exploration and then gradually settle on a solution.
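For example, the schedule(t) function from the pseudocode could be realised as a simple geometric (exponential) cooling schedule; the starting temperature, cooling rate, and cut-off below are arbitrary example values.

def schedule(t, initial_temperature=100.0, cooling_rate=0.95, min_temperature=1e-3):
    # Geometric cooling: T(t) = T0 * r^t, treated as 0 once it falls below a small threshold.
    temperature = initial_temperature * (cooling_rate ** t)
    return temperature if temperature > min_temperature else 0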
Genetic Algorithms are inspired by evolution, where successor states are generated by combining successful (good genes) parent states. This is a variant of local beam search, where k states are generated and tracked, and the k best states are chosen to generate new states from.
The genome consists of the parts of the car: the shape, wheel size, wheel position, wheel density, etc., each of which has a set of genes that can be changed.
GAs begin with a set of k randomly generated states, or a population (of cars). Each individual state contains a chromosome, each chromosome contains a set of genes, and each gene is a numerical value which controls one property of the individual.
Selection of which individual state is allowed to produce the next generation is based on fitness. Fitness is a measure of how well an individual solves the problem, such as the distance travelled in the car design example. Individuals with higher fitness values are more likely to be selected for reproduction. Various selection methods, like roulette wheel or tournament selection, use these fitness values to determine the probability of an individual being chosen. Multiple individuals can be selected based on their fitness, ensuring diversity and preventing premature convergence.
Selection in genetic algorithms involves choosing individuals to produce the next generation, with fitter individuals (those with higher fitness scores) being more likely to pass on their genes. This process improves the overall performance of the population over successive generations. Several selection methods can be used, each with its own approach to favoring fitter individuals, such as roulette wheel and tournament selection; a sketch of the latter is shown below.
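As an illustration, one possible SELECTION_TYPE is tournament selection; the fitness() helper and the tournament size below are assumptions made for this example.

import random

def tournament_selection(population, tournament_size=3):
    # Sample a few individuals at random and keep the fittest of the sample.
    contestants = random.sample(population, tournament_size)
    return max(contestants, key=fitness)  # fitness() is an assumed problem-specific helper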
Offspring for the next generation are created using a crossover point to determine which genes are carried over from which parent. For example, if the crossover point is 4, the first 4 genes of the offspring will come from one parent, and the remaining genes will come from the other parent. The crossover point can be randomly generated, introducing variability and diversity in the offspring. This process helps explore different combinations of genes, potentially leading to better solutions.
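A minimal sketch of this single-point crossover, assuming each chromosome is a list of genes, might look like:

import random

def reproduce(x, y):
    # Genes before the crossover point come from parent x, the rest from parent y.
    crossover_point = random.randint(1, len(x) - 1)
    return x[:crossover_point] + y[crossover_point:]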
Mutation introduces random changes to the genetic material of offspring, adding diversity to the population and helping to explore new solutions. If a gene mutation occurs at a specific index (e.g., index 7), the gene at that position is altered. The decision to mutate can be random, with a predefined mutation rate determining the likelihood of a mutation. The gene can be replaced with a random value, set to the average value of the corresponding genes from both parents, or altered using other strategies depending on the problem.
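A corresponding mutation sketch, assuming numerical genes and replacing a single randomly chosen gene with a random value, could be:

import random

def mutate(child, gene_range=(0.0, 1.0)):
    # Pick one gene index at random and replace that gene with a new random value.
    index = random.randrange(len(child))
    mutated = list(child)
    mutated[index] = random.uniform(*gene_range)
    return mutated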
Elites are a mechanism to preserve the best-performing individuals from one generation to the next, ensuring that high-quality solutions are not lost due to crossover and mutation. An elite child is a direct copy of the best-performing chromosome from the previous generation. Instead of undergoing crossover and mutation, the elite chromosome is passed unchanged to the next generation. This ensures that the best solutions are retained, providing a baseline of high performance for the next generation and helping maintain or improve the overall fitness of the population over generations.
def genetic_algorithm(state, size):
    for t = 1 to inf:
        # threshold arbitrarily chosen, like reaching the end
        # in the car example, or goal state found (good enough)
        if t > threshold or individual fit enough:
            return best_individual
        new_population = empty set
        for i = 1 to size:
            # selection type can be top performer or random
            x = selection(population, SELECTION_TYPE)
            y = selection(population, SELECTION_TYPE)
            # random crossover point
            child = reproduce(x, y)
            # mutation rate you would choose arbitrarily
            # no rate is best, 0.05 for example
            if random() < mutation_rate:
                child = mutate(child)
            # add child to population
            new_population.add(child)
        population = new_population
The function genetic_algorithm takes an initial state and a size parameter, which determines the population size. The algorithm iterates continuously for a specified number of generations or until a stopping condition is met, such as exceeding a predefined threshold or finding an individual that is fit enough. Within each iteration, a new population is generated. The process involves selecting two parents (x and y) from the current population using a specified selection method (SELECTION_TYPE), which could be based on fitness or random selection. These parents undergo a reproduction process with a random crossover point to create a child. The child may then undergo mutation, depending on a randomly determined probability compared to a predefined mutation rate: if the random value falls below the mutation rate, the child is mutated. Finally, the child is added to the new population. Once all children for the current generation are created, the new population replaces the old one, and the algorithm continues to the next iteration. This cycle of selection, reproduction, mutation, and replacement aims to evolve the population towards an optimal solution over successive generations.