The minimax algorithm is a recursive tree-searching algorithm for making decisions in game theory. The core of the algorithm is to return an evaluation of the best move to make; it determines what is best by evaluating the quality of each move, usually expressed as a number. The larger the number, the better the move.
The algorithm fundamentally does two things: 1. Maximising: From the perspective of the AI (or the player trying to win), the algorithm evaluates every possible move and returns the maximum evaluation, since we want the AI to win. 2. Minimising: From the perspective of the opponent (who we want to beat), the algorithm similarly evaluates every possible move but instead returns the minimum.
This is done by an evaluation function \(E(s)\) which takes the entire game state at the current moment as input. While the Minimax algorithm is generally the same for all use cases, the evaluation is heavily dependent on the game itself. For example, determining whether a move is a good move means completely different things depending on whether we are playing Checkers or Chess, since each game has its own rules. An evaluation can be either static or dynamic. A static evaluation is one where the rules are hardcoded into the function, relying on the user to specify what they think is a good or bad move. Alternatively, a dynamic evaluation uses further search techniques like Genetic Algorithms to dynamically determine good or bad moves without explicitly being told what they are.
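As a concrete illustration, below is a minimal sketch of what a static evaluation for Tic-Tac-Toe might look like; the board representation and the score values (+10, -10, 0) are assumptions chosen for this example rather than part of the algorithm itself.

def evaluate(board):
    # board is assumed to be a list of 9 cells containing 'X' (the AI), 'O' (the opponent) or None
    win_lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
                 (0, 4, 8), (2, 4, 6)]              # diagonals
    for a, b, c in win_lines:
        line = (board[a], board[b], board[c])
        if line == ('X', 'X', 'X'):
            return 10    # the AI has won: best possible evaluation
        if line == ('O', 'O', 'O'):
            return -10   # the opponent has won: worst possible evaluation
    return 0             # no winner (yet): neutral evaluation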
A game tree in Minimax looks something like this, where each level in the tree corresponds to whose turn it is. The root node of the tree picks the minimum value of its children, indicating that this is the human player, since we want them to lose. Likewise, the next layer chooses the maximum of its children; this is the AI that is trying to win the game. Another way to think of this is: “what is the best move we can pick that forces the opponent to choose their worst move?”
Just like above, we want to start at the root of the game tree and recursively expand out.
Base case: if the current node is a terminal node (the game is finished or the maximum depth was reached), return the evaluation for this node.
Evaluation: If the current node is maximising (the AI's turn): \[value = max(Minimax(child)\:for\:each\:child\:of\:the\:node)\] And if the current node is minimising (the opponent's turn): \[value = min(Minimax(child)\:for\:each\:child\:of\:the\:node)\]
Backtracking: This step is simple; it returns the evaluated value up the tree through the call stack until the root node is reached.
def minimax(node, depth, maximizingPlayer):
    if depth == 0 or node is a terminal node:
        return evaluate(node)
    if maximizingPlayer:
        maxEval = -infinity
        for each child of node:
            eval = minimax(child, depth - 1, False)
            maxEval = max(maxEval, eval)
        return maxEval
    else:
        minEval = infinity
        for each child of node:
            eval = minimax(child, depth - 1, True)
            minEval = min(minEval, eval)
        return minEval
The function minimax takes three parameters: the current game state node, the remaining search depth, and a boolean maximizingPlayer indicating whether the current player is maximising or minimising the evaluation. If the depth is zero or the node is a terminal state, the function returns the evaluation of the node. For the maximising player, the function iterates over each child node, recursively calls minimax with reduced depth and the opponent's turn, and selects the maximum evaluation among the children. Conversely, for the minimising player, it selects the minimum evaluation. This process continues recursively until all relevant nodes are evaluated, ensuring that the algorithm chooses the best possible move for the AI.
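To make the pseudocode concrete, here is a minimal runnable sketch in which the game tree is simply a nested Python list and every leaf value is already an evaluation; this toy representation is an assumption for illustration, not part of the original pseudocode.

def minimax(node, depth, maximizing_player):
    # A plain number (or depth 0) is treated as a terminal node.
    if depth == 0 or not isinstance(node, list):
        return node
    values = [minimax(child, depth - 1, not maximizing_player) for child in node]
    return max(values) if maximizing_player else min(values)

# Two moves for the AI, each answered by two opponent replies.
tree = [[3, 5], [2, 9]]
print(minimax(tree, 2, True))  # prints 3: the best value the AI can guarantee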
The problem with the above algorithm is that it's very inefficient. It's not immediately obvious that it's inefficient at depth=1, and even depth=2, but as soon as you reach depth=3 there is an explosion of recursive calls:
From the above figure we can see that the Alpha-Beta approach prunes a drastic number of recursive calls.
This becomes extremely problematic since Minimax at depth=1 is not very advanced. Additionally, depth=2 is not helpful either; in fact it is worse. This is because even depths favor the human player whereas odd depths favor the AI (you can find more about this here).
Therefore, Minimax at depth=3 is crucial for this algorithm to work, and that makes Alpha-Beta pruning a necessary optimisation.
Alpha-Beta pruning is an optimisation over the above Minimax algorithm. It works by eliminating branches in the game tree that are redundant, meaning we can tell beforehand that a branch doesn't affect the final result.
Imagine you are playing a game and considering all possible moves. As you evaluate each move, you keep track of the best option you've found so far (alpha for the maximising player and beta for the minimising player). If, at any point, you determine that a move cannot improve upon the best option already found, you can safely ignore (or “prune”) all further possibilities stemming from that move.
For example, look at the right-hand side of the second level of the tree, \(\geq 8\). We can see that one of its children has not been evaluated yet; the important part is that it doesn't need to be evaluated either. Once the left node has been explored, this new information tells us that the parent node can only be 8 or greater than 8, since we are maximising. Now, in the context of the root node, which is minimising, if the root node had to choose between 5 and 8, it will always pick 5 regardless of what the right-most branch reveals in the end. This realisation is what's called Alpha-Beta pruning: it essentially prunes or cuts off exploring the branch where the unexplored value doesn't make a difference to the final result.
if maximizingPlayer:
    maxEval = -infinity
    for each child of node:
        eval = minimax(child, depth - 1, alpha, beta, False)
        maxEval = max(maxEval, eval)
        alpha = max(alpha, eval)
        if beta <= alpha:
            break # Beta cut-off
    return maxEval
else:
    minEval = infinity
    for each child of node:
        eval = minimax(child, depth - 1, alpha, beta, True)
        minEval = min(minEval, eval)
        beta = min(beta, eval)
        if beta <= alpha:
            break # Alpha cut-off
    return minEval
Integrating Alpha-Beta into Minimax is simply a matter of adding 3 extra lines of code to both the maximising and minimising sections. Essentially, we update maxEval as usual with the new evaluation eval, but we also update an extra variable alpha with the same evaluation when maximising. Then, if beta is less than or equal to alpha, it means the minimising player has a better option elsewhere, so further exploration of this branch is unnecessary (as explained above), and the break statement breaks out of the loop that iterates through each child node, since exploring any further child nodes is pointless. This is called a Beta cut-off. Likewise, the same thing is done in reverse for the minimising section in the else statement.
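Putting the pieces together, a complete Alpha-Beta version of the toy minimax sketch from earlier could look like the following; the nested-list game tree is again an assumption used only for illustration.

def minimax(node, depth, alpha, beta, maximizing_player):
    if depth == 0 or not isinstance(node, list):
        return node  # terminal node: return its evaluation directly
    if maximizing_player:
        max_eval = float('-inf')
        for child in node:
            max_eval = max(max_eval, minimax(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, max_eval)
            if beta <= alpha:
                break  # beta cut-off: the minimiser already has a better option
        return max_eval
    else:
        min_eval = float('inf')
        for child in node:
            min_eval = min(min_eval, minimax(child, depth - 1, alpha, beta, True))
            beta = min(beta, min_eval)
            if beta <= alpha:
                break  # alpha cut-off: the maximiser already has a better option
        return min_eval

# Mirroring the example above: the root minimises, its children maximise.
tree = [[4, 5], [8, 9]]
print(minimax(tree, 2, float('-inf'), float('inf'), False))  # prints 5; the 9 is never evaluated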
def iterative_deepening(node, max_depth, time_limit):
    best_move = None
    start_time = current_time()
    for depth in range(1, max_depth + 1):
        if current_time() - start_time > time_limit:
            break # time limit exceeded
        _, best_move = minimax(node, depth, -infinity, infinity, True, start_time, time_limit)
    return best_move
The above code is pretty self explanatory: essentially, iterative deepening combines concepts from BFS and DFS to ensure that the algorithm can find the best move within a given time limit, even if it doesn't have enough time to search the entire game tree to a fixed depth. Additionally, increasing the depth incrementally allows the algorithm to decide whether it should stop early if it keeps making the same choices.
if depth == 0 or node is a terminal node or current_time() - start_time > time_limit:
    return evaluate(node), node
The time variables are also passed into Minimax, meaning we can check in the base case whether we have reached the time limit. This lets us exit the game tree early, returning the best score found so far.
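For completeness, current_time() from the pseudocode could be realised with a monotonic clock; the helper below and the example call are assumptions about how the pieces would be wired together, not code from the original.

import time

def current_time():
    # A monotonic clock in seconds, unaffected by system clock changes.
    return time.monotonic()

# Hypothetical usage: search at most 6 plies deep, but spend no more than 2 seconds.
# best = iterative_deepening(root_node, max_depth=6, time_limit=2.0)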
MCTS is a random sampling algorithm that incrementally constructs a partial tree of the best possible moves to make from the current node state. This type of tree search is different from Minimax because it does not exhaustively traverse every permutation of moves. Rather, it iterates sequentially through four stages of move selection. These stages are repeated until a maximum number of iterations is reached. Just like when estimating the value of Pi with random sampling, the higher the number of iterations, the better the solution found.
Monte Carlo tree search is divided into four stages. The first stage is finding the leaf node with the maximum upper confidence bound. This is a value that guides the selection of nodes from the root node; essentially it performs a best-first traversal until it reaches a leaf node, where the order of traversal is determined by the UCT formula.
\[UCT(v) = \frac{Q(v)}{N(v)} + C \cdot \sqrt{\frac{\ln{N(p)}}{N(v)}}\]
This formula consists of two parts. The exploitation term: \[\frac{Q(v)}{N(v)}\] represents the average reward of the current node and encourages the selection of nodes that have high rewards, also known as exploitation.
And the exploration term: \[C \cdot \sqrt{\frac{\ln{N(p)}}{N(v)}}\] encourages the exploration of less-visited nodes. The square root of the logarithm of the parent's visit count is divided by the current node's visit count, which ensures that nodes with fewer visits receive a higher exploration bonus. The constant C can be chosen arbitrarily, where a larger constant favors more exploration.
import math

def uct(node, exploration_param=1.4):
    if node.visits == 0:
        return float('inf') # prioritise unvisited nodes
    return node.value / node.visits + exploration_param * math.sqrt(math.log(node.parent.visits) / node.visits)
This formula provides a trade-off between exploration and exploitation, allowing the algorithm to balance choosing nodes that are more promising against nodes that haven't been explored as much. It does this by splitting the calculation into two halves: exploitation is the win ratio on the left-hand side, or the number of wins divided by the number of visits, while exploration is handled on the right-hand side, which encourages the selection of nodes that have been visited less.
def select(node):
    while node.children:
        node = max(node.children, key=lambda child: uct(child))
    return node

def expand(node):
    # Generate all possible child states from current node
    child_states = get_possible_states(node.state)
    for state in child_states:
        node.children.append(Node(state, parent=node))

def simulate(state):
    # Perform a random simulation from the given state
    while not is_terminal(state):
        state = random_playout(state)
    return get_reward(state)

def backpropagate(node, reward):
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent
Once a leaf node is selected, new child nodes are generated and Monte Carlo estimation is applied using pseudo-randomness. The squiggly line in the above figure represents the simulation phase on this expanded node. The algorithm plays out a complete game from this decision and evaluates the playthrough; if the choice leads to a victory, it receives a high evaluation score. This score is used in the fourth stage, backpropagation. As can be seen in the figure, the values are updated upwards from the leaf node: the number of visits is incremented and the evaluation score is accumulated up to the root node. This entire process is repeated for a large number of iterations until a good solution is found.
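The four helper functions above can be tied together in a driver loop along the following lines; the Node class and the iteration budget are assumptions added to keep the sketch self-contained.

import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def mcts(root_state, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        leaf = select(root)                # 1. selection: follow the highest UCT values
        if not is_terminal(leaf.state):
            expand(leaf)                   # 2. expansion: add the leaf's children
            leaf = random.choice(leaf.children)
        reward = simulate(leaf.state)      # 3. simulation: random playout to the end
        backpropagate(leaf, reward)        # 4. backpropagation: update visits and values
    # Once the budget is spent, the most visited child of the root is the move to play.
    return max(root.children, key=lambda child: child.visits)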
Hill climbing is an algorithm that only moves in the direction of improving value: generate n children, calculate each of their fitness, select the best child, and if its fitness improves on the current state, move to it; otherwise stop.
Example:
def hill_climbing(initial_state, objective_function, generate_neighbors):
    current_state = initial_state # root child
    while True:
        neighbors = generate_neighbors(current_state) # generate n children
        better_neighbor = None
        for neighbor in neighbors: # iterate each child
            neighbor_score = objective_function(neighbor) # evaluate child
            current_score = objective_function(current_state) # evaluate root child
            if neighbor_score > current_score:
                better_neighbor = neighbor
                break
        if better_neighbor is None:
            return current_state
        current_state = better_neighbor # current_state = best child
The code for hill climbing is pretty self explanatory and is the basis of the searching algorithms that will be explored here. It begins at an initial location, represented by initial_state. This starting point is our first guess at the optimal solution.
In each step of this process, the algorithm examines the locations immediately surrounding its current position. This is achieved by the generate_neighbors function, which takes the current_state and produces a set of nearby states. Think of these neighbors as the spots you can reach with a single step from where you are. The algorithm then evaluates each of these neighboring locations using an objective_function. This function acts like your ability to sense the elevation of each spot: it assigns a score to each location, with a higher score indicating a better option.
The algorithm's core decision-making process lies in comparing the scores of these neighboring locations with the score of the current_state. If a neighbor has a better score (is higher) than the current location, the algorithm moves to it, updating the current_state. This step mirrors you moving to a higher spot you've discovered. This process repeats, with the algorithm continuously exploring its surroundings and moving to better locations.
However, if, after examining all the neighboring locations, none are found to be better than the current one, it signifies that the algorithm has reached a peak, at least locally. In this scenario, where no immediate improvement is possible, the algorithm concludes its search and returns the current_state as the best solution it could find.
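As a quick usage sketch, the hill_climbing function above can be applied to a toy one-dimensional problem; the objective function and neighbour generator here are arbitrary choices made for illustration.

# Maximise f(x) = -(x - 3)^2 over the integers, starting from x = 0.
def objective_function(x):
    return -(x - 3) ** 2

def generate_neighbors(x):
    return [x - 1, x + 1]  # the two states reachable with a single step

print(hill_climbing(0, objective_function, generate_neighbors))  # prints 3, the global maximum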
Simulated annealing is an extension of the Hill Climbing algorithm, combined with concepts from metallurgy. The algorithm can be better understood through the concept of blacksmithing:
In blacksmithing you want the metal to be as strong and flexible as possible, which requires a combination of high and low temperatures.
And so, in order to balance exploration and exploitation, we need to balance both high and low temperatures. In this case, the ‘temperature’ is just a virtual temperature in the code, which is gradually decreased over the course of the algorithm.
Let E denote the objective function value (E for energy).
If: \[\Delta E = E_{next} - E_{current} > 0 \] the next state has a better objective value, and the probability of accepting it is always 1 (always go up).
If: \[\Delta E = E_{next} - E_{current} < 0\] the next state has a worse objective value, and the probability of accepting it is \[P(Accept\:next) = e^{\Delta E/T}\] where T is the temperature at the current time step.
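To see how the temperature shapes this probability, here is a tiny sketch with arbitrarily chosen numbers:

import math

def acceptance_probability(delta_e, temperature):
    # Probability of accepting a worse state: e^(delta_e / T), with delta_e < 0.
    return math.exp(delta_e / temperature)

print(acceptance_probability(-2, 10.0))  # ~0.82: while hot, worse moves are accepted often
print(acceptance_probability(-2, 0.5))   # ~0.02: once cold, worse moves are almost never accepted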
def simulated_annealing(state):
    for t = 1 to inf:
        # we initially make large changes to T
        # but gradually reduce it
        T = schedule(t) # probability of accepting a change in energy is proportional to T
        if T = 0: # T decreases over time
            return state
        candidate = random_neighbour(state)
        E = eval(candidate) - eval(state)
        if E > 0:
            state = candidate
        else: # not a good move
            # If next is worse, determine if
            # we should take this worse step
            # because sometimes you need to make
            # worse moves to break out of local maxima
            prob = probability(E, T) # math.e**(E/T)
            if random() < prob:
                state = candidate
The effectiveness of simulated annealing is sensitive to the initial temperature and cooling rate. If the initial temperature is too low or the cooling rate is too fast, the algorithm might not explore the solution space adequately and converge prematurely to a suboptimal solution. Conversely, if the initial temperature is too high or the cooling rate is too slow, the algorithm might take a very long time to converge.
The “temperature” controls the probability of accepting worse solutions; this value decreases over the course of the algorithm, since we want to start with high exploration and then gradually settle on a solution.
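For example, the schedule(t) function from the pseudocode could be realised as a simple geometric (exponential) cooling schedule; the starting temperature, cooling rate, and cut-off below are arbitrary example values.

def schedule(t, initial_temperature=100.0, cooling_rate=0.95, min_temperature=1e-3):
    # Geometric cooling: T(t) = T0 * r^t, treated as 0 once it falls below a small threshold.
    temperature = initial_temperature * (cooling_rate ** t)
    return temperature if temperature > min_temperature else 0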
Genetic Algorithms are inspired by evolution, where successor states are generated by combining successful (good genes) parent states. This is a variant of local beam search, where k states are generated and tracked, and the k best states are chosen to generate new states from.
The genome consists of the parts of the car: the shape, wheel size, wheel position, wheel density, etc., each of which has a set of genes that can be changed.
GAs begin with a set of k randomly generated states, or a population (of cars). Each individual state contains a chromosome, each chromosome contains a set of genes, and each gene is a numerical value which controls one property of the individual.
Selection of which individual state is allowed to produce the next generation is based on fitness. Fitness is a measure of how well an individual solves the problem, such as the distance travelled in the car design example. Individuals with higher fitness values are more likely to be selected for reproduction. Various selection methods, like roulette wheel or tournament selection, use these fitness values to determine the probability of an individual being chosen. Multiple individuals can be selected based on their fitness, ensuring diversity and preventing premature convergence.
Selection in genetic algorithms involves choosing individuals to produce the next generation, with fitter individuals (those with higher fitness scores) being more likely to pass on their genes. This process improves the overall performance of the population over successive generations. Several selection methods can be used, each with its own approach to favoring fitter individuals, such as roulette wheel and tournament selection; a sketch of the latter is shown below.
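As an illustration, one possible SELECTION_TYPE is tournament selection; the fitness() helper and the tournament size below are assumptions made for this example.

import random

def tournament_selection(population, tournament_size=3):
    # Sample a few individuals at random and keep the fittest of the sample.
    contestants = random.sample(population, tournament_size)
    return max(contestants, key=fitness)  # fitness() is an assumed problem-specific helper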
Offspring for the next generation are created using a crossover point to determine which genes are carried over from which parent. For example, if the crossover point is 4, the first 4 genes of the offspring will come from one parent, and the remaining genes will come from the other parent. The crossover point can be randomly generated, introducing variability and diversity in the offspring. This process helps explore different combinations of genes, potentially leading to better solutions.
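A minimal sketch of this single-point crossover, assuming each chromosome is a list of genes, might look like:

import random

def reproduce(x, y):
    # Genes before the crossover point come from parent x, the rest from parent y.
    crossover_point = random.randint(1, len(x) - 1)
    return x[:crossover_point] + y[crossover_point:]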
Mutation introduces random changes to the genetic material of offspring, adding diversity to the population and helping to explore new solutions. If a gene mutation occurs at a specific index (e.g., index 7), the gene at that position is altered. The decision to mutate can be random, with a predefined mutation rate determining the likelihood of a mutation. The gene can be replaced with a random value, set to the average value of the corresponding genes from both parents, or altered using other strategies depending on the problem.
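A corresponding mutation sketch, assuming numerical genes and replacing a single randomly chosen gene with a random value, could be:

import random

def mutate(child, gene_range=(0.0, 1.0)):
    # Pick one gene index at random and replace that gene with a new random value.
    index = random.randrange(len(child))
    mutated = list(child)
    mutated[index] = random.uniform(*gene_range)
    return mutated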
Elites are a mechanism to preserve the best-performing individuals from one generation to the next, ensuring that high-quality solutions are not lost due to crossover and mutation. An elite child is a direct copy of the best-performing chromosome from the previous generation. Instead of undergoing crossover and mutation, the elite chromosome is passed unchanged to the next generation. This ensures that the best solutions are retained, providing a baseline of high performance for the next generation and helping maintain or improve the overall fitness of the population over generations.
def genetic_algorithm(state, size):
    for t = 1 to inf:
        # threshold arbitrarily chosen, like reaching the end
        # in the car example, or goal state found (good enough)
        if t > threshold or individual fit enough:
            return best_individual
        new_population = empty set
        for i = 1 to size:
            # selection type can be top performer or random
            x = selection(population, SELECTION_TYPE)
            y = selection(population, SELECTION_TYPE)
            # random crossover point
            child = reproduce(x, y)
            # mutation rate you would choose arbitrarily
            # no rate is best, 0.05 for example
            if random() < mutation_rate:
                child = mutate(child)
            # add child to population
            new_population.add(child)
        population = new_population
The function genetic_algorithm takes an initial state and a size parameter, which determines the population size. The algorithm iterates continuously for a specified number of generations or until a stopping condition is met, such as exceeding a predefined threshold or finding an individual that is fit enough. Within each iteration, a new population is generated. The process involves selecting two parents (x and y) from the current population using a specified selection method (SELECTION_TYPE), which could be based on fitness or random selection. These parents undergo a reproduction process with a random crossover point to create a child. The child may then undergo mutation, depending on a randomly determined probability compared to a predefined mutation rate: if the random value falls below the mutation rate, the child is mutated. Finally, the child is added to the new population. Once all children for the current generation are created, the new population replaces the old one, and the algorithm continues to the next iteration. This cycle of selection, reproduction, mutation, and replacement aims to evolve the population towards an optimal solution over successive generations.