# Assignment 01

# Assignment 02

# why games?

simple to reason about
must consider an adversary's moves

R+N:

> Games, like the real world [...] require the ability to make some decision even when calculating the optimal decision is infeasible.

# Notable examples

Chess: Deep Blue (developed by Murray Campbell and colleagues @IBM; defeated Kasparov)
Checkers: Chinook (developed by Jonathan Schaeffer @UAlberta; played Marion Tinsley)

Checkers is solved! (draw)

# Today

search in adversarial environments

- key concepts:
  - game tree
  - min and max players
  - minimax value
  - negamax
- methods for searching the game tree:
  - alpha/beta pruning
  - approximate evaluation functions
  - games with chance elements

# AI Jeopardy

This data structure is defined by the initial game state and the legal moves for each player
- game tree

This is the value of a node for a given player, assuming that both players play optimally to the end of the game
- minimax value

This is a level of the search tree defined by a move by a single player
- ply

# a game tree

tic-tac-toe example

on the board, show:
- min vs. max plies
- high branching factor b
- terminal state utility (for this game: -1, 0, 1)

example of each:

XOX   XOX   XOX
 OX   OOX   X O
XXO         XOO

# a simplified game tree

From R+N, b=3:

MAX:             3
MIN:      3      2      2
leaves: 3 12 8  2 4 6  14 5 2

To simplify, assume we start with the whole tree. MIN chooses the child with the minimum value; MAX chooses the child with the maximum value. The value computed this way is the *minimax* value. The optimal strategy is to choose the move with the best minimax value at each step, so if you can compute the whole tree, you can compute the optimal strategy.

Q1. Given a binary game tree with leaves:
a) 1 0 3 15 8 13 5 11 | 10 7 4 9 6 2 14 12
b) 3 7 4 8 1 10 13 15 | 12 6 0 5 2 14 9 11
what is the minimax value?

# evaluating minimax

complete? yes (if the tree is finite)
optimal? yes (against an optimal opponent)

complexity bounds depend upon the search method. DFS is a reasonable choice, since the whole tree needs to be explored.
time? O(b^m) (feasible for simple games; infeasible for larger games: chess has b ~ 35, m ~ 100)
space?
O(bm)

How can we still play games?

# AI Jeopardy, continued

this method can eliminate large portions of the game tree from consideration, thus speeding up search
- alpha-beta pruning

this expression returns an estimate of the expected utility of the game for a given position
- evaluation function

these game states occur multiple times in the game tree
- transpositions

# pruning

return to the original example:

MAX:             3
MIN:      3      2      2
leaves: 3 12 8  2 4 6  14 5 2

(of course, assume that the algorithm has to do the search and doesn't know the whole tree)

upon expanding the first 2 (the first leaf of the middle subtree), we can prune that subtree's remaining leaves, since the current MIN value (at most 2) is already less than the current MAX value (3)

Q2. Given a binary game tree with leaves:
a) 1 0 3 15 8 13 5 11 | 10 7 4 9 6 2 14 12
b) 3 7 4 8 1 10 13 15 | 12 6 0 5 2 14 9 11
What is the minimax value? Use alpha-beta pruning, and do not expand nodes unnecessarily.

# why is it called alpha-beta?

alpha is the value of the best (highest-value) choice found so far at any choice point along the path for MAX.
If a value v for any subsequent node is worse than alpha, MAX will avoid it, so that branch can be pruned.
beta is similar for MIN.

# more, and improving

alpha-beta pruning produces results identical to unpruned minimax
not just nodes: entire subtrees can be pruned
order matters:
- examining the "right" node first (max or min, depending) gives time O(b^(m/2))
- the effective branching factor is reduced to sqrt(b)
- this can't be done all the time, since a perfect ordering == an optimal strategy
- with good ordering, alpha-beta pruning can search roughly twice as deep in a given time
if repeated states are possible:
- cache them in a hash table, often called a *transposition table*

# still intractable?

opening/closing books (trade space for time)
stop the search at a given depth
evaluate nodes using a function that ideally:
- orders states the same way as the true utility function
- is efficient to calculate
- is correlated with the actual probability of winning

How?
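Before moving on: the pruning rule described above (stop expanding a node once its value cannot affect the choice higher up) can be sketched in Python. The nested-list tree representation and the `VISITED` bookkeeping are assumptions for illustration; note how the leaves 4 and 6 of the middle subtree are never expanded.

```python
# Alpha-beta pruning on an explicit game tree (hypothetical representation:
# int leaves are utilities for MAX, lists are internal nodes).
VISITED = []  # records leaves actually expanded, to show pruning at work

def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Minimax value of `node`, skipping branches that cannot matter."""
    if isinstance(node, int):
        VISITED.append(node)
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:   # cutoff: MIN already has a better option elsewhere
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:       # cutoff: MAX already has a better option elsewhere
            break
    return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, True))  # -> 3
print(VISITED)                # -> [3, 12, 8, 2, 14, 5, 2]: 4 and 6 were pruned
```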
# eval fns

calculate *features*: simple characteristics of the game state that are correlated with the probability of winning
the evaluation function combines feature values to produce a score
typically, evaluation functions are weighted linear functions:

eval(x) = w_1 * f_1(x) + ... = sum over i [w_i * f_i(x)]

# example games and features

Chess:
- relative / absolute # of each type of piece
- castled?
- in check?
- relative freedom (# of moves available)

Checkers:
- relative # of pieces
- relative # of kings
- relative freedom (# of moves)

# independence

eval fns make a critical assumption: feature independence.
Is this accurate? No.
Does it matter? It depends. As long as the ordering of the function's values is accurate (not necessarily the raw values), the results will be the same.

# how to learn a fn?

- human intuition
- simulate many games and track the results! (called Monte Carlo simulation)

# what about chance?

what if there's a dice roll?
answer: add another *ply* to the tree, a *chance* ply
expectimax is minimax, but it uses the *expected value* at chance nodes
O(b^m n^m), where n is the number of possible dice rolls

# alpha-beta for expectimax?

naively, no.
If we can bound the value of each chance node (e.g., +/-2), then yes, but the result is no longer optimal (boxcars!)
another option: Monte Carlo simulation, aka rollout

# today's big ideas

- using alpha-beta pruning to make searching game trees more tractable
- using linear combinations of features to estimate the value of non-terminal nodes
- using expected value to handle chance elements
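As a concrete instance of the weighted linear evaluation function discussed above, here is a small Python sketch. The feature names and weights are made up for illustration, loosely inspired by the checkers features listed earlier; a real engine would tune them.

```python
# Hypothetical weighted linear evaluation for a checkers-like game.
# Feature values are from the perspective of the player to move
# (e.g., piece_diff = my pieces minus the opponent's pieces).
WEIGHTS = {"piece_diff": 3, "king_diff": 5, "mobility_diff": 1}

def evaluate(features):
    """eval(x) = sum over i [w_i * f_i(x)]: a weighted linear combination."""
    return sum(WEIGHTS[name] * value for name, value in features.items())

# Up one piece, down one king, three extra legal moves:
print(evaluate({"piece_diff": 1, "king_diff": -1, "mobility_diff": 3}))  # -> 1
```

Note that only the *ordering* of scores matters to the search, which is why the feature-independence assumption is often tolerable in practice.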
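As a closing sketch, the expectimax recursion described above (minimax, plus expected value at chance nodes) can be written directly. The tagged-tuple tree representation here is an assumption for illustration, not the course's notation.

```python
# Expectimax on an explicit tree. Hypothetical representation:
# a number is a terminal utility; otherwise a node is (kind, children),
# where a chance node's children are (probability, subtree) pairs.

def expectimax(node):
    if isinstance(node, (int, float)):                 # terminal utility
        return node
    kind, children = node
    if kind == "max":
        return max(expectimax(c) for c in children)
    if kind == "min":
        return min(expectimax(c) for c in children)
    if kind == "chance":                               # expected value
        return sum(p * expectimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind}")

# MAX chooses between a sure 3 and a fair coin flip between 12 and -4:
tree = ("max", [3, ("chance", [(0.5, 12), (0.5, -4)])])
print(expectimax(tree))  # -> 4.0 (the gamble's expected value beats the sure 3)
```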