# Assignment 01

# Assignment 02

# why games?

simple to reason about
must consider an adversary's moves

R+N:

> Games, like the real world [...] require the ability to make some decision even when calculating the optimal decision is infeasible.

# Notable examples

Chess: Deep Blue (developed by Murray Campbell and colleagues @IBM; defeated Kasparov)
Checkers: Chinook (developed by Jonathan Schaeffer @UAlberta; played Marion Tinsley)

Checkers is solved! (draw)

# Today

search in adversarial environments

- key concepts:
  - game tree
  - min and max players
  - minimax value
  - negamax
- methods for searching the game tree:
  - alpha/beta pruning
  - approximate evaluation functions
  - games with chance elements

# AI Jeopardy

This data structure is defined by the initial game state and the legal moves for each player
- game tree

This is the value of a node for a given player, assuming that both players play optimally to the end of the game
- minimax value

This is a level of the search tree defined by a move by a single player
- ply

# a game tree

tic-tac-toe example

on the board, show:
- min vs. max plies
- high branching factor b
- terminal state utility (for this game: -1, 0, 1)

example of each:

XOX   XOX   XOX
 OX   OOX   X O
XXO         XOO

# a simplified game tree

From R+N, b=3:

MAX:             3
MIN:      3      2      2
leaves: 3 12 8  2 4 6  14 5 2

To simplify, assume we start with the whole tree. MIN chooses the child with the minimum value; MAX chooses the child with the maximum value. The value computed this way is the *minimax* value. The optimal strategy is to choose the move with the best minimax value at each step, so if you can compute the whole tree, you can compute the optimal strategy.

Q1. Given a binary game tree with leaves:
a) 1 0 3 15 8 13 5 11 | 10 7 4 9 6 2 14 12
b) 3 7 4 8 1 10 13 15 | 12 6 0 5 2 14 9 11
what is the minimax value?

# evaluating minimax

complete? yes (if the tree is finite)
optimal? yes (against an optimal opponent)

complexity bounds depend upon the search method. DFS is a reasonable choice, since the whole tree needs to be explored.
time? O(b^m) (feasible for simple games; infeasible for larger games: chess has b ~ 35, m ~ 100)
space?
O(bm)

How can we still play games?

# AI Jeopardy, continued

this method can eliminate large portions of the game tree from consideration, thus speeding up search
- alpha-beta pruning

this expression returns an estimate of the expected utility of the game for a given position
- evaluation function

these game states occur multiple times in the game tree
- transpositions

# pruning

return to the original example:

MAX:             3
MIN:      3      2      2
leaves: 3 12 8  2 4 6  14 5 2

(of course, assume that the algorithm has to do the search and doesn't know the whole tree)

upon expanding the first 2 (the first leaf of the middle subtree), we can prune that subtree's remaining leaves, since the current MIN value (at most 2) is already less than the current MAX value (3)

Q2. Given a binary game tree with leaves:
a) 1 0 3 15 8 13 5 11 | 10 7 4 9 6 2 14 12
b) 3 7 4 8 1 10 13 15 | 12 6 0 5 2 14 9 11
What is the minimax value? Use alpha-beta pruning, and do not expand nodes unnecessarily.

# why is it called alpha-beta?

alpha is the value of the best (highest-value) choice found so far at any choice point along the path for MAX.
If a value v for any subsequent node is worse than alpha, MAX will avoid it, so that branch can be pruned.
beta is similar for MIN.

# more, and improving

alpha-beta pruning produces results identical to unpruned minimax
not just nodes: entire subtrees can be pruned
order matters:
- examining the "right" node first (max or min, depending) gives time O(b^(m/2))
- the effective branching factor is reduced to sqrt(b)
- this can't be done all the time, since a perfect ordering == an optimal strategy
- with good ordering, alpha-beta pruning can search roughly twice as deep in a given time
if repeated states are possible:
- cache them in a hash table, often called a *transposition table*

# still intractable?

opening/closing books (trade space for time)
stop the search at a given depth
evaluate nodes using a function that ideally:
- orders states the same way as the true utility function
- is efficient to calculate
- is correlated with the actual probability of winning

How?
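Before moving on: the pruning rule described above (stop expanding a node once its value cannot affect the choice higher up) can be sketched in Python. The nested-list tree representation and the `VISITED` bookkeeping are assumptions for illustration; note how the leaves 4 and 6 of the middle subtree are never expanded.

```python
# Alpha-beta pruning on an explicit game tree (hypothetical representation:
# int leaves are utilities for MAX, lists are internal nodes).
VISITED = []  # records leaves actually expanded, to show pruning at work

def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Minimax value of `node`, skipping branches that cannot matter."""
    if isinstance(node, int):
        VISITED.append(node)
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:   # cutoff: MIN already has a better option elsewhere
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:       # cutoff: MAX already has a better option elsewhere
            break
    return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, True))  # -> 3
print(VISITED)                # -> [3, 12, 8, 2, 14, 5, 2]: 4 and 6 were pruned
```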
# eval fns

calculate *features*: simple characteristics of the game state that are correlated with the probability of winning
the evaluation function combines feature values to produce a score
typically, evaluation functions are weighted linear functions:

eval(x) = w_1 * f_1(x) + ... = sum over i [w_i * f_i(x)]

# example games and features

Chess:
- relative / absolute # of each type of piece
- castled?
- in check?
- relative freedom (# of moves available)

Checkers:
- relative # of pieces
- relative # of kings
- relative freedom (# of moves)

# independence

eval fns make a critical assumption: feature independence.
Is this accurate? No.
Does it matter? It depends. As long as the ordering of the function's values is accurate (not necessarily the raw values), the results will be the same.

# how to learn a fn?

- human intuition
- simulate many games and track the results! (called Monte Carlo simulation)

# what about chance?

what if there's a dice roll?
answer: add another *ply* to the tree, a *chance* ply
expectimax is minimax, but it uses the *expected value* at chance nodes
O(b^m n^m), where n is the number of possible dice rolls

# alpha-beta for expectimax?

naively, no.
If we can bound the value of each chance node (e.g., +/-2), then yes, but the result is no longer optimal (boxcars!)
another option: Monte Carlo simulation, aka rollout

# today's big ideas

- using alpha-beta pruning to make searching game trees more tractable
- using linear combinations of features to estimate the value of non-terminal nodes
- using expected value to handle chance elements
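As a concrete instance of the weighted linear evaluation function discussed above, here is a small Python sketch. The feature names and weights are made up for illustration, loosely inspired by the checkers features listed earlier; a real engine would tune them.

```python
# Hypothetical weighted linear evaluation for a checkers-like game.
# Feature values are from the perspective of the player to move
# (e.g., piece_diff = my pieces minus the opponent's pieces).
WEIGHTS = {"piece_diff": 3, "king_diff": 5, "mobility_diff": 1}

def evaluate(features):
    """eval(x) = sum over i [w_i * f_i(x)]: a weighted linear combination."""
    return sum(WEIGHTS[name] * value for name, value in features.items())

# Up one piece, down one king, three extra legal moves:
print(evaluate({"piece_diff": 1, "king_diff": -1, "mobility_diff": 3}))  # -> 1
```

Note that only the *ordering* of scores matters to the search, which is why the feature-independence assumption is often tolerable in practice.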
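As a closing sketch, the expectimax recursion described above (minimax, plus expected value at chance nodes) can be written directly. The tagged-tuple tree representation here is an assumption for illustration, not the course's notation.

```python
# Expectimax on an explicit tree. Hypothetical representation:
# a number is a terminal utility; otherwise a node is (kind, children),
# where a chance node's children are (probability, subtree) pairs.

def expectimax(node):
    if isinstance(node, (int, float)):                 # terminal utility
        return node
    kind, children = node
    if kind == "max":
        return max(expectimax(c) for c in children)
    if kind == "min":
        return min(expectimax(c) for c in children)
    if kind == "chance":                               # expected value
        return sum(p * expectimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind}")

# MAX chooses between a sure 3 and a fair coin flip between 12 and -4:
tree = ("max", [3, ("chance", [(0.5, 12), (0.5, -4)])])
print(expectimax(tree))  # -> 4.0 (the gamble's expected value beats the sure 3)
```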