Minimax and Alpha-beta Algorithms
CSW4701 Artificial Intelligence
We are concerned with two player, perfect knowledge games, where one of the
players is the computer. The following two algorithms are called MINIMAX and
ALPHA-BETA. In order to use these algorithms, one must design and implement the
state descriptions for the game board, the heuristic static evaluation
functions that rate the boards, the move generators that implement the rules of
the game, and a predicate that determines if a board is a winning position.
We present the algorithms in a mixed style incorporating features of LISP and
typical block structured languages like C to provide a flavor of how one might
implement them in a variety of languages.
The MINIMAX procedure is embodied by the function BESTMOVE. BESTMOVE is called
with the current board position as its first argument, and a depth bound. The
algorithm will expand the search space (an AND/OR tree or graph) to the depth
specified by the second argument, and return the best move for the computer to
make from the current board position. The move
generators and heuristic evaluation function are considered global and are
directly called by the algorithm. The heuristic static evaluation function is
named EVALUATE. EVALUATE returns high positive numbers for board positions
that are GOOD for the computer
(+infinity means the computer wins),
and low negative numbers for positions that are BAD for the computer
(-infinity is a loss).
The move generator, called POSSIBLE-MOVES, returns a list of possible moves
according to the rules of the game.
Another function, called NEWPOSITION, generates a new board position given a
legal move.
(Certainly, you can modify this pseudo-code to pass these functions as
functional arguments.)
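As a concrete illustration, here is a minimal Python sketch of those four
game-specific pieces for a hypothetical toy game (ours, not part of the course
material): players alternately remove 1 to 3 stones from a pile, and whoever
takes the last stone wins. One assumption to flag: in this sketch EVALUATE
scores from the point of view of the player to move, rather than always from
the computer's point of view, so the negation in the recursion works uniformly
at every depth (compare the remark on depth bounds at the end of this handout).

```python
from math import inf

# Toy game state: posn is just the number of stones left in the pile.
# The side to move is implicit (it alternates with each ply).

def possible_moves(posn):
    """The move generator: the rules allow taking 1, 2, or 3 stones."""
    return [m for m in (1, 2, 3) if m <= posn]

def new_position(posn, move):
    """Generate the board that results from a legal move."""
    return posn - move

def is_win(posn):
    """Predicate: the player who just moved took the last stone."""
    return posn == 0

def evaluate(posn):
    """Heuristic static evaluation, from the side-to-move's view:
    an empty pile means the side to move has already lost."""
    return -inf if posn == 0 else 0
```

Any real game (Checkers, Chess) needs a far richer state description and
heuristic, but these four interfaces stay the same.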
Question: Where are the "AND and OR nodes"? Why does it appear that we have
ignored all that? In fact, we haven't. Can you see how AND(min) and OR(max)
nodes are implicitly defined through the recursion?
Remember: -(+n) = -n, -(-n) = +n
BTW: Do you have a two agent game on your computer (something like Checkers or
Chess on your Windows95 PC)? Well then, this code is already on your
computer!
BESTMOVE = proc (posn, depth)
begin (Movelist, Bestscore, Bestm, Try, Tryscore) %local variables
% posn: is the current BOARD CONFIGURATION FROM WHICH A MOVE
% MUST BE CHOSEN BY OUR INTELLIGENT AGENT COMPUTER
% depth: MAXIMUM NUMBER OF PLIES TO LOOK AHEAD. THIS IS DETERMINED
% BY SPEED CONSTRAINTS AND COMPLEXITY OF THE GAME
% (i.e. branching factor b)
% movelist: the list of all possible MOVES from the current posn
% bestscore: the best "backed up" score found so far as we iterate
% thru the movelist
% bestm: the best move found so far that results in the bestscore
% try: just a tmp to hold returned values from recursive calls
% tryscore: just a tmp to hold returned scores from recursive calls
if depth = 0 then return list(EVALUATE(posn), nil) fi;
%This ends the recursion at the bottom of the search space
%Note a two element list of values is returned.
Movelist = POSSIBLE-MOVES(posn);
%According to the rules of the game, we generate all possible
%moves from the given board position.
Bestscore = -infinity; %initializations
Bestm = nil;
%We are now ready to scan the movelist and select the
%best move. Note how we initialized Bestscore and Bestm.
while ( Movelist <> nil )
repeat,
Try = BESTMOVE( NEWPOSITION(posn, first(Movelist)), depth-1);
%Here is the main recursive call that expands and searches
%the state space from the selected move.
Tryscore = - first(Try); %recall BESTMOVE returns two values
%Now we determine how well this current move did and whether
%it should be selected as our best move found so far.
if Tryscore > Bestscore then
do,
Bestscore = Tryscore;
Bestm = first(Movelist);
od;
%Now we continue scanning down the list of moves to see
%if we can find a better move than the one found so far.
Movelist = rest(Movelist);
taeper;
%After scanning the entire Movelist, we have our best move so:
return( list(Bestscore, Bestm) );
nigeb; %End of procedure min-max.
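The pseudo-code above translates almost line for line into a running program.
The following Python sketch is our own rendering, not official course code; it
bundles in a hypothetical toy game (take 1-3 stones from a pile, last stone
wins) so it is self-contained, and its evaluate scores from the side to move
so the negation works at every level.

```python
from math import inf

# Hypothetical toy game: remove 1-3 stones; taking the last stone wins.
def possible_moves(posn):
    return [m for m in (1, 2, 3) if m <= posn]

def new_position(posn, move):
    return posn - move

def evaluate(posn):
    # Score from the player-to-move's point of view:
    # an empty pile means the previous player took the last stone.
    return -inf if posn == 0 else 0

def bestmove(posn, depth):
    # Bottom of the search space: return (score, no move).  We also
    # bottom out when there are no legal moves (the game is over).
    if depth == 0 or not possible_moves(posn):
        return (evaluate(posn), None)
    bestscore, bestm = -inf, None
    for move in possible_moves(posn):
        # Main recursive call; negate because the opponent's good
        # score is our bad score (the MINIMAX trick).
        tryscore = -bestmove(new_position(posn, move), depth - 1)[0]
        if tryscore > bestscore:
            bestscore, bestm = tryscore, move
    # If every move scores -infinity, bestm stays None,
    # mirroring Bestm = nil in the pseudo-code.
    return (bestscore, bestm)
```

For example, bestmove(3, 5) returns (inf, 3): take all three stones and win.
By contrast bestmove(4, 5) returns (-inf, None): a pile of four is lost
against perfect play, since whatever we take, the opponent can empty the pile.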
Now we provide the alpha-beta variant of mini-max. READ CAREFULLY.
BESTMOVE = proc (Posn, Depth, Mybest, Herbest)
%Note we added two additional parameters. Mybest should be
%initialized to -infinity, while herbest is initialized to
%+infinity when calling this version of BESTMOVE.
begin (Movelist, Bestscore, Bestm, Try, Tryscore) %local variables
if Depth = 0 then return list(EVALUATE(Posn), nil) fi;
%This ends the recursion at the bottom of the search space
%Note a two element list of values is returned.
Movelist = POSSIBLE-MOVES(Posn);
%According to the rules of the game, we generate all possible
%moves from the given board position.
Bestscore = Mybest; % note the change to initializations here
Bestm = nil;
%We are now ready to scan the movelist and select the
%best move. Note how we initialized Bestscore and Bestm.
while ( Movelist <> nil )
repeat,
Try =
BESTMOVE( NEWPOSITION(Posn, first(Movelist)),
Depth-1,
-Herbest,
-Bestscore );
%Here is the main recursive call that expands and searches
%the state space from the selected move. But notice we
%added the two additional parameters here. This makes
%sense when we view the following conditional expressions.
Tryscore = - first(Try); %recall BESTMOVE returns two values
%Now we determine how well this current move did and whether
%it should be selected as our best move found so far.
if Tryscore > Bestscore then
do,
Bestscore = Tryscore;
Bestm = first(Movelist);
od;
%Ok, here is where the cutoffs occur in alpha-beta. We
%terminate the search if we can't possibly find a good move.
%Recall our discussion in class.
%The "return" function breaks the loop, terminates this
%execution and returns to the previous recursive level.
if Bestscore > Herbest then return ( list(Bestscore, Bestm) ) fi;
%Now we continue scanning down the list of moves to see
%if we can find a better move than the one found so far.
Movelist = rest(Movelist);
taeper;
%After scanning the entire Movelist, we have our best move so:
return( list(Bestscore, Bestm) );
nigeb; %End of procedure alpha-beta.
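As with MINIMAX, here is a self-contained Python sketch of the alpha-beta
variant (again ours, not official course code, using the same hypothetical
take-1-to-3-stones toy game and a side-to-move evaluate). Note the initial
call supplies Mybest = -infinity and Herbest = +infinity, as the comment above
prescribes.

```python
from math import inf

# Hypothetical toy game: remove 1-3 stones; taking the last stone wins.
def possible_moves(posn):
    return [m for m in (1, 2, 3) if m <= posn]

def new_position(posn, move):
    return posn - move

def evaluate(posn):
    # Score from the player-to-move's point of view.
    return -inf if posn == 0 else 0

def bestmove(posn, depth, mybest, herbest):
    # mybest: the score we are already guaranteed elsewhere (alpha).
    # herbest: the score the opponent is already guaranteed (beta).
    if depth == 0 or not possible_moves(posn):
        return (evaluate(posn), None)
    bestscore, bestm = mybest, None   # note the changed initialization
    for move in possible_moves(posn):
        # Pass the bounds down, negated and swapped, just as the
        # pseudo-code does with -Herbest and -Bestscore.
        tryscore = -bestmove(new_position(posn, move), depth - 1,
                             -herbest, -bestscore)[0]
        if tryscore > bestscore:
            bestscore, bestm = tryscore, move
        # Cutoff: the opponent already has a line at least this good
        # elsewhere, so she will never let us reach this position.
        if bestscore > herbest:
            return (bestscore, bestm)
    return (bestscore, bestm)
```

Calling bestmove(3, 5, -inf, inf) still returns (inf, 3); alpha-beta returns
the same score as plain MINIMAX, only with fewer nodes examined. One quirk
inherited from the pseudo-code: because bestscore starts at mybest, a pruned
call can return the bound itself with bestm = None.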
One last important item to think about. The depth bounds specified in the
algorithms have to be chosen carefully NOT ONLY SO THAT WE DON'T RUN OUT OF
TIME IN RESPONDING WITH A MOVE, BUT ALSO THAT WHEN WE BOTTOM OUT AT THE
FRONTIER, THE CORRECT SENSE OF THE EVALUATION FUNCTION IS USED AT THAT LEVEL.
Thus, we have to make sure we descend to an EVEN number of levels, counting
the starting position as level 1. Since depth is decremented to 0, this
implies the depth bound should be set to an ODD number. Alternatively, you can
set it to an EVEN number if you check that depth = 1, rather than 0. This
minor point can wreak havoc with the algorithm if you are not absolutely
careful.