CMPSCI 250 Discussion #8: Arithmetic Expressions

David Mix Barrington

7 November 2007

In this discussion we looked at a recursive definition of arithmetic expressions, compared top-down and bottom-up definitions, and tried to write an expression evaluator.

In Section T.2 we had the following definition of arithmetic expressions, terms, and factors:

An expression is either a term, or a term followed by "+" and an expression.
A term is either a factor, or a factor followed by "×" and a term.
A factor is either an atom, or an expression enclosed in parentheses.

This is a top-down recursive definition, unlike our recursive definitions for naturals or strings. The top-down definition translates much more easily to a recursive algorithm to determine whether a string is an expression, or to compute some recursively defined property of the expression like:

The value of an atom is given by the method evalAtom.
The value of a factor is the value of its atom or its expression.
The value of a term is the value of its factor, or the value of its first factor times the value of the remaining term.
The value of an expression is the value of its term, or the value of its first term plus the value of its remaining expression.
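
To illustrate the claim that the top-down definition translates directly into a recursive algorithm for deciding whether a string is an expression, here is a minimal sketch of such a recognizer. It is not part of the handout. The class name ExprRecognizer, the choice of single letters or digits as atoms, and the use of '*' for the times sign are all assumptions made for the example.

```java
// Sketch: recognizer for the top-down grammar, working on a plain string.
// Assumptions (not from the handout): an atom is a single letter or digit,
// and '*' stands in for the times sign.
class ExprRecognizer {
    private final String s;
    private int pos = 0; // index of the next unread character

    ExprRecognizer(String s) { this.s = s; }

    // True iff the whole string is an expression.
    static boolean isExpression(String s) {
        ExprRecognizer r = new ExprRecognizer(s);
        return r.expression() && r.pos == s.length();
    }

    private boolean at(char c) { return pos < s.length() && s.charAt(pos) == c; }

    // expression: a term, or a term followed by "+" and an expression
    private boolean expression() {
        if (!term()) return false;
        if (at('+')) { pos++; return expression(); }
        return true;
    }

    // term: a factor, or a factor followed by "*" and a term
    private boolean term() {
        if (!factor()) return false;
        if (at('*')) { pos++; return term(); }
        return true;
    }

    // factor: an atom, or an expression enclosed in parentheses
    private boolean factor() {
        if (pos < s.length() && Character.isLetterOrDigit(s.charAt(pos))) { pos++; return true; }
        if (at('(')) {
            pos++;
            if (!expression()) return false;
            if (at(')')) { pos++; return true; }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isExpression("a+b*(c+d)")); // true
        System.out.println(isExpression("a+*b"));      // false
        System.out.println(isExpression("(a+b"));      // false
    }
}
```

Each private method mirrors one line of the definition, which is the sense in which the top-down form is convenient: no backtracking is needed because one character of lookahead decides which alternative applies.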

You're asked to build an expression evaluator using the following methods:

getToken() removes and returns the next Token from the input. A Token is an atom or one of the characters in the set {+, ×, (, )}. If there is no next token or the next token is invalid, getToken throws an exception, which you need not handle.
eof() returns true if and only if there are no tokens remaining in the input.
peek() returns the Token object that is next in the input, without removing it from the input. It throws an exception if there is no such token. This method was not given to you in the actual discussion, and it should have been because you need it or something similar to solve the problem.
isAtom, isPlus, isTimes, isLparen, and isRparen are methods in the Token class that return true if the Token is an atom or is the given character.
evalAtom is a method in the Token class that returns a double giving that token's value if it is an atom. It throws an exception if the token is a character rather than an atom -- your code should prevent this from happening.

Writing Exercises:

Write a bottom-up definition for factors, terms, and expressions equivalent to the top-down definition given above.

Write a pseudo-Java method evalExpression that returns the double value of an expression as defined above, and throws an exception (probably from getToken) if the expression is not valid. The input comes from whatever source getToken is getting it from, and you should assume that the expression ends at the end of the input (in the text given in discussion I said you could stop at the first complete expression, but this is too easy if the expression starts with an atom). You will want to define methods evalTerm and evalFactor. Of course, without the peek method you didn't have the tools to solve this problem, but I hope the experience of working at it and the solution will be illuminating.


    double evalExpression()
    {// evaluates and removes an expression from the input
       double temp = evalTerm();
       if (!eof() && peek().isPlus()) {
          Token discard = getToken();
          return temp + evalExpression();}
       else return temp;}

    double evalTerm()
    {// evaluates and removes the next term from the input
       double temp = evalFactor(); 
       if (!eof() && peek().isTimes()) {
          Token discard = getToken();
          return temp * evalTerm();}
       else return temp;}

    double evalFactor()
    {// evaluates and removes the next factor from the input
       Token next = getToken();
       if (next.isAtom()) return next.evalAtom();
       if (next.isLparen()) {
          double temp = evalExpression();
          if (getToken().isRparen()) return temp;
          else throw new Exception("invalid expression");}
       throw new Exception ("invalid expression");}
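
For readers who want to run the three methods, the following is one possible self-contained harness. Only the method names getToken, eof, peek, isAtom, isPlus, isTimes, isLparen, isRparen, evalAtom, evalExpression, evalTerm, and evalFactor come from the handout. The class ExprEvaluator, the string-based tokenizer, the restriction of atoms to unsigned integer literals, the acceptance of '*' alongside the times sign, and the use of unchecked exceptions are assumptions of this sketch. Note that evalTerm multiplies, as the value rule for terms requires.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a runnable harness; see the lead-in for which parts are assumed.
class ExprEvaluator {
    // Hypothetical Token: wraps the token text.
    static class Token {
        final String text;
        Token(String text) { this.text = text; }
        boolean isAtom()   { return Character.isDigit(text.charAt(0)); }
        boolean isPlus()   { return text.equals("+"); }
        boolean isTimes()  { return text.equals("*") || text.equals("\u00D7"); }
        boolean isLparen() { return text.equals("("); }
        boolean isRparen() { return text.equals(")"); }
        double evalAtom() {
            if (!isAtom()) throw new IllegalStateException("not an atom: " + text);
            return Double.parseDouble(text);
        }
    }

    private final List<Token> tokens = new ArrayList<>();
    private int pos = 0; // index of the next unread token

    // Tokenizes eagerly, so an invalid token is reported here rather than in getToken.
    ExprEvaluator(String input) {
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (Character.isDigit(c)) {
                int j = i;
                while (j < input.length() && Character.isDigit(input.charAt(j))) j++;
                tokens.add(new Token(input.substring(i, j)));
                i = j;
            } else if ("+*\u00D7()".indexOf(c) >= 0) {
                tokens.add(new Token(String.valueOf(c)));
                i++;
            } else {
                throw new IllegalArgumentException("invalid token: " + c);
            }
        }
    }

    boolean eof() { return pos >= tokens.size(); }

    Token peek() {
        if (eof()) throw new IllegalStateException("no token to peek at");
        return tokens.get(pos);
    }

    Token getToken() { Token t = peek(); pos++; return t; }

    double evalExpression() {
        double temp = evalTerm();
        if (!eof() && peek().isPlus()) {
            getToken(); // discard the plus sign
            return temp + evalExpression();
        }
        return temp;
    }

    double evalTerm() {
        double temp = evalFactor();
        if (!eof() && peek().isTimes()) {
            getToken(); // discard the times sign
            return temp * evalTerm();
        }
        return temp;
    }

    double evalFactor() {
        Token next = getToken();
        if (next.isAtom()) return next.evalAtom();
        if (next.isLparen()) {
            double temp = evalExpression();
            if (getToken().isRparen()) return temp;
        }
        throw new IllegalStateException("invalid expression");
    }

    public static void main(String[] args) {
        System.out.println(new ExprEvaluator("2 + 3 * 4").evalExpression());   // 14.0
        System.out.println(new ExprEvaluator("(2 + 3) * 4").evalExpression()); // 20.0
    }
}
```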

Argue by induction on call trees that your evaluator program returns the correct value as defined.
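Assuming that the method returns a value, it is easy to show that the value is correct by induction on the call tree. The call tree has a root node for the initial call to evalExpression, other nodes for every call to one of the other methods, and leaves for every call to evalAtom. Define P(v) to mean "the call at node v of the tree returns the correct value for the expression, term, factor, or atom read during that call". If v is a leaf, we know that evalAtom returns the correct value of the atom by definition. Otherwise we have three cases depending on which method is called at v:
- If v is a call to evalFactor, it returns either the value from an evalAtom leaf, which is correct by definition, or the value of its child call to evalExpression after checking for the right parenthesis, which is correct by the IH. Either way the result matches the definition of the value of a factor.
- If v is a call to evalTerm, it returns either the value of its child call to evalFactor, or that value times the value of its child call to evalTerm. Both children are correct by the IH, so the result matches the definition of the value of a term.
- If v is a call to evalExpression, it returns either the value of its child call to evalTerm, or that value plus the value of its child call to evalExpression. Both children are correct by the IH, so the result matches the definition of the value of an expression.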
This shows partial correctness of the methods -- if they terminate they give the right answer. We'd also like to prove termination on all valid input. Here we prove by induction on all expressions, terms, factors, and atoms that each of these things is read and evaluated by its appropriate method:
- If the expression is an atom, it is read by evalExpression through calls to evalTerm and evalFactor.
- If the whole expression is a factor, it is either an atom or an expression in parentheses, and in either case we trace the code to see that it is evaluated correctly, given the IH that the atom or expression is read and evaluated.
- If the whole expression is a term, it is either a factor or a factor times a term, and in each case we trace the code as above.
- Finally, if the expression is a term we read and evaluate it by the IH, and if it is a term plus an expression we trace the code and use the IH to show that it is read and evaluated.
There remains the question of what this code does if given input that is not a single expression followed by the eof condition. In fact it reads the largest valid expression it can find. The evalExpression and evalTerm methods go on past their first term or factor if and only if they see the plus sign or times sign they need. If there were two atoms in a row in the input, for example, the methods would never evaluate the second one -- they would only peek at it to see that it was not a plus or times sign, and otherwise essentially treat it as the end of the input.
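
The behavior on two atoms in a row can be observed with a small experiment. The sketch below is an assumption-laden stand-in for the handout's setting: tokens arrive pre-split in a String array, atoms are numeric literals, "*" stands for the times sign, and the class TrailingInputDemo and the wrapper evalWholeInput are hypothetical names introduced only for this example. The wrapper shows one way a caller might insist that the expression ends at the end of the input.

```java
// Sketch: observing how much input evalExpression consumes.
class TrailingInputDemo {
    private final String[] toks;
    int pos = 0; // index of the next unread token

    TrailingInputDemo(String... toks) { this.toks = toks; }

    boolean eof() { return pos >= toks.length; }

    // Plays the role of "!eof() && peek() is the given character".
    private boolean nextIs(String s) { return !eof() && toks[pos].equals(s); }

    double evalExpression() {
        double temp = evalTerm();
        if (nextIs("+")) { pos++; return temp + evalExpression(); }
        return temp;
    }

    double evalTerm() {
        double temp = evalFactor();
        if (nextIs("*")) { pos++; return temp * evalTerm(); }
        return temp;
    }

    double evalFactor() {
        if (eof()) throw new IllegalStateException("unexpected end of input");
        String t = toks[pos++];
        if (t.equals("(")) {
            double temp = evalExpression();
            if (nextIs(")")) { pos++; return temp; }
            throw new IllegalStateException("missing right parenthesis");
        }
        return Double.parseDouble(t); // throws NumberFormatException on a non-atom
    }

    // Hypothetical stricter variant: reject anything left over after the expression.
    double evalWholeInput() {
        double value = evalExpression();
        if (!eof()) throw new IllegalStateException("unexpected token: " + toks[pos]);
        return value;
    }

    public static void main(String[] args) {
        TrailingInputDemo d = new TrailingInputDemo("2", "3");
        System.out.println(d.evalExpression()); // 2.0
        System.out.println(d.pos);              // 1: the second atom was never consumed
    }
}
```

With the input "2", "3", evalExpression returns after reading one token and leaves the second atom unread, while evalWholeInput on the same input throws because a token remains.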

Last modified 9 November 2007