# CMPSCI 250 Discussion #8: Arithmetic Expressions

#### 7 November 2007

In this discussion we looked at recursive definitions for arithmetic expressions, compared top-down and bottom-up definitions, and tried to write an expression evaluator.

In Section T.2 we had the following definition of arithmetic expressions, terms, and factors:

• An expression is either a term, or a term followed by "+" and an expression.
• A term is either a factor, or a factor followed by "×" and a term.
• A factor is either an atom, or an expression enclosed in parentheses.

This is a top-down recursive definition, unlike our recursive definitions for naturals or strings. The top-down definition translates much more easily to a recursive algorithm to determine whether a string is an expression, or to compute some recursively defined property of the expression like:

• The value of an atom is given by the method `evalAtom`.
• The value of a factor is the value of its atom or its expression.
• The value of a term is the value of its factor, or the value of its first factor times the value of the remaining term.
• The value of an expression is the value of its term, or the value of its first term plus the value of its remaining expression.
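
As a sketch of the first use mentioned above, deciding whether a string is an expression, here is one way the three top-down rules could become three mutually recursive methods. The class name `Recognizer`, the representation of tokens as plain strings, and the rule that any token other than the four special characters counts as an atom are all assumptions made for this illustration, not part of the discussion handout.

```java
import java.util.List;

// Hypothetical recognizer for the top-down definition.
// Assumption: tokens are strings, and any token other than
// "+", the times sign, "(" or ")" is treated as an atom.
class Recognizer {
    static final String TIMES = "\u00D7"; // the times sign

    private final List<String> tokens;
    private int pos = 0; // index of the next unread token

    Recognizer(List<String> tokens) { this.tokens = tokens; }

    // True iff the whole token list is exactly one expression.
    static boolean isExpression(List<String> tokens) {
        Recognizer r = new Recognizer(tokens);
        return r.expression() && r.pos == tokens.size();
    }

    // True iff there is a next token and it equals s.
    private boolean nextIs(String s) {
        return pos < tokens.size() && tokens.get(pos).equals(s);
    }

    // An expression is a term, or a term followed by "+" and an expression.
    private boolean expression() {
        if (!term()) return false;
        if (nextIs("+")) { pos++; return expression(); }
        return true;
    }

    // A term is a factor, or a factor followed by the times sign and a term.
    private boolean term() {
        if (!factor()) return false;
        if (nextIs(TIMES)) { pos++; return term(); }
        return true;
    }

    // A factor is an atom, or an expression enclosed in parentheses.
    private boolean factor() {
        if (pos >= tokens.size()) return false;
        String t = tokens.get(pos++);
        if (t.equals("(")) {
            if (!expression()) return false;
            if (!nextIs(")")) return false;
            pos++; // consume the ")"
            return true;
        }
        // Anything that is not an operator or a parenthesis is an atom.
        return !(t.equals("+") || t.equals(TIMES) || t.equals(")"));
    }
}
```

Each method mirrors one rule of the definition, which is why the top-down form is convenient here: the shape of the code follows the shape of the rules.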

You're asked to build an expression evaluator using the following methods:

• `getToken()` removes and returns the next `Token` from the input. A `Token` is an atom or one of the characters in the set {+, ×, (, )}. If there is no next token or the next token is invalid, `getToken` throws an exception, which you need not handle.
• `eof()` returns `true` if and only if there are no tokens remaining in the input.
• `peek()` returns the `Token` object that is next in the input, without removing it from the input. It throws an exception if there is no such token. This method was not given to you in the actual discussion, and it should have been because you need it or something similar to solve the problem.
• `isAtom`, `isPlus`, `isTimes`, `isLparen`, and `isRparen` are methods in the `Token` class that return `true` if the `Token` is an atom or is the given character.
• `evalAtom` is a method in the `Token` class that returns a `double` giving that token's value if it is an atom. It throws an exception if the token is a character rather than an atom -- your code should prevent this from happening.
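
To experiment with these methods, one possible stand-in for the input machinery is sketched below. Only the method names come from the list above. The class `TokenInput`, the whitespace-separated input format, and the choice to treat atoms as decimal numbers are assumptions of this sketch. Unlike the `getToken` described above, this version does not reject invalid tokens when they are read; a malformed atom is only detected when `evalAtom` tries to parse it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical Token: an atom (here, a number) or one of + , times, ( , ).
class Token {
    private final String text;

    Token(String text) { this.text = text; }

    boolean isPlus()   { return text.equals("+"); }
    boolean isTimes()  { return text.equals("\u00D7"); } // the times sign
    boolean isLparen() { return text.equals("("); }
    boolean isRparen() { return text.equals(")"); }
    boolean isAtom()   { return !(isPlus() || isTimes() || isLparen() || isRparen()); }

    // Returns the numeric value of an atom; throws if this token is a character.
    double evalAtom() {
        if (!isAtom()) throw new IllegalStateException("not an atom: " + text);
        return Double.parseDouble(text);
    }
}

// Hypothetical input source. Assumption: tokens are separated by whitespace.
class TokenInput {
    private final Deque<Token> queue = new ArrayDeque<>();

    TokenInput(String source) {
        for (String piece : source.trim().split("\\s+")) {
            if (!piece.isEmpty()) queue.add(new Token(piece));
        }
    }

    // True iff no tokens remain.
    boolean eof() { return queue.isEmpty(); }

    // Returns the next token without removing it; throws NoSuchElementException if none.
    Token peek() { return queue.getFirst(); }

    // Removes and returns the next token; throws NoSuchElementException if none.
    Token getToken() { return queue.removeFirst(); }
}
```

With this stand-in, `new TokenInput("2 + ( 3 × 4 )")` yields seven tokens, and `peek()` can be called any number of times before `getToken()` without consuming input.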

### Writing Exercises:

1. Write a bottom-up definition for factors, terms, and expressions equivalent to the top-down definition given above.

• An atom is a factor.
• An expression in parentheses is a factor.
• A factor is a term.
• A factor, followed by "×" and a term, is a term.
• A term is an expression.
• A term, followed by "+" and an expression, is an expression.
2. Write a pseudo-Java method `evalExpression` that returns the `double` value of an expression as defined above, and throws an exception (probably from `getToken`) if the expression is not valid. The input comes from whatever source `getToken` is getting it from, and you should assume that the expression ends at the end of the input (in the text given in discussion I said you could stop at the first complete expression, but this is too easy if the expression starts with an atom). You will want to define methods `evalTerm` and `evalFactor`. Of course, without the peek method you didn't have the tools to solve this problem, but I hope the experience of working at it and the solution will be illuminating.

``````
double evalExpression() {
    // evaluates and removes an expression from the input
    double temp = evalTerm();
    if (!eof() && peek().isPlus()) {
        getToken(); // discard the "+"
        return temp + evalExpression();
    }
    return temp;
}

double evalTerm() {
    // evaluates and removes the next term from the input
    double temp = evalFactor();
    if (!eof() && peek().isTimes()) {
        getToken(); // discard the "×"
        return temp * evalTerm(); // value of a term is the product
    }
    return temp;
}

double evalFactor() {
    // evaluates and removes the next factor from the input
    Token next = getToken();
    if (next.isAtom()) return next.evalAtom();
    if (next.isLparen()) {
        double temp = evalExpression();
        if (getToken().isRparen()) return temp;
        throw new RuntimeException("invalid expression"); // missing ")"
    }
    // a factor must start with an atom or "("
    throw new RuntimeException("invalid expression");
}
``````
3. Argue by induction on call trees that your evaluator program returns the correct value as defined.

Assuming that the method returns a value, it is easy to show that the value is correct by induction on the call tree. The call tree has a root node for the initial call to `evalExpression`, other nodes for every call to one of the other methods, and leaves for every call to `evalAtom`. Define P(v) to mean "the call at node v of the tree returns the correct value for the expression, term, factor, or atom read during that call". If v is a leaf, we know that `evalAtom` returns the correct value of the atom by definition. Otherwise we have three cases depending on which method is called at v:

• If v is a call to `evalFactor`, then there are two subcases. If the factor is an atom, we know by the IH that `evalAtom` returns the correct value and thus that the value we return, by the definition of value, is correct. If the factor is an expression in parentheses, the IH says that the following expression is correctly read and evaluated by `evalExpression`, and we can see that we read the two parentheses and return the value of the expression as we should. These are the only two ways that `evalFactor` can return a value.
• If v is a call to `evalTerm`, there are again two cases. If the term is a single factor, the IH says that we read its value, and that is the value of the term. If there is another term following a times sign, we read and discard the times sign, evaluate both the first factor and the following term correctly by the IH, and return the product of those two values as we should.
• Similarly, if v is a call to `evalExpression`, we either read and return the value of a single term, as we should, or read and evaluate the first term, read and discard the plus sign, read and evaluate the following expression, and return the sum of the two values as we should.

This shows partial correctness of the methods -- if they terminate they give the right answer. We'd also like to prove termination on all valid input. Here we prove by induction on all expressions, terms, factors, and atoms that each of these things is read and evaluated by its appropriate method:

• If the expression is an atom, it is read by `evalExpression` through calls to `evalTerm` and `evalFactor`.
• If the whole expression is a factor, it is either an atom or an expression in parentheses, and in either case we trace the code to see that it is evaluated correctly, given the IH that the atom or expression is read and evaluated.
• If the whole expression is a term, it is either a factor or a factor times a term, and in each case we trace the code as above.
• Finally, if the expression is a term we read and evaluate it by the IH, and if it is a term plus an expression we trace the code and use the IH to show that it is read and evaluated.

There remains the question of what this code does if given input that is not a single expression followed by the `eof` condition. In fact it reads the largest valid expression it can find. The `evalExpression` and `evalTerm` methods go on past their first term or factor if and only if they see the plus sign or times sign they need. If there were two atoms in a row in the input, for example, the methods would never evaluate the second one -- they would only peek at it to see that it was not a plus or times sign, and otherwise essentially treat it as the end of the input.
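
The behavior on trailing input can be checked with a small self-contained harness. This is a sketch under stated assumptions: the class name `TrailingInputDemo`, the whitespace-separated string tokens, the numeric atoms, and the helper methods `tokensRead` and `evalAll` are hypothetical and were not part of the discussion. The three `eval` methods follow the same structure as the solution to Exercise 2.

```java
// Hypothetical harness. Assumptions: tokens are whitespace-separated strings
// and every atom is a decimal number.
class TrailingInputDemo {
    static final String TIMES = "\u00D7"; // the times sign

    private final String[] tokens;
    private int pos = 0; // index of the next unread token

    TrailingInputDemo(String input) { tokens = input.trim().split("\\s+"); }

    boolean eof() { return pos >= tokens.length; }
    String peek() { return tokens[pos]; }
    String getToken() { return tokens[pos++]; }
    int tokensRead() { return pos; }

    double evalExpression() {
        double temp = evalTerm();
        if (!eof() && peek().equals("+")) {
            getToken(); // discard the "+"
            return temp + evalExpression();
        }
        return temp;
    }

    double evalTerm() {
        double temp = evalFactor();
        if (!eof() && peek().equals(TIMES)) {
            getToken(); // discard the times sign
            return temp * evalTerm();
        }
        return temp;
    }

    double evalFactor() {
        String next = getToken();
        if (next.equals("(")) {
            double temp = evalExpression();
            if (!getToken().equals(")")) {
                throw new IllegalStateException("invalid expression");
            }
            return temp;
        }
        // Throws NumberFormatException if next is not a numeric atom.
        return Double.parseDouble(next);
    }

    // Stricter variant (an addition of this sketch): reject leftover input.
    double evalAll() {
        double value = evalExpression();
        if (!eof()) throw new IllegalStateException("unexpected token: " + peek());
        return value;
    }
}
```

On the input `2 3`, `evalExpression()` returns 2.0 after consuming one token: the second atom is only peeked at, never read. A caller who wants such input rejected could check `eof()` after the call, as the hypothetical `evalAll` does.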

Last modified 9 November 2007