CMPSCI 601: Theory of Computation

David Mix Barrington

Fall, 2004

Homework Assignment #1

Solutions posted Thu 4 Nov 2004

Covers Lectures 1-5

Question text in black, solutions in blue.

These problems deal with four formal languages over the alphabet {0,1}. Define the following function f from {0,1}^* to the integers: f(λ) = 0 and for any string w, f(w0) = f(w) - 1 and f(w1) = f(w) + 1. (Recall that λ is the empty string.)

A is defined to be the language {w: f(w) = 0}.

B is the language {w: f(w) = 0 and for all v, if v is a prefix of w then f(v) ≥ 0}.

C is the language {w: w is in B and for all v, if v is a prefix of w then f(v) ≤ 3}.

Finally, D is the language {w: w is in B and for all v, if v is a prefix of w then f(v) ≤ 1}. (This said "f(w) ≤ 1" before, which makes D the same language as B.)

(Recall that string u is a prefix of string v if there is a string x such that ux = v.)
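
For readers who want to experiment, the definitions above can be transcribed almost literally into Python. This is only an illustrative sketch: the names f, prefix_values, in_A, in_B, in_C, in_D, and members are hypothetical helpers, not part of the assignment, and strings are assumed to contain only the characters 0 and 1.

```python
from itertools import product

def f(w):
    """f(lambda) = 0, f(w0) = f(w) - 1, f(w1) = f(w) + 1."""
    return sum(1 if ch == "1" else -1 for ch in w)

def prefix_values(w):
    """Return f(v) for every prefix v of w, from the empty string up to w."""
    return [f(w[:i]) for i in range(len(w) + 1)]

def in_A(w):
    return f(w) == 0

def in_B(w):
    return f(w) == 0 and min(prefix_values(w)) >= 0

def in_C(w):
    return in_B(w) and max(prefix_values(w)) <= 3

def in_D(w):
    return in_B(w) and max(prefix_values(w)) <= 1

def members(pred, max_len):
    """List every string of length at most max_len that satisfies pred."""
    return ["".join(t)
            for n in range(max_len + 1)
            for t in product("01", repeat=n)
            if pred("".join(t))]
```

For example, members(in_D, 4) lists the empty string, 10, and 1010, while 11110000 is in B but not in C because one of its prefixes has f-value 4.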

Question 1: Which of these four languages is regular? For each of the regular ones, describe a DFA deciding it and a regular expression denoting it. For each of the non-regular ones, prove that it is not regular. (You may find the result of question 2 the easiest way to do this.)
Languages A and B are not regular, as will be shown in Question 2, where each is found to have infinitely many Myhill-Nerode classes. Languages C and D are regular. C has a five-state DFA with state set {0,1,2,3,d}, start state 0, and final state set {0}. Its 0-transitions are from 0 to d, 1 to 0, 2 to 1, 3 to 2, and d to d. Its 1-transitions are from 0 to 1, 1 to 2, 2 to 3, 3 to d, and d to d. (The "death state" d is needed because every state must have a 0-transition and a 1-transition.) A regular expression for C is (1(1(10)^*0)^*0)^*, which is easy to compute from the DFA by the state reduction method.
D has a three-state DFA with state set {0,1,d}, start and only final state 0, 0-transitions from 0 to d, 1 to 0, and d to d, and 1-transitions from 0 to 1, 1 to d, and d to d. D's regular expression is (10)^*.
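
The two DFAs and their regular expressions can be cross-checked by brute force. The sketch below is an informal sanity check, not a proof: make_dfa, accepts, and agree are hypothetical helper names, the state d is represented by the string "d", and the regular expressions are written in Python's re syntax (so ^* becomes *).

```python
import re
from itertools import product

def make_dfa(bound):
    """Transition table with states 0..bound plus the death state 'd'."""
    delta = {}
    for k in range(bound + 1):
        delta[(k, "1")] = k + 1 if k < bound else "d"
        delta[(k, "0")] = k - 1 if k > 0 else "d"
    delta[("d", "0")] = "d"
    delta[("d", "1")] = "d"
    return delta

def accepts(delta, w):
    """Run the DFA from start state 0; the only final state is 0."""
    state = 0
    for ch in w:
        state = delta[(state, ch)]
    return state == 0

DFA_C = make_dfa(3)   # five states: 0, 1, 2, 3, d
DFA_D = make_dfa(1)   # three states: 0, 1, d
REGEX_C = re.compile(r"(1(1(10)*0)*0)*")
REGEX_D = re.compile(r"(10)*")

def agree(delta, regex, max_len):
    """True if DFA and regex accept the same strings up to max_len."""
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            w = "".join(t)
            if accepts(delta, w) != (regex.fullmatch(w) is not None):
                return False
    return True
```

Calling agree(DFA_C, REGEX_C, 10) and agree(DFA_D, REGEX_D, 10) compares each pair on every string of length at most 10. Agreement on short strings is evidence that the expressions were derived correctly, but it does not replace the state reduction argument.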
Question 2: For each of the four languages, describe its Myhill-Nerode equivalence classes. (Of course there are infinitely many of these if the language is not regular.)
A has an equivalence class for every integer, positive, negative or zero. This is because membership in A depends only on the value of f, and f is easily proved to be a homomorphism (that is, f(uv) = f(u) + f(v) for any strings u and v). So if f(u) = f(v), then f(uw) = f(vw) for any w, and thus uw and vw are either both in or both out of A. Conversely, if f(u) ≠ f(v), we can find a string w with f(w) = -f(u) (for example, 0^f(u) if f(u) ≥ 0 and 1^(-f(u)) otherwise). Then f(uw) = 0 and f(vw) = f(v) - f(u) ≠ 0, so uw is in A and vw is not. Thus u and v are not A-equivalent.
B has a class for each non-negative integer k, consisting of those strings w such that f(w) = k and no prefix u of w has f(u) < 0. All strings u that have a prefix with negative f-value are B-equivalent, because for each of them uv is not in B for any string v. To prove that the set of strings with no f-negative prefix is divided into classes by f-value, note that if u and v are such strings with f(u) = f(v), then again f(uw) = f(vw) for any w, and thus uw and vw are both in B or both not in B. If, on the other hand, u and v are such strings with f(u) ≠ f(v), we may again choose a string w such that f(w) = -f(u) (for example, w could be 0^f(u)) and see that f(uw) = 0 and f(vw) ≠ 0, so that uw is in B and vw is not.
C and D have classes corresponding to the states of the DFA's I gave in the solution to Question 1. The classes 0, 1, 2, and 3 of C correspond to the four possible values of f(w) for strings w that have no prefix u with either f(u) < 0 or f(u) > 3. All strings that have such a prefix are equivalent to each other and form the fifth class, since if u is such a string then uw is not in C for any string w. The value of f determines the class by an argument similar to that given for Languages A and B.
Language D has three Myhill-Nerode classes, similar to those of C and matching the DFA given above. Justification of this claim is very similar to that for C.
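
The class counts claimed above can be explored experimentally by comparing, for each short string u, the set of short strings w with uw in the language. The sketch below is a bounded approximation of the Myhill-Nerode relation, offered as a sanity check rather than a proof; in_language, strings, and count_classes are hypothetical names, and the length bounds are arbitrary choices.

```python
from itertools import product

def f_prefixes(w):
    """Return f(v) for every prefix v of w, starting with f(lambda) = 0."""
    total, values = 0, [0]
    for ch in w:
        total += 1 if ch == "1" else -1
        values.append(total)
    return values

def in_language(w, low=None, high=None):
    """A: no bounds. B: low=0. C: low=0, high=3. D: low=0, high=1."""
    vals = f_prefixes(w)
    if vals[-1] != 0:
        return False
    if low is not None and min(vals) < low:
        return False
    if high is not None and max(vals) > high:
        return False
    return True

def strings(max_len):
    """Yield every binary string of length at most max_len."""
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            yield "".join(t)

def count_classes(member, prefix_len, suffix_len):
    """Count distinct 'signatures' {w : uw in L} over bounded u and w."""
    suffixes = list(strings(suffix_len))
    signatures = {
        frozenset(w for w in suffixes if member(u + w))
        for u in strings(prefix_len)
    }
    return len(signatures)

def in_A(w): return in_language(w)
def in_B(w): return in_language(w, low=0)
def in_C(w): return in_language(w, low=0, high=3)
def in_D(w): return in_language(w, low=0, high=1)
```

With both length bounds set to 4, this count is 5 for C and 3 for D, matching the DFAs. For A and B the count keeps growing as the bounds increase (for instance 7 and then 9 for A with bounds 3 and 4), which is consistent with their having infinitely many classes.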
Question 3: For each of the non-regular languages, determine whether or not it is context-free. If it is, give both a PDA and a grammar for it. If it is not, prove that it is not.
Both A and B are context-free. A grammar for A is S → SS, S → 1S0, S → 0S1, S → λ. A grammar for B is S → SS, S → 1S0, S → λ. The simplest PDA's for A and B keep track of the current value of f(w) in unary on the stack. For language B, the PDA after reading w has f(w) 1's on its stack (on top of a bottom-of-stack marker) unless a prefix has had negative f-value (in which case the PDA is in a death state). This condition is easy to maintain by pushing a 1 when you see a 1 and popping a 1 when you see a 0, entering the death state if a 0 is seen when no 1 is on the stack. The PDA accepts if it can remove the bottom-of-stack marker at the end of the string.
The PDA for A is similar except that it keeps k 1's on the stack if f(w) is a positive number k, and keeps k 0's on the stack if f(w) = -k. Both these conditions are easy to maintain, and we have a bottom-of-stack marker as well. The PDA accepts if it can remove the marker at the end of the string.
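
The stack discipline of both PDAs can be sketched with a Python list standing in for the stack. This is an informal simulation under stated assumptions, not a formal PDA definition: pda_A, pda_B, and the marker symbol "$" are hypothetical choices, and "removing the marker at the end" is modeled by checking that the marker is on top of the stack once the input is exhausted.

```python
BOTTOM = "$"  # bottom-of-stack marker (symbol chosen for illustration)

def pda_B(w):
    """Stack holds f(w) copies of '1' above the marker; reject on underflow."""
    stack = [BOTTOM]
    for ch in w:
        if ch == "1":
            stack.append("1")      # f goes up by one
        elif stack[-1] == "1":
            stack.pop()            # f goes down by one
        else:
            return False           # a prefix has negative f: death state
    return stack[-1] == BOTTOM     # marker exposed means f(w) = 0

def pda_A(w):
    """Stack holds k 1's if f(w) = k > 0, or k 0's if f(w) = -k < 0."""
    stack = [BOTTOM]
    for ch in w:
        if stack[-1] in (BOTTOM, ch):
            stack.append(ch)       # move f away from zero
        else:
            stack.pop()            # opposite symbol: move f toward zero
    return stack[-1] == BOTTOM     # marker exposed means f(w) = 0
```

For example, pda_A("0110") is True while pda_B("0110") is False, since the prefix 0 has f-value -1.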

Last modified 4 November 2004