Question text is in black, solutions in blue.
Q1: 10 points Q2: 10 points Q3: 10 points Q4: 10 points Q5: 10 points Q6: 10 points Q7: 10 points Q8: 10 points Q9: 10 points Q10: +10 points Q11: 10 points Q12: 10 points Q13: 10 points Total: 120+10 points
The regular language L1 is over the alphabet Σ = {a, b, c} and is defined by the regular expression Σ*(aa ∪ bb ∪ cc)Σ*.
The language L2 is the complement of L1, that is, Σ* ∖ L1.
The context-free language L3 over the alphabet {a, b} is defined by the grammar with start symbol S and rules S → bSS and S → a.
TRUE. Let M be a PDA as in the proof of the PDA-to-CFG Theorem. M
has exactly one final state (not its start state), accepts only with
an empty stack, and either pushes or pops one letter (not both) on
each transition. Let M' be a PDA with the same state set, switched
start and final states, and each transition reversed: (p, a, c;
q, ε) becomes (q, a, ε; p, c) and (p, a, ε;
q, c) becomes (q, a, c; p, ε). Each path in M from
q0 to f, accepting with empty stack, corresponds to a
path in M' from f to q0 that also accepts with empty
stack. The string that M' reads is exactly the reversal of the
string that M reads.
Another proof is to take a grammar for L and reverse the
right-hand sides of every rule. I didn't think that it was obvious
that this construction is correct, so I took a point off for
assuming that it is. Someone gave a convincing proof that it is --
consider a parse tree showing that w is in L(G), and hold that tree
up to a mirror. The result is a parse tree for w^R, and
the rules it uses are exactly those built by this construction.
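The grammar-reversal construction can be sanity-checked by brute force. The following is a minimal Python sketch, applied to the grammar S → bSS, S → a used later in this exam. The helper names generate and reverse_rules are made up for illustration, and the search assumes that no rule has an empty right-hand side, so sentential forms never shrink.

```python
def generate(rules, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    Assumes single-character nonterminals (the keys of `rules`) and no
    empty right-hand sides, so the search space is finite.
    """
    seen, todo, result = {start}, [start], set()
    while todo:
        form = todo.pop()
        # Locate the leftmost nonterminal, if any.
        idx = next((i for i, ch in enumerate(form) if ch in rules), None)
        if idx is None:
            result.add(form)  # no nonterminals left: a terminal string
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                todo.append(new)
    return result


def reverse_rules(rules):
    """Reverse the right-hand side of every rule."""
    return {lhs: [rhs[::-1] for rhs in rhss] for lhs, rhss in rules.items()}


RULES = {"S": ["bSS", "a"]}
original = generate(RULES, "S", 9)
mirrored = generate(reverse_rules(RULES), "S", 9)
print({w[::-1] for w in original} == mirrored)
```

This only tests strings up to a fixed length, so it is evidence for the construction rather than a proof; the mirror argument above is what establishes correctness.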
FALSE. Use the CFL Pumping Lemma -- let p be the pumping length, let w = a^p b^p c^p, and let w = uvxyz. Because |vxy| ≤ p, the strings v and y together include only one or two types of letters, not all three. Because vy ≠ ε, v and y together do include at least one type of letter. Let i = 117p and look at the string uv^i xy^i z. There are still only p of the letter (or letters) that was not affected, and at least 118p - 1 of one of the affected letter types, so p is less than 1/117 of the total. Since the language cannot satisfy the CFLPL, it is not context-free.
FALSE: Let u = aa, v = bb. Given any string z, both uz and vz are in L1, so u and v are L1-equivalent. But they have different last letters. If neither u nor v is in L1, the statement is true.
FALSE: There are exactly five such strings: bababaa, babbaaa, bbaabaa, bbabaaa, and bbbaaaa, as is easy to check by exhaustive search of derivations. This language is similar to the balanced parenthesis language Dyck_1, for which the number of strings of length 2k is the Catalan number C_k.
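The exhaustive search can be sketched in a few lines of Python. The function name l3_strings is an illustrative choice; it enumerates the strings of a given length by splitting on the rule S → bSS.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def l3_strings(n):
    """Return all strings of length n derivable from S -> bSS | a."""
    if n == 1:
        return frozenset({"a"})  # S -> a
    if n < 3:
        return frozenset()       # lengths 0 and 2 are impossible
    out = set()
    # S -> bSS: one b, then two strings from S whose lengths sum to n - 1.
    for k in range(1, n - 1):
        for u in l3_strings(k):
            for v in l3_strings(n - 1 - k):
                out.add("b" + u + v)
    return frozenset(out)


print(sorted(l3_strings(7)))
# ['bababaa', 'babbaaa', 'bbaabaa', 'bbabaaa', 'bbbaaaa']
```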
TRUE: It is the language of the grammar S → uTz, T → vTy, T → x. Any derivation in this grammar must use the first rule, then the second rule i times, then the last rule, generating uv^i xy^i z.
FALSE: My four-state NFA example has nonfinal start state 1, with a-transitions to final states 2, 3, and 4, and no other transitions. The equivalent DFA has three states, {1}, {2,3,4}, and ∅.
Questions 7-10 use the languages L1 and L2 defined above.
The simplest NFA has five states -- nonfinal start state 1 with a
Σ loop, three nonfinal intermediate states 2, 3, and 4 each with a
letter-arrow in from the start and an arrow with the same letter to
the final state, and a final state 5 with a Σ-loop. This is
correct because an accepting path may read any string staying at the
start, then read aa, bb, or cc, then read any string staying at the
final state.
My construction, followed slavishly, would give 11 states because
the start and final states above would each be replaced by four-state
machines for Σ*. The book's construction has 22
because of all the extra ε-moves.
You could also build and justify the DFA as an answer to this
question, since DFA's are also NFA's.
A DFA for L1
is pretty easy to build directly, but let's carry
out the Subset Construction. Start state {1} has an a-arrow to {1,
2}, a b-arrow to {1, 3}, and a c-arrow to {1, 4}. State {1, 2} has an
a-arrow to {1, 2, 5}, a b-arrow to {1, 3}, and a c-arrow to {1, 4}.
State {1, 3} has an a-arrow to {1, 2}, a b-arrow to {1, 3, 5}, and a
c-arrow to {1, 4}. State {1, 4} has an a-arrow to {1, 2}, a b-arrow
to {1, 3}, and a c-arrow to {1, 4, 5}. The three final states each
have a-arrows to {1, 2, 5}, b-arrows to {1, 3, 5}, and c-arrows to {1,
4, 5}. This DFA has seven states -- one start, three intermediate,
and three final. It remembers whether it has seen a double letter
yet, and what the last letter (if any) was.
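The Subset Construction above can be replayed mechanically. This is a rough Python sketch, assuming the five-state NFA from Question 7 with states numbered 1 through 5; the names nfa_moves, subset_construction, and dfa_accepts are chosen here for illustration.

```python
SIGMA = "abc"
MID = {"a": 2, "b": 3, "c": 4}  # intermediate state reached on each letter


def nfa_moves(q, ch):
    """States the five-state NFA can move to from q on letter ch."""
    if q == 1:
        return {1, MID[ch]}   # Σ-loop at the start, or guess a double letter
    if q == 5:
        return {5}            # Σ-loop at the final state
    return {5} if q == MID[ch] else set()


def subset_construction(start):
    """Build the reachable part of the subset DFA as a nested dict."""
    start = frozenset(start)
    delta, todo = {}, [start]
    while todo:
        state = todo.pop()
        if state in delta:
            continue
        delta[state] = {}
        for ch in SIGMA:
            target = frozenset(r for q in state for r in nfa_moves(q, ch))
            delta[state][ch] = target
            todo.append(target)
    return delta


def dfa_accepts(dfa, w):
    state = frozenset({1})
    for ch in w:
        state = dfa[state][ch]
    return 5 in state  # final iff the subset contains NFA state 5


dfa = subset_construction({1})
finals = [s for s in dfa if 5 in s]
print(len(dfa), len(finals))  # 7 3
```

Swapping the acceptance test to `5 not in state` gives the DFA for L2 mentioned in the next paragraph.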
Of course we get a DFA for L2 by switching the final and
nonfinal states of this DFA.
We divide the DFA from Question 8 (for L2)
into classes F and N. All three
states in N go to N on each letter, so there is no reason to separate
them.
The four states in F each have a different behavior on inputs (a, b,
c) -- {1} has (F, F, F), {1, 2} has (N, F, F), {1, 3} has (F, N, F),
and {1, 4} has (F, F, N). So F must be split into four classes, and
we have a final DFA with five states (with the three nonfinal states
merged to one). Since we used the minimization algorithm, this DFA is
minimal.
It's also straightforward to give this five-state DFA and show that
each pair of final states is distinguishable.
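The splitting of F and N can also be checked by partition refinement. The sketch below is one way to do it in Python, not necessarily the exact algorithm from the course; the state names ("start", a last letter, or "D" plus a letter once a double has been seen) and the function moore_classes are labels invented for this example.

```python
SIGMA = "abc"
STATES = ["start", "a", "b", "c", "Da", "Db", "Dc"]
FINAL_L2 = {"start", "a", "b", "c"}  # no double letter seen yet


def step(q, ch):
    """Transition function of the seven-state DFA."""
    if q.startswith("D") or q == ch:
        return "D" + ch   # a double letter has been (or is now) seen
    return ch             # remember only the last letter


def moore_classes(states, final, step):
    """Refine the final/nonfinal partition until it stops changing."""
    block = {q: q in final for q in states}
    while True:
        # A state's signature is its own block plus the blocks it moves to.
        refined = {
            q: (block[q],) + tuple(block[step(q, ch)] for ch in SIGMA)
            for q in states
        }
        if len(set(refined.values())) == len(set(block.values())):
            break
        block = refined
    groups = {}
    for q in states:
        groups.setdefault(block[q], []).append(q)
    return sorted(groups.values())


print(moore_classes(STATES, FINAL_L2, step))
# [['Da', 'Db', 'Dc'], ['a'], ['b'], ['c'], ['start']]
```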
The clever method first -- the strings in L2 with no a's
are ε ∪ b(cb)*(c ∪ ε) ∪
c(bc)*(b ∪ ε). This is because a string of b's
and c's with no double letter must alternate b's and c's.
Define X to be the regular expression b(cb)*(c ∪
ε) ∪ c(bc)*(b ∪ ε). These are the
nonempty strings in L2 that have no a's. Then
since every two a's must have a string in L(X) between them, we get
(ε ∪ X)(aX)*(a ∪ ε). Substituting
for X gives the whole regular expression.
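The expression (ε ∪ X)(aX)*(a ∪ ε) can be tested against the definition of L2 on all short strings. Here is a small Python sketch that translates it into the syntax of the standard re module (with `?` standing for "∪ ε" and `|` for ∪); the names in_l2_by_regex, has_no_double, and mismatches are illustrative.

```python
import re
from itertools import product

# X = b(cb)*(c ∪ ε) ∪ c(bc)*(b ∪ ε), the nonempty strings with no a's
X = r"(?:b(?:cb)*c?|c(?:bc)*b?)"
# (ε ∪ X)(aX)*(a ∪ ε)
L2_REGEX = re.compile(rf"(?:{X})?(?:a{X})*a?")


def in_l2_by_regex(w):
    return L2_REGEX.fullmatch(w) is not None


def has_no_double(w):
    """Direct definition of L2: no two adjacent letters are equal."""
    return all(x != y for x, y in zip(w, w[1:]))


def mismatches(max_len):
    """Strings over {a, b, c} up to max_len where the two tests disagree."""
    bad = []
    for n in range(max_len + 1):
        for t in product("abc", repeat=n):
            w = "".join(t)
            if in_l2_by_regex(w) != has_no_double(w):
                bad.append(w)
    return bad


print(mismatches(7))  # []
```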
I worked out an answer by state elimination as well and got
ε ∪ c ∪ Y ∪ (a ∪ ca ∪ Y)(ca ∪
Y)*(ε ∪ c ∪ Y), where Y is the regular
expression (b ∪ ab)(cb)*(a ∪ ca).
Questions 11-13 deal with the context-free language L3 defined above, with the grammar rules S → bSS and S → a.
Using the top-down parser, there are three states q0, p, and f. There is a transition from q0 to p that reads nothing and pushes S$, so that $ is at the bottom of the stack and S is on top of it. (I didn't insist that you break this into two transitions, each pushing a letter.) There are four loops on state p, one reading and popping an a, one reading and popping a b, one popping S and pushing a, and one popping S and pushing bSS. (Officially this last one should be broken into three transitions each pushing a letter.) Then there is one transition from p to f that reads nothing and pops $. Since this PDA is made by a known construction from the CFG, we know that it is correct for the language of the CFG.
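To see the three-state PDA in action, here is a minimal Python sketch that searches over its configurations nondeterministically. The representation (the stack as a string with its top at index 0) and the name pda_accepts are choices made for this example. The search is finite because every expansion of S puts a terminal on top of the stack, which must then be matched against the input.

```python
def pda_accepts(w):
    """Simulate the top-down-parser PDA for S -> bSS | a on input w."""
    todo = [("q0", 0, "")]  # (state, input position, stack)
    seen = set()
    while todo:
        cfg = todo.pop()
        if cfg in seen:
            continue
        seen.add(cfg)
        state, i, stack = cfg
        if state == "f":
            if i == len(w) and stack == "":
                return True   # all input read, in the final state
            continue
        if state == "q0":
            todo.append(("p", i, "S$"))  # push S$ with S on top
            continue
        if not stack:
            continue
        top, rest = stack[0], stack[1:]
        if top == "S":
            todo.append(("p", i, "a" + rest))    # pop S, push a
            todo.append(("p", i, "bSS" + rest))  # pop S, push bSS
        elif top == "$":
            todo.append(("f", i, rest))          # pop $ and move to f
        elif i < len(w) and w[i] == top:
            todo.append(("p", i + 1, rest))      # read and pop a matching letter
    return False


print([w for w in ["a", "baa", "bbaabaa", "", "b", "ab", "baab"] if pda_accepts(w)])
# ['a', 'baa', 'bbaabaa']
```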
L3 is NOT a regular language. Its intersection with
b*a* is the language
{b^i a^(i+1): i ≥ 0}, but we have to justify this
claim.
(We can generate any string of this type by using the S → bSS
rule i times and using the S → a rule on the remaining S's. To
be in this regular language, a string derived in the grammar must use
the S → a rule on every S to the right of any a, so the only decision is
how many times to use the other rule to the left of any a's, and if
this number is i then we generate exactly the string
b^i a^(i+1).)
This language is easily seen to be non-regular by either the
Myhill-Nerode method or the Regular Language Pumping Lemma. For the
former, note that {b^i: i ≥ 0} is an infinite set of pairwise
distinguishable strings because, for i ≠ j, b^i and b^j are
distinguished by a^(i+1).
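The claim about the intersection with b*a* can be checked on small exponents. The following Python sketch uses a recursive-descent membership test for L3, which works because the first letter decides which rule applies; the names parse_S, in_l3, and members are made up for this illustration.

```python
def parse_S(w, i):
    """Try to read one S starting at position i; return the end position or None."""
    if i >= len(w):
        return None
    if w[i] == "a":          # S -> a
        return i + 1
    if w[i] == "b":          # S -> bSS
        j = parse_S(w, i + 1)
        return None if j is None else parse_S(w, j)
    return None              # letter outside {a, b}


def in_l3(w):
    return parse_S(w, 0) == len(w)


def members(max_exp):
    """Pairs (i, j) with i, j <= max_exp such that b^i a^j is in L3."""
    return {
        (i, j)
        for i in range(max_exp + 1)
        for j in range(max_exp + 1)
        if in_l3("b" * i + "a" * j)
    }


print(members(10) == {(i, i + 1) for i in range(10)})  # True
```

The same test shows the distinguishing suffix at work: `in_l3("bb" + "aaa")` holds, while `in_l3("b" + "aaa")` does not.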
By induction on k, the number of steps in the derivation -- the base case
is k = 1 and the only complete derivation of one step is for the
string "a", which has odd length.
For the inductive step, assume that every derivation of k or fewer
steps produces a string of odd length, and consider any derivation of
length k+1. The first step must take S to bSS, assuming k > 0,
and the rest of the derivation produces strings u and v from the two
S's.
Since these derivations use k or fewer steps, they produce odd-length
strings by the IH. So the string buv that we derive has length (1 +
|u| + |v|) which is the sum of three odd numbers and therefore is
odd.
I gave full credit for a coherent invariant argument, arguing that
every string of a's, b's, and S's in a derivation has odd length
because the rules either keep the length the same or increase it by
2, and the original length of "S" is 1.
Last modified 26 February 2012