Q1: 10 points Q2: 10 points Q3: 10 points Q4: 10 points Q5: 20 points Q6: 30 points Q7: 30 points Total: 120 points
Question text is in black, solutions in blue.
FALSE. By the Subset Construction, N' would be simulated by a DFA with at most
2³ = 8 states. But some regular languages, such as
(a⁹)*, have no DFA with eight or fewer states and so
could not be equal to L(N').
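The bound of 2³ = 8 states comes from the fact that the Subset Construction's DFA states are subsets of the NFA's state set. Here is a minimal sketch (ε-moves omitted for simplicity; the NFA at the bottom is my own toy example, not from the exam):

```python
from itertools import chain

def subset_construction(states, alphabet, delta, start, finals):
    """Determinize an NFA by the Subset Construction.

    delta maps (state, symbol) -> set of states. The DFA's states are
    subsets of the NFA's states, so a three-state NFA yields a DFA
    with at most 2**3 = 8 states.
    """
    start_set = frozenset([start])
    dfa_states = {start_set}
    worklist = [start_set]
    dfa_delta = {}
    while worklist:
        S = worklist.pop()
        for a in alphabet:
            # The successor of subset S on symbol a is the union of
            # the NFA successors of its members.
            T = frozenset(chain.from_iterable(delta.get((q, a), set()) for q in S))
            dfa_delta[(S, a)] = T
            if T not in dfa_states:
                dfa_states.add(T)
                worklist.append(T)
    dfa_finals = {S for S in dfa_states if S & finals}
    return dfa_states, dfa_delta, start_set, dfa_finals

# A three-state example NFA; its determinization stays within 8 states.
delta = {(0, 'a'): {0, 1}, (1, 'a'): {2}}
states, _, _, _ = subset_construction({0, 1, 2}, ['a'], delta, 0, {2})
assert len(states) <= 8
```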
Many people gave examples of languages with four-state NFA's, such as
{aaa}, and asserted that they had no three-state NFA's. Most of these arguments
were invalid because they relied on distinguishability of strings, which only
gives a lower bound on the size of DFA's. Here is a correct version of
the argument for the language {aaa}. Let N' be an NFA with L(N') = {aaa}.
There must be a path of three a-edges (plus possibly some ε-edges) from
the start state to some final state. Call the start state 0, the state
following the first a-edge 1, the state following the second a-edge 2, and the
final state 3. States 0, 1, and 2 must be non-final because otherwise
ε, a, or aa could be accepted and they are not in L(N'). No two states
among 0, 1, and 2 could be equal, because otherwise the machine could accept
a or aa. So N' has at least one final state and at least three non-final states
and thus has at least four states.
TRUE. Find a Chomsky Normal Form grammar G for L(M), which is possible because
L(M) is a context-free language. (Our theorem gets a CFG from ordinary PDA's,
but if M is a two-character PDA it can be converted to an ordinary PDA by
adding states.) Construct the top-down parser for G, as in the proof in Sipser
that every CFG has an equivalent PDA. This PDA has three states, pops or reads
at most one character per transition, and pushes at most two characters per
transition as the right-hand side of a rule in G has at most two characters.
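The stack discipline of that top-down parser can be simulated directly: pop a non-terminal and push a right-hand side, or pop a terminal that matches the next input character. Below is a backtracking sketch of this search over stack configurations (the small example grammar for {aⁿbⁿ : n ≥ 1} is hypothetical, not from the exam):

```python
def parses(rules, start, w, fuel=1000):
    """Depth-first search over the top-down parser's stack configurations.

    rules maps a non-terminal to a list of right-hand sides (tuples of
    at most two symbols). Each step mirrors the PDA of Sipser's
    CFG-to-PDA proof: replace a non-terminal on the stack top by a
    right-hand side, or match a terminal against the input. The 'fuel'
    parameter crudely bounds the backtracking search.
    """
    def go(stack, i, fuel):
        if fuel == 0:
            return False
        if not stack:
            return i == len(w)          # accept: empty stack, input consumed
        top, rest = stack[0], stack[1:]
        if top in rules:                # non-terminal: try each rule
            return any(go(list(rhs) + rest, i, fuel - 1) for rhs in rules[top])
        # terminal: must match the next input character
        return i < len(w) and top == w[i] and go(rest, i + 1, fuel - 1)
    return go([start], 0, fuel)

# Hypothetical two-character grammar for {a^n b^n : n >= 1}:
# S -> aT, T -> Sb, T -> b
rules = {'S': [('a', 'T')], 'T': [('S', 'b'), ('b',)]}
assert parses(rules, 'S', 'aabb')
assert not parses(rules, 'S', 'aab')
```

Note that, as in the proof, each step pushes at most two symbols because every right-hand side has at most two characters.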
We have no experience in showing that a PDA requires a certain number of
states, so the arguments purporting to do so were largely nonsense.
TRUE. In the in-class writing exercise we proved that a language has a
right-regular grammar if and only if it is regular. A language has a
left-regular grammar if and only if its reversal has a right-regular grammar,
because if G is left-regular or right-regular
you can get a grammar for L(G)ᴿ (defined to be
{w : wᴿ ∈ L(G)}) by reversing the right-hand side of every rule
in G. And a language is regular if and only if its reversal is regular (just
reverse the regular expression for L to get a regular expression for
Lᴿ, and vice versa).
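The rule-reversal step is mechanical enough to sketch in a few lines (the representation of rules as (lhs, rhs) pairs is my own, not from the exercise):

```python
def reverse_grammar(rules):
    """Reverse the right-hand side of every rule.

    rules is a list of (lhs, rhs) pairs with rhs a string. If G is
    left-regular (rules of the form A -> Bw), the result is
    right-regular (rules of the form A -> wB), and it generates the
    reversal of L(G).
    """
    return [(lhs, rhs[::-1]) for lhs, rhs in rules]

# Left-regular grammar for (ab)*: S -> Sab, S -> ε (empty string)
left = [('S', 'Sab'), ('S', '')]
right = reverse_grammar(left)   # S -> baS, S -> ε, generating (ba)*
assert right == [('S', 'baS'), ('S', '')]
```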
Many students answering this question didn't seem to realize that
left-regular and right-regular weren't the same thing. Others translated a
left-regular G into any NFA for L(G)R, but claimed to have an NFA
for L(G).
FALSE. L is generated by the grammar with rules S → XY, X → aXbb,
X → ε, Y → bYc, and Y → ε. Equivalently, a PDA
for L could push two a's on the stack for each a read, pop an a from the stack
for each b read until the stack is empty, push a b on the stack for each
remaining b read, pop a b for each c read, and accept if the stack is empty
at the end of the input.
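The PDA description above translates directly into a stack simulation; a sketch, assuming L = {aⁱbʲcᵏ : j = 2i + k} (the function name is my own):

```python
import re

def in_L(w):
    """Follow the PDA sketch: push two a's per a read, pop an a per b
    until the stack empties, push a b for each remaining b read, pop a
    b per c, and accept on an empty stack."""
    m = re.fullmatch(r'(a*)(b*)(c*)', w)   # must be a's, then b's, then c's
    if not m:
        return False
    stack = []
    for _ in m.group(1):                   # push two a's for each a
        stack += ['a', 'a']
    for _ in m.group(2):                   # pop a's, then push b's
        if stack and stack[-1] == 'a':
            stack.pop()
        else:
            stack.append('b')
    for _ in m.group(3):                   # pop a b for each c
        if not (stack and stack[-1] == 'b'):
            return False
        stack.pop()
    return not stack                       # accept with empty stack

assert in_L('aabbbbbc')       # i=2, j=5, k=1: 5 = 2*2 + 1
assert not in_L('abbb')       # j=3 but 2i + k = 2
```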
Many people tried to use the CFL Pumping Lemma to show that L is not
context-free. But there's no reason string v in the CFL PL could not be "a",
and string y be "bb", so that pumping would maintain the relationship
j = 2i + k.
Of course building a valid DFA, as in part (b), is enough. I think the easiest
proof is to define A to be the set of eight-bit strings with an even number of
1's, and note that the set of ASCII strings is A* and so is regular
because A, a finite set, must be regular.
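This characterization of the ASCII strings as A* gives a direct membership test: split into bytes and check the parity of each. A small sketch (the function name is my own):

```python
def is_ascii_string(w):
    """Check membership in A*, where A is the set of eight-bit strings
    with an even number of 1's: the length must be a multiple of 8 and
    every byte must have even parity."""
    if len(w) % 8 != 0 or set(w) - {'0', '1'}:
        return False
    return all(w[i:i+8].count('1') % 2 == 0 for i in range(0, len(w), 8))

assert is_ascii_string('')           # zero bytes
assert is_ascii_string('00000011')   # one even-parity byte
# The counterexample 0^7 1 1 0^7: even number of 1's overall and
# length 16, but its first byte 00000001 has odd parity.
assert not is_ascii_string('0000000' + '11' + '0000000')
```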
It is true that the set of strings whose length is divisible by 8 is regular,
and that the set of strings with an even number of 1's is regular, but the
set of ASCII strings is not the intersection of these two sets. It is contained
in this intersection, but 0⁷110⁷, for example, is in the
intersection but is not an ASCII string because it fails condition (2).
The minimal DFA has sixteen states, which we may call 0a, 1a, 2a,..., 7a, 0b,
1b,..., 7b. The start state is 0a, which is also the only final state. The
0-arrow from ia (for any number i) goes to (i+1)a, and the 1-arrow from ia goes
to (i+1)b -- in both cases the addition is done modulo 8. The 0-arrow from
ib goes to (i+1)b and the 1-arrow from ib goes to (i+1)a, except that both
arrows from 0b loop back to 0b itself.
A string takes the DFA to ia if and only if its length is congruent to i
mod 8, the last i characters have an even number of 1's, and the string obtained
by deleting the last i characters is an ASCII string. (Thus it takes the DFA
to 0a if and only if it is itself an ASCII string.) A string takes the DFA
to ib, for positive i, if and only if its length is congruent to i modulo 8,
the last i characters have an odd number of ones, and the string obtained by
deleting the last i characters is an ASCII string. A string takes the DFA
to 0b if and only if it contains a "bad byte", an eight-bit substring with an
odd number of 1's that prevents the first 8k characters of the string, for
some k, from being an ASCII string. These conditions are preserved by every
new letter read, showing that this 16-state DFA is correct.
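The sixteen-state DFA described above can be written as a small transition table; a sketch, with states as pairs (i, parity) standing in for ia and ib:

```python
def make_dfa():
    """Transition table for the sixteen-state DFA: state (i, 'a') plays
    the role of ia (even parity since the last byte boundary) and
    (i, 'b') the role of ib (odd parity); (0, 'b') is the dead state."""
    delta = {}
    for i in range(8):
        delta[((i, 'a'), '0')] = ((i + 1) % 8, 'a')   # 0 keeps parity even
        delta[((i, 'a'), '1')] = ((i + 1) % 8, 'b')   # 1 flips parity to odd
        delta[((i, 'b'), '0')] = ((i + 1) % 8, 'b')
        delta[((i, 'b'), '1')] = ((i + 1) % 8, 'a')
    # Both arrows from 0b (a completed byte had odd parity) loop back.
    delta[((0, 'b'), '0')] = (0, 'b')
    delta[((0, 'b'), '1')] = (0, 'b')
    return delta

def accepts(w):
    state, delta = (0, 'a'), make_dfa()   # 0a is the start and only final state
    for c in w:
        state = delta[(state, c)]
    return state == (0, 'a')

assert accepts('00000011')
assert not accepts('0000000110000000')   # contains a bad byte
```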
This DFA is minimal because the strings ε, 0, 00, ..., 0⁷,
1, 01, 001, ..., 0⁷1 are pairwise ASCII-distinguishable. First note
that each of these strings takes the DFA to a different state. The string
z = 0⁸⁻ⁱ takes 0ⁱ to an ASCII string, but none of the
other 15 strings. The string 0⁶⁻ⁱ1 takes 0ⁱ1 to an
ASCII string, but none of the other 15 strings. And no string at all takes
0⁷1 to an ASCII string. So for any two of these 16 strings, we can
find a z that distinguishes them.
L(P) is the set of strings with two b's and an even number of a's, where the b's come together and divide the a's into two equal blocks, that is, L(P) = {aⁿbbaⁿ : n ≥ 0}. This is because an accepting run of P must push a b, read and push some number of a's, read and push a b, read and pop a b, read and pop a number of a's equal to the number pushed, and pop a b.
From the English description, the grammar with one non-terminal S and two rules, S → aSa and S → bb, generates exactly L(P).
L(P) is not regular. Myhill-Nerode proof: the infinite set of strings {aⁿbb : n ≥ 0} is pairwise distinguishable for L(P) -- if i ≠ j, take z to be aⁱ, and z distinguishes aⁱbb from aʲbb. Pumping Lemma proof: let p be the pumping length and let w be the string aᵖbbaᵖ. Because |xy| ≤ p and |y| > 0, y must be a non-empty string of a's, and pumping y up or down yields a string with more or fewer a's on the left and thus not in L(P).
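A quick membership test for L(P) makes the pumping step concrete: adding a's on one side only always leaves the language. A sketch (the function name is my own):

```python
def in_LP(w):
    """Membership in L(P) = {a^n b b a^n : n >= 0}."""
    n = len(w) - 2
    if n < 0 or n % 2 != 0:
        return False
    n //= 2
    return w == 'a' * n + 'bb' + 'a' * n

# Pumping a non-empty block of a's on the left, as in the proof:
p = 5
w = 'a' * p + 'bb' + 'a' * p
assert in_LP(w)
assert not in_LP('a' + w)   # one extra a on the left leaves L(P)
```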
This normal form had three conditions: (1) the PDA has exactly one final state,
(2) it can accept only with an empty stack, and (3) every transition either
pushes or pops one character, but not both. P obviously satisfies (1) and (3).
For (2) note that the first transition must push a b onto the stack, and this
b can only be popped by the only transition into state f. So we can enter f
only by popping this b and thus leaving an empty stack.
Many of you mixed up this normal form for PDA's, which was defined only for
this particular proof, with other normal forms for grammars or NFA's.
The construction created a non-terminal A_xy for every pair of elements x and y in the state set {p, q, r, s, f}, with x = y possible. There are thus 25 non-terminals in the grammar, though as we will see most of them are useless.
Any of the non-terminals A_pp, A_qq, A_rr,
A_ss, or A_ff can derive the empty string ε, which
is a string of (zero) terminals. Each of the others, say A_xy,
can derive a string only
if P can read that string while going from state x and empty stack to state y
and empty stack. From p and empty stack, P cannot empty the stack until it is
in f, and from q and empty stack it cannot empty the stack until it is in s.
So A_pf and A_qs are the only other useful non-terminals.
Although you weren't asked for it, the grammar constructed (ignoring the
useless non-terminals) has rules A_pf → A_qs,
A_qs → a A_qs a, A_qs → b A_rr b,
and A_rr → ε. There are also all the rules of the form
A_xy → A_xz A_zy, but they are either trivial
or involve useless non-terminals in this case.
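Expanding the reduced rules confirms that the grammar generates exactly {aⁿbbaⁿ : n ≥ 0}; a sketch (the depth-bounded generator is my own device):

```python
def derive(depth):
    """Generate the strings A_qs derives, using the reduced rules
    A_qs -> a A_qs a | b A_rr b and A_rr -> ε, up to a recursion depth."""
    if depth == 0:
        return set()
    out = {'bb'}                                        # A_qs -> b A_rr b, A_rr -> ε
    out |= {'a' + s + 'a' for s in derive(depth - 1)}   # A_qs -> a A_qs a
    return out

# A_pf -> A_qs, so these are exactly the shortest strings of
# L(P) = {a^n b b a^n : n >= 0}.
assert derive(3) == {'bb', 'abba', 'aabbaa'}
```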
L(D) is the set of all strings containing at least one b and having at most two a's after the last b. Note that after the first b, every b takes D to state 2, whereupon states 2, 3, 4, and 5 represent 0, 1, 2, and more than 2 a's since the last b respectively.
From the English description, which I have justified above, the regular expression Σ*b(ε ∪ a ∪ aa) is correct. For the standard construction, add new start state 0 and final state 6. Killing 1 gives the transition (0, a*b, 2). Killing 5 gives (4, aa*b, 2), making (4, b ∪ aa*b, 2). Killing 4 gives (3, a, 6), which merges into (3, ε ∪ a, 6), and (3, a(b ∪ aa*b), 2), which merges into (3, b ∪ a(b ∪ aa*b), 2). Now killing 3 gives (2, a(ε ∪ a), 6), which merges into (2, ε ∪ a ∪ aa, 6), and (2, a(b ∪ a(b ∪ aa*b)), 2). Finally killing 2 gives (0, a*b(a(b ∪ a(b ∪ aa*b)))*(ε ∪ a ∪ aa), 6), from which we can read the final regular expression. This simplifies to a*b(a*b)*(ε ∪ a ∪ aa).
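The simplified expression can be checked against the English description by brute force over short strings; a sketch using Python's regex syntax (the helper names are my own):

```python
import re
from itertools import product

# a*b(a*b)*(ε ∪ a ∪ aa) in Python regex syntax; the empty alternative
# in (|a|aa) plays the role of ε.
pattern = re.compile(r'a*b(a*b)*(|a|aa)')

def described(w):
    """English description: at least one b, at most two a's after the last b."""
    return 'b' in w and len(w) - w.rindex('b') - 1 <= 2

# The two agree on every string over {a, b} up to length 9.
for n in range(10):
    for w in map(''.join, product('ab', repeat=n)):
        assert bool(pattern.fullmatch(w)) == described(w)
```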
No valid DFA for L(D) could have a death state, because any string followed by a b is in L(D), and thus any reachable state of any valid DFA for L(D) must have a b-arrow to a final state and could thus not be a death state.
If d1 and d2 were each death states, the minimization algorithm would never separate them, so they would be merged together at the end of the algorithm, making the DFA smaller and showing that the original DFA was not minimal. More directly, you can just merge d1 and d2 into a single non-final state d, keeping the language exactly the same, since no possible path to any final state has changed.
There is a death state if and only if there exists a string x such that for any string y, xy is not in the language. Clearly any string that takes the DFA to a death state has this property. And if such a string x exists, it is Myhill-Nerode equivalent to xz for every possible z, since xz has the same property. So in the minimal DFA for the language, the state reached by x is a death state.
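Operationally, a death state is one from which no final state is reachable, which can be found by a backward search from the final states; a sketch on a toy DFA of my own:

```python
def death_states(states, delta, finals):
    """States from which no final state is reachable -- equivalently,
    states reached by some x such that xy is in the language for no y.
    Found by walking the transition edges backwards from the finals."""
    reverse = {q: set() for q in states}
    for (q, _a), r in delta.items():
        reverse[r].add(q)
    alive, stack = set(finals), list(finals)
    while stack:                       # everything that can reach a final state
        q = stack.pop()
        for p in reverse[q]:
            if p not in alive:
                alive.add(p)
                stack.append(p)
    return states - alive

# Toy DFA over {a, b}: state 2 is a trap with no path back to final state 0.
delta = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 2, (1, 'b'): 0,
         (2, 'a'): 2, (2, 'b'): 2}
assert death_states({0, 1, 2}, delta, {0}) == {2}
```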
Last modified 5 March 2009