Question text is in black, solutions in blue.
Q1: 10 points Q2: 10 points Q3: 10 points Q4: 10 points Q5: 30 points Q6: 50 points Total: 120 points
TRUE. We can simulate this PDA with an NFA. There are only a finite number of possible stack configurations: if the size of the stack alphabet Γ is k, there are k^3 + k^2 + k + 1 of them. So our NFA can have a state for each pair consisting of a state of M and a stack configuration of M. Since L(M) is the language of an NFA, it is regular.
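As a quick sanity check on the count, here is a small Python sketch. It assumes, as the formula above does, that the stack never holds more than three symbols; the three-letter alphabet is a made-up example, not part of the problem.

```python
from itertools import product


def stack_configurations(gamma, max_height=3):
    """All stack contents of height 0..max_height over the alphabet gamma."""
    return [
        "".join(cfg)
        for h in range(max_height + 1)
        for cfg in product(gamma, repeat=h)
    ]


gamma = "XYZ"  # hypothetical stack alphabet, so k = 3
k = len(gamma)
configs = stack_configurations(gamma)
print(len(configs), k**3 + k**2 + k + 1)  # 40 40
```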
FALSE: Let X be the language {a^n b^n: n ≥ 0}, which we know to be a CFL that is not regular. The complement of X is a CFL, though we must justify this claim. X-bar is the union of three languages: (1) the complement of the regular language a*b*, which is a CFL because it is regular, (2) {a^i b^j: i > j}, which is a CFL -- we can design a grammar with rules S → aS, S → aSb, and S → a, for example, and (3) {a^i b^j: i < j}, which is also a CFL by a very similar argument to (2). Since X-bar is the union of three CFL's, it is a CFL, and X satisfies the conditions of the statement but is not regular.
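The grammar claimed for language (2) can be checked by brute force for short strings. The sketch below (the function name `generated` and the length bound are my own choices) closes the set {a} under the two recursive rules and compares the result with {a^i b^j: i > j}.

```python
def generated(n):
    """Strings of length <= n derivable from S with S -> aS | aSb | a."""
    lang = {"a"}
    while True:
        new = set(lang)
        for s in lang:
            for t in ("a" + s, "a" + s + "b"):
                if len(t) <= n:
                    new.add(t)
        if new == lang:  # nothing new: we have reached the fixed point
            return lang
        lang = new


n = 8
expected = {
    "a" * i + "b" * j
    for i in range(n + 1)
    for j in range(i)  # j < i
    if i + j <= n
}
print(generated(n) == expected)  # True
```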
TRUE. Apply the Subset Construction to N. The start state will be
{q0}, where q0 is the start state of N. Each additional
DFA state generated in the subset construction can have no more than
one element, since N has no multiple choices or ε-moves. We might or
might not generate the "death state" for the empty set, and every other DFA
state must be a singleton set ({s}, where s is a state of N), so there are only
k+1 possible states. If the construction gives us fewer than k+1 states, we
may add unreachable states to get exactly k+1.
Another, perhaps better way to put this: Define D to have a state for each
state of N, plus one more state called d, the "death state". D's transition
function follows the arrows of N wherever possible, and otherwise goes to d.
Of course d's moves are all to itself. It is obvious that L(D) = L(N), and D
has exactly k+1 states.
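The second construction is short enough to sketch in Python. The two-state machine N below is a hypothetical example (it accepts (01)*), and the function names are mine; the point is only that filling the missing arrows with moves to d gives a DFA with k+1 states.

```python
def add_death_state(states, alphabet, delta, death="d"):
    """Complete a partial deterministic transition table with a death state.

    delta maps (state, letter) to one next state and may omit some pairs.
    """
    assert death not in states, "the death-state name must be fresh"
    total = {}
    for q in list(states) + [death]:
        for a in alphabet:
            # Follow N's arrow where it exists, otherwise go to the death state.
            total[(q, a)] = delta.get((q, a), death)
    return list(states) + [death], total


def accepts(total, start, finals, w):
    q = start
    for a in w:
        q = total[(q, a)]
    return q in finals


# Hypothetical N with k = 2 states, no ε-moves and no multiple choices.
states = ["p", "r"]
delta = {("p", "0"): "r", ("r", "1"): "p"}
d_states, total = add_death_state(states, "01", delta)
print(len(d_states))                       # 3
print(accepts(total, "p", {"p"}, "0101"))  # True
print(accepts(total, "p", {"p"}, "011"))   # False
```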
TRUE. The argument I like best is to take a regular expression for Y (which
must exist because Y is regular) and replace all 0's in it with ε's
(technically, with ∅*'s). It's obvious to me that this new
regular expression generates Z.
If we start with a DFA or NFA for Y, we can create an NFA for Z by replacing any
0-moves in the Y-machine with ε-moves. I don't think it's obvious that
this gives a machine for Z, but it's easy to justify this claim. If a string
w is accepted by the alleged Z-machine, there is a path of 1-moves and
ε-moves in the Z-machine from the start state to the final state. The
corresponding path of 1-moves and 0-moves in the Y-machine accepts some string
u in Y, and clearly f(u) = w, so w is in Z. Conversely, if u is any string in
Y, we can take its accepting path in the Y-machine and map it directly to a path
in the new machine that accepts f(u), so f(u) is accepted.
Many people incorrectly claimed that Z must be regular because it must be
the language 1*. Z must be contained in 1*
because every letter in every string of Z is a 1, but the exact nature of Z
depends on Y -- there is no reason that every string in 1* is the
image under f of some string in Y. For example, let Y be (110)* --
then Z is (11)*, not 1*.
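The example Y = (110)* can be tried out directly. This sketch assumes that f is the map that erases every 0, which is how the solution above uses it. Deleting the 0's from the regular expression happens to work textually here because Python treats the resulting empty string as ε; that shortcut is not safe for every expression (0* would become a bare *).

```python
import re


def f(u):
    """Erase every 0 from a binary string (assumed definition of f)."""
    return u.replace("0", "")


y_regex = "(110)*"
z_regex = y_regex.replace("0", "")  # "(11)*"

samples = ["110" * n for n in range(6)]  # the first few strings of Y
images = [f(u) for u in samples]

print(z_regex)                                        # (11)*
print(all(re.fullmatch(z_regex, w) for w in images))  # True
print(bool(re.fullmatch(z_regex, "111")))             # False
```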
We carry out the standard state-elimination construction. We first form a
GNFA by adding a new start state 0 and a new final state 4 to states 1, 2, and
3, adding ε-moves from 0 to 1 and from 2 to 4.
We next eliminate state 3, which creates four new moves because 3 has two
moves coming into it and two going out of it. We add a bc-loop at 1 and an
ac-loop at 2. The move from 1 to 2 is now labeled b+bc, and the move from 2 to
1 is now labeled a+ac.
We next eliminate state 2, which creates two new moves because 2 has one
move in and two moves out. The new three-state GNFA has an ε-move
from 0 to 1, a loop at 1 labeled bc + (b+bc)(ac)*(a+ac), and a move
from 1 to 4 labeled (b+bc)(ac)*.
When we finally eliminate state 1, the single transition in the new GNFA
goes from 0 to 4 and is labeled
[bc + (b+bc)(ac)*(a+ac)]*(b+bc)(ac)*. There
are other possible equivalent regular expressions, of course, and eliminating
the states in a different order will yield one of them if done correctly.
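A brute-force comparison is a cheap way to catch slips in state elimination. The sketch below rebuilds the NFA from the moves described above (start state 1, final state 2, b-moves from 1 to 2 and 3, a-moves from 2 to 1 and 3, c-moves from 3 to 1 and 2); that reconstruction is my assumption, since the diagram is not reproduced here. It then checks the final expression against the NFA on every string up to a chosen length.

```python
import re
from itertools import product

# NFA as reconstructed from the text (an assumption, not the original diagram).
DELTA = {(1, "b"): {2, 3}, (2, "a"): {1, 3}, (3, "c"): {1, 2}}
START, FINAL = 1, 2


def nfa_accepts(w):
    current = {START}
    for ch in w:
        current = {r for q in current for r in DELTA.get((q, ch), ())}
    return FINAL in current


# The final expression, with + written as | for Python's re module.
REGEX = re.compile(r"(bc|(b|bc)(ac)*(a|ac))*(b|bc)(ac)*")


def agree_up_to(n):
    """True if the NFA and the expression agree on all strings of length <= n."""
    for length in range(n + 1):
        for letters in product("abc", repeat=length):
            w = "".join(letters)
            if nfa_accepts(w) != bool(REGEX.fullmatch(w)):
                return False
    return True


print(agree_up_to(7))  # True
```

Agreement on short strings is evidence, not a proof, but a wrong label in any elimination step usually shows up within length 4 or 5.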
The start state of the DFA is {1}, which I will call "1". On a or c from 1 we
move to the state for the empty set (the nonfinal
"death state") which I will call "0".
On a b from 1 we go to {2,3}, which I will call "23". Of course 0 has moves on
a, b, and c to itself.
State 23 (a final state) has an a-move to 13 (a nonfinal state), a b-move
to 0, and a c-move to 12 (a final state). State 13 has an a-move to 0, a
b-move to 23, and a c-move to 12. State 12 has an a-move to 13, a b-move to
23, and a c-move to 0.
We thus close the process and have a complete DFA with five states, the
nonfinal start state 1, final states 12 and 23, and nonfinal states 0 and 13.
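The same subset construction can be run mechanically. This sketch uses the NFA as I read it from the moves described above (an assumption, since the diagram is not shown), and names each DFA state by listing its elements, with "0" for the empty set, to match the names used in the text.

```python
from collections import deque

# NFA reconstructed from the text: start state 1, final state 2.
DELTA = {(1, "b"): {2, 3}, (2, "a"): {1, 3}, (3, "c"): {1, 2}}


def name(subset):
    """'23' for {2, 3}, and '0' for the empty set."""
    return "".join(str(q) for q in sorted(subset)) or "0"


def subset_construction(start, finals, alphabet):
    start_set = frozenset([start])
    seen, queue, table = {start_set}, deque([start_set]), {}
    while queue:
        current = queue.popleft()
        for ch in alphabet:
            target = frozenset(r for q in current for r in DELTA.get((q, ch), ()))
            table[(name(current), ch)] = name(target)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_states = sorted(name(s) for s in seen)
    dfa_finals = sorted(name(s) for s in seen if s & finals)
    return dfa_states, dfa_finals, table


dfa_states, dfa_finals, table = subset_construction(1, {2}, "abc")
print(dfa_states)  # ['0', '1', '12', '13', '23']
print(dfa_finals)  # ['12', '23']
```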
It turns out that this DFA is minimal for its language. We can prove this by
running the minimization algorithm as follows.
We first look at a partition with classes F = {12, 23} and N = {0, 1, 13}.
State 12 goes to N on a, F on b, and N on c, while state 23 goes to N on a,
N on b, and F on c. So we must separate the two states of F at the next stage.
Turning to the three states of N, we see that 0 goes to N on all three letters,
1 goes to N on a, F on b, and N on c, and 13 goes to N on a, F on b, and F on c.
The three states have three distinct behaviors, so they must be put in three
different classes at the next stage. Since the next stage has each state in
its own class, it is the last stage and we have shown the DFA to be minimal.
We could make this argument more succinctly by noting that the input b
separates 12 from 23, the input b separates 0 from both 1 and 13, and the input
c separates 1 from 13. So no two final states can be merged, and no two
nonfinal states can be merged, and therefore the DFA is minimal.
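The stage-by-stage refinement above can be sketched in a few lines of Python. The transition table is the five-state DFA exactly as described in the text; the function name and the way classes are labeled are my own choices.

```python
ALPHABET = "abc"
DFA = {
    "1":  {"a": "0",  "b": "23", "c": "0"},
    "0":  {"a": "0",  "b": "0",  "c": "0"},
    "23": {"a": "13", "b": "0",  "c": "12"},
    "13": {"a": "0",  "b": "23", "c": "12"},
    "12": {"a": "13", "b": "23", "c": "0"},
}
FINALS = {"12", "23"}


def minimize_classes(dfa, finals, alphabet):
    """Refine the final/nonfinal partition until it stops changing."""
    block = {q: q in finals for q in dfa}  # stage 0: F versus N
    while True:
        # A state's signature: its own class plus the class reached on each letter.
        sig = {
            q: (block[q],) + tuple(block[dfa[q][ch]] for ch in alphabet)
            for q in dfa
        }
        if len(set(sig.values())) == len(set(block.values())):
            break  # no class was split, so the partition is stable
        block = sig
    classes = {}
    for q, label in block.items():
        classes.setdefault(label, set()).add(q)
    return list(classes.values())


classes = minimize_classes(DFA, FINALS, ALPHABET)
print(len(classes))  # 5
```

Five classes for five states means every state is alone in its class, which is the minimality claim.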
We can derive abab by the sequence of moves S → SS → aSbS →
abS → abaSb → abab. By very similar derivations we can make abcd,
cdab, or cdcd.
We can derive aabb by the sequence of moves S → aSb → aaSbb
→ aabb. By very similar derivations we can make acdb, cabd, and ccdd.
The Context-Free Pumping Lemma says that if X is any context-free language,
there exists a positive integer p such that for any string w in X with |w|
≥ p, w can be written as the concatenation of five strings u, v, x, y, z
such that |vxy| ≤ p, |vy| > 0, and for all non-negative integers i, the
string uv^i x y^i z is in X.
If w is a^p b^p c^p d^p, we can choose
u = a^{p-1}, v = a, x = ε, y = b, and z =
b^{p-1} c^p d^p. Here |vxy| = 2 and |vy| = 2, so the length
conditions hold as long as p ≥ 2. Then for any i,
uv^i x y^i z is
a^{p-1+i} b^{p-1+i} c^p d^p. This is in L(G)
because we can change S to SS, derive the a's and b's from the first S, and
derive the c's and d's from the second S.
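The pumped strings can be checked mechanically. This sketch takes G to be the grammar S → ε | SS | aSb | cSd, as in the parser transitions later on this page, and tests membership by repeatedly cancelling adjacent "ab" and "cd" pairs, which is my own shortcut rather than anything from the exam.

```python
def in_LG(w):
    """Membership in L(G), assuming G is S -> ε | SS | aSb | cSd."""
    while "ab" in w or "cd" in w:
        w = w.replace("ab", "").replace("cd", "")
    return w == ""


def pumped(p, i):
    """The string u v^i x y^i z for the decomposition chosen above."""
    u, v, x, y = "a" * (p - 1), "a", "", "b"
    z = "b" * (p - 1) + "c" * p + "d" * p
    return u + v * i + x + y * i + z


print(pumped(2, 0))  # abccdd
print(all(in_LG(pumped(p, i)) for p in range(2, 7) for i in range(6)))  # True
```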
Pumping Lemma proof: If L(G) were regular, the conclusion of the Regular
Language Pumping Lemma would hold for some constant p.
Let w be the string a^p b^p. The Lemma tells us that w
can be written as xyz, with |xy| ≤ p, so we know that y consists only of
one or more a's. We are told that xy^i z is in L(G), but taking
i=0 gives us a string xz which is in a*b* but not in L(G)
because it has fewer a's than b's. So the conclusion of the Lemma fails for
any valid choice of x, y, and z, and thus the supposition that L(G) was
regular must be false. (Typo corrected 3 March 2009.)
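For a fixed p the argument can be checked exhaustively: every legal split of a^p b^p fails at i = 0. The sketch assumes G is S → ε | SS | aSb | cSd; the value p = 5 and the helper names are arbitrary choices of mine.

```python
def in_LG(w):
    """Membership in L(G), assuming G is S -> ε | SS | aSb | cSd."""
    while "ab" in w or "cd" in w:
        w = w.replace("ab", "").replace("cd", "")
    return w == ""


def splits(p):
    """All (x, y, z) with xyz = a^p b^p, |xy| <= p and |y| >= 1."""
    w = "a" * p + "b" * p
    for xy_len in range(1, p + 1):
        for x_len in range(xy_len):
            yield w[:x_len], w[x_len:xy_len], w[xy_len:]


p = 5
print(all(not in_LG(x + z) for x, y, z in splits(p)))  # True
```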
Myhill-Nerode Proof: We claim that the infinite set of strings
{a^i: i ≥ 0} is a pairwise L(G)-distinguishable set. Let
x = a^i and y = a^j, with i ≠ j,
be any two distinct members
of this set. These two strings are L(G)-distinguishable because if we take
z to be the string b^i, we find that xz = a^i b^i is in L(G) and yz = a^j b^i is not.
Since there are infinitely many distinct L(G)-equivalence classes, L(G) cannot
be regular.
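The distinguishing suffix can be tested for small i and j. As before, the sketch assumes G is S → ε | SS | aSb | cSd, and the bound 8 is arbitrary.

```python
def in_LG(w):
    """Membership in L(G), assuming G is S -> ε | SS | aSb | cSd."""
    while "ab" in w or "cd" in w:
        w = w.replace("ab", "").replace("cd", "")
    return w == ""


def distinguishes(i, j):
    """True if z = b^i puts a^i z in L(G) and a^j z outside it."""
    z = "b" * i
    return in_LG("a" * i + z) and not in_LG("a" * j + z)


print(all(distinguishes(i, j) for i in range(8) for j in range(8) if i != j))  # True
```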
Proof using closure properties, which I ruled out because it was too easy:
If L(G) were regular, its intersection with any regular language would be
regular. But its intersection with a*b* is the language
{a^n b^n: n ≥ 0}, because we can see that any string in
L(G) has an equal number of a's and b's, and we know that this language is not
regular.
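The intersection claim is easy to confirm for short strings by listing every string of a*b* up to some length and keeping those in L(G). The sketch assumes G is S → ε | SS | aSb | cSd; the bound n = 10 is arbitrary.

```python
def in_LG(w):
    """Membership in L(G), assuming G is S -> ε | SS | aSb | cSd."""
    while "ab" in w or "cd" in w:
        w = w.replace("ab", "").replace("cd", "")
    return w == ""


n = 10
# Every string of a*b* has the form a^i b^j.
in_both = {
    "a" * i + "b" * j
    for i in range(n + 1)
    for j in range(n + 1 - i)
    if in_LG("a" * i + "b" * j)
}
expected = {"a" * k + "b" * k for k in range(n // 2 + 1)}
print(in_both == expected)  # True
```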
The easiest solution is to construct the top-down parser for G, because you
then do not need to prove correctness. This PDA has states s, q, and f, start
state s, only final state f, and transitions (s,ε,ε;q,S$),
(q,ε,S;q,ε),
(q,ε,S;q,SS),
(q,ε,S;q,aSb),
(q,ε,S;q,cSd),
(q,a,a;q,ε),
(q,b,b;q,ε),
(q,c,c;q,ε),
(q,d,d;q,ε), and (q,ε,$;f,ε).
A simpler PDA also has L(G) as its language, although it takes an argument
to show that this is so. The second PDA M
also has states s, q, and f with start
state s and only final state f. It may push a $ going from s to q and pop the
$ going from q to f, and all its other transitions are from q to q. They are
(q,a,ε;q,a),
(q,b,a;q,ε),
(q,c,ε;q,c),
and (q,d,c;q,ε). How do we show that L(M) = L(G)? First we show
by induction on all strings derivable from S in G that they can be read
during a run of M that starts in state q with empty stack and finishes in state
q with empty stack. These strings, which constitute exactly L(G), are thus all
in L(M) because we can push the $, carry out this run, and pop the $. For the
other direction, we observe that any accepting run of M must contain such a
run from state q and empty stack to state q and empty stack when we ignore its
first and last moves. And we can show that the string read during any such
run is derivable from S in G, by induction as in the proof that our PDA
constructed from an arbitrary CFG is correct. Any such run either empties its
stack in the middle, in which case it is the concatenation of two shorter such
runs, or it pushes a stack character on its first move and pops the same
character on its last move. In this latter case we can derive its string using
the rule S → aSb or S → cSd, together with the derivation of the
string read between the first and last moves.
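The claim L(M) = L(G) can also be tested on short strings. In the sketch below, `run_M` simulates the second PDA (it is deterministic, so one run per input suffices) and `grammar_language` builds the strings of L(G) up to length n by closing {ε} under the rules S → SS, S → aSb, and S → cSd. Both function names and the bound 6 are my own choices.

```python
from itertools import product


def run_M(w):
    """Simulate the second PDA M on w; True if it can end in state f."""
    stack = ["$"]  # the move from s to q pushes $
    for ch in w:
        if ch in "ac":
            stack.append(ch)                  # (q,a,ε;q,a) and (q,c,ε;q,c)
        elif ch == "b" and stack[-1] == "a":
            stack.pop()                       # (q,b,a;q,ε)
        elif ch == "d" and stack[-1] == "c":
            stack.pop()                       # (q,d,c;q,ε)
        else:
            return False                      # no move available
    return stack == ["$"]                     # the move from q to f pops $


def grammar_language(n):
    """Strings of length <= n derivable from S with S -> ε | SS | aSb | cSd."""
    lang = {""}
    while True:
        new = set(lang)
        for s in lang:
            for t in ("a" + s + "b", "c" + s + "d"):
                if len(t) <= n:
                    new.add(t)
            for t in lang:
                if len(s + t) <= n:
                    new.add(s + t)
        if new == lang:
            return lang
        lang = new


n = 6
accepted = {
    "".join(letters)
    for length in range(n + 1)
    for letters in product("abcd", repeat=length)
    if run_M("".join(letters))
}
print(accepted == grammar_language(n))  # True
```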
Last modified 3 March 2009