CMPSCI 250 Discussion #7: Designing Regular Expressions

David Mix Barrington

3/5 April 2006

The goal was to produce a regular expression for the language EE of strings over {a,b} with both an even number of a's and an even number of b's. It was suggested that you start with the language EEP, which is the set of nonempty strings in EE that cannot be broken up into two smaller nonempty strings in EE. Then you can argue that EE = (EEP)*.

Solution: The only strings of length two in EE are aa and bb, and since these cannot be broken down further they are both in EEP.

Now we look at strings of length four in EE (there are no strings in EE of odd length). The ones that start with aa or bb cannot be in EEP because the initial aa or bb could be broken off and the remainder would also be in EE. So we must start with ab or ba. Then for the whole string to be in EE, we must end in ab or ba as well, giving us the four strings abab, abba, baab, and baba.
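The short cases can be checked by brute force. The following is a minimal sketch, not part of the original solution. The helper names in_ee, in_eep, and eep_of_length are made up for illustration, and EEP membership is tested directly from its definition.

```python
from itertools import product

def in_ee(w):
    """True if w has an even number of a's and an even number of b's."""
    return w.count("a") % 2 == 0 and w.count("b") % 2 == 0

def in_eep(w):
    """True if w is a nonempty string in EE that cannot be split
    into two nonempty strings that are both in EE."""
    if not w or not in_ee(w):
        return False
    return not any(in_ee(w[:i]) and in_ee(w[i:]) for i in range(1, len(w)))

def eep_of_length(n):
    """All strings of length n over {a, b} that are in EEP, in dictionary order."""
    words = ("".join(p) for p in product("ab", repeat=n))
    return [w for w in words if in_eep(w)]

print(eep_of_length(2))  # ['aa', 'bb']
print(eep_of_length(3))  # [] (no odd-length string is in EE)
print(eep_of_length(4))  # ['abab', 'abba', 'baab', 'baba']
```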

With length six, we must start with ab or ba for the reason above, and we must end in ab or ba as well, because if we ended with aa or bb we could break off the last two letters and get two nonempty strings in EE. In the middle of the string, we can only have aa or bb: after the opening ab or ba, the next ab or ba makes both counts even again, so the string up to that point is in EE, and if any letters followed it the whole string would break into two pieces. This reasoning works for any larger even length: we begin with ab or ba, have aa's and bb's in the middle, then end in ab or ba.

A member of EEP, then, must be either (1) aa or bb, or (2) ab or ba, followed by zero or more strings that are each aa or bb, followed by ab or ba. As a regular expression, this language is:

aa + bb + (ab + ba)(aa + bb)*(ab + ba)
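As a sanity check, this expression can be compared with the definition of EEP on all short strings. The sketch below assumes the usual translation into Python's re syntax, where + becomes | and groups are non-capturing. The names EEP_PATTERN and regex_matches_eep are hypothetical, and a check up to a fixed length is evidence, not a proof.

```python
import re
from itertools import product

# The expression aa + bb + (ab + ba)(aa + bb)*(ab + ba) in Python syntax.
EEP_PATTERN = re.compile(r"aa|bb|(?:ab|ba)(?:aa|bb)*(?:ab|ba)")

def in_ee(w):
    """True if w has an even number of a's and an even number of b's."""
    return w.count("a") % 2 == 0 and w.count("b") % 2 == 0

def in_eep(w):
    """EEP by definition: nonempty, in EE, with no split into two nonempty EE strings."""
    if not w or not in_ee(w):
        return False
    return not any(in_ee(w[:i]) and in_ee(w[i:]) for i in range(1, len(w)))

def regex_matches_eep(max_len):
    """True if the pattern and the definition agree on every string of length <= max_len."""
    for n in range(max_len + 1):
        for p in product("ab", repeat=n):
            w = "".join(p)
            if (EEP_PATTERN.fullmatch(w) is not None) != in_eep(w):
                return False
    return True

print(regex_matches_eep(10))  # True
```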

Now we have to prove that EE = (EEP)*. This means that EE is:

(aa + bb + (ab + ba)(aa + bb)*(ab + ba))*
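Before proving the claim, we can test it on short strings by comparing the starred expression with the parity definition of EE. This is only a sketch. The names EE_PATTERN and regex_matches_ee are illustrative, and agreement up to a fixed length does not replace the proof.

```python
import re
from itertools import product

# (aa + bb + (ab + ba)(aa + bb)*(ab + ba))* in Python syntax.
EE_PATTERN = re.compile(r"(?:aa|bb|(?:ab|ba)(?:aa|bb)*(?:ab|ba))*")

def in_ee(w):
    """EE by definition: an even number of a's and an even number of b's."""
    return w.count("a") % 2 == 0 and w.count("b") % 2 == 0

def regex_matches_ee(max_len):
    """True if the pattern and the definition agree on every string of length <= max_len."""
    for n in range(max_len + 1):
        for p in product("ab", repeat=n):
            w = "".join(p)
            if (EE_PATTERN.fullmatch(w) is not None) != in_ee(w):
                return False
    return True

print(regex_matches_ee(10))  # True
```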

We begin by showing that (EEP)* is a subset of EE. As in Section 5.4, we can do this by induction on all strings in the star language. If w is in (EEP)*, then either w is λ or w = uv where u is a shorter string in (EEP)* and v is in EEP. We want to prove that all such strings are in EE. If w is λ then w is in EE because λ has both an even number of a's and an even number of b's (0 is an even number). For the inductive case, let w = uv and assume by the IH that u is in EE. Since v is in EEP, it is also in EE. And EE is closed under concatenation: since u and v each have both an even number of a's and an even number of b's, so does w, because its numbers of a's and b's are the sums of those in u and v.
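The inductive step can be illustrated by building strings the same way the induction does, starting from λ and appending one EEP string at a time. This is a small sketch under stated assumptions: EEP_SAMPLE lists only the EEP strings of length at most four, and star_strings is a hypothetical helper.

```python
# The EEP strings of length at most 4 (a finite sample of EEP).
EEP_SAMPLE = ["aa", "bb", "abab", "abba", "baab", "baba"]

def in_ee(w):
    """True if w has an even number of a's and an even number of b's."""
    return w.count("a") % 2 == 0 and w.count("b") % 2 == 0

def star_strings(pieces, max_pieces):
    """All concatenations of at most max_pieces strings from pieces.
    Zero pieces gives the empty string, which plays the role of λ."""
    result = {""}
    frontier = {""}
    for _ in range(max_pieces):
        # Inductive step: w = uv with u already built and v one more piece.
        frontier = {u + v for u in frontier for v in pieces}
        result |= frontier
    return result

print(all(in_ee(w) for w in star_strings(EEP_SAMPLE, 4)))  # True
```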

Now we have to prove that any string in EE is also in (EEP)*. I will do this by strong induction on the length of strings, so that my predicate P(n) is "if w has length n or less, and w is in EE, then w is in (EEP)*". We prove P(0) by noting that λ is a member of any star language. Then, for even n, we assume P(n) and try to prove P(n+2). Every string in EE has even length, so there are no strings of length n+1 to consider, and strings of length n or less are covered by P(n). So let w be an arbitrary string of length n+2 that is in EE.

Case 1: w cannot be broken up as uv with both u and v nonempty strings in EE. Then w is in EEP by definition, so it is also in (EEP)*.
Case 2: w can be broken up into u and v this way. Since nonempty strings in EE have length at least 2 and the lengths of u and v add up to n+2, both u and v have length in the range from 2 through n. Because both u and v are in EE and P(n) is assumed to be true, we know that both u and v are in (EEP)*. And the concatenation of two strings in any star language is also in that star language (this is proved in Section 5.4 of the book), so w = uv is in (EEP)*.
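The two cases translate directly into a recursive procedure that breaks a string in EE into EEP pieces. The following is one possible sketch. The function name factor is made up for illustration, and the recursion mirrors the strong induction rather than aiming for efficiency.

```python
def in_ee(w):
    """True if w has an even number of a's and an even number of b's."""
    return w.count("a") % 2 == 0 and w.count("b") % 2 == 0

def factor(w):
    """Return a list of EEP strings whose concatenation is w, for w in EE."""
    if not in_ee(w):
        raise ValueError("string is not in EE")
    if w == "":
        return []  # base case: λ is the concatenation of zero EEP strings
    for i in range(1, len(w)):
        u, v = w[:i], w[i:]
        if in_ee(u) and in_ee(v):
            # Case 2: both pieces are shorter, so recurse on each of them.
            return factor(u) + factor(v)
    # Case 1: no such split exists, so w itself is in EEP.
    return [w]

print(factor("abbaaabb"))  # ['abba', 'aa', 'bb']
```

Joining the returned pieces always gives back w, which is the statement that w is in (EEP)*.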

Last modified 11 April 2006