Discrete Mathematics: A Foundation for Computer Science

by David Mix Barrington

Preface for the Instructor

Overview and Motivation

Most computer science undergraduate programs have a required course in ``Discrete Mathematics''. Such a course generally has two, sometimes competing, goals: to develop the mathematical maturity required to understand and construct the formal arguments in more advanced computer science, and to cover the specific material needed in later courses (particularly theory courses), usually some combination of logic, set theory, number theory, combinatorics, probability, and graph theory.

Most instructors of my acquaintance consider this to be a ``problem course'' in the curriculum. The students tend not to like it because they would rather be learning programming than mathematical thinking --- it is unmotivated with respect to the rest of computer science. This is often exacerbated by the cultural gulf between an instructor who is a mathematician and students who want to be computer scientists, especially if it is taught as a service course by a mathematics department. A further problem is that the usual organization of the course is disjointed --- with so many different topics (``If it's Tuesday, this must be graph theory...'') it has no single theme.

There is an obvious way to solve both of these problems: choose an unarguably computer-science topic that involves so many of the standard discrete mathematics subject areas that a discrete mathematics course can be built around it. If this works, the computer science students will be more involved and the course will have a unifying theme. This is not a new idea, and other books have tried it with various computer science themes, but this book represents a substantially different approach that may be particularly suited to computer science programs.

This book offers three different ways to tell a complete story about computation that begins with the fundamentals of discrete mathematics:

Any of these course structures leads to a course that deals with computation constantly, but develops mathematical sophistication by leading the students all the way from the basic notion of proof to some significant real mathematics.

Highlights of the Approach

Prerequisites

This book is an outgrowth of the author's development of two particular courses, Computer Science 240 and 250, at the University of Massachusetts Amherst. CMPSCI 250, Introduction to Computation, has corequisites of second-semester calculus and data structures (second-semester programming) and CMPSCI 240, Reasoning About Uncertainty, has both these courses as prerequisites. The math prerequisite of the former, at least, is not justified by any calculus content in most chapters of the book beyond a few throwaway references, but serves to ensure some minimum level of skill in calculation. (Chapters 7-11 make more direct reference to calculus.) The programming prerequisite is more important, because much of the material refers specifically to constructs in a modern imperative and object-oriented programming language, including recursion. The programming examples and exercises refer specifically to Java, but they primarily deal with aspects of Java that are not terribly different in C or C++.

The primary target audience of this book is sophomore and junior computer science majors, though it should be usable in a wider range of settings. It has been used at the University of Massachusetts for a 14-week, 4-credit one-semester course, Computer Science 250. This course in Spring 2006, for example, used 53 of the 120 lecture sections in the book, including a core of 1, 2, 3.1-3.6, 4, 5.1-5.5, 6.1-6.7, and 14, with various additional sections. Computer Science 240 in Spring 2008 used nearly all of Chapters 6, 10, 11, and 12.

This book is intended to be a primary text for a course. Two supplementary texts are recommended: How to Read and Do Proofs by Solow for those needing additional familiarization with the concept of proof, and Gödel, Escher, Bach: An Eternal Golden Braid by Hofstadter, both for its philosophical examinations of computer science as a whole (and artificial intelligence in particular) and for its more detailed treatment of first-order number theory. Footnotes in this text refer to Hofstadter and Solow where appropriate, but the presentation here is intended to be self-contained.

Guide for Selection of Sections

Each of the fifteen chapters has eight ordinary sections and three Excursions. An Excursion is a suggested tangential or supplementary topic, related to the main lecture material in the sections. At UMass, the class using this book meets for four 50-minute classes per week: three regular lectures and one ``discussion section'' with the instructor. In the latter the students carry out the writing assignment from an Excursion, with the supervision and intervention of the instructor as necessary. Most Excursion writing assignments would also be suitable for in-class or take-home group projects.

A 50-minute lecture will usually cover one but sometimes two ordinary sections, depending on the level of sophistication of the students and their familiarity with the material. Usually there is more mathematical material in the book than could be covered in that time. Some sections may demand more than one lecture, depending on the instructor's taste, and some can be omitted. In general the instructor can decide to do a few, most, or all of the sections in each chapter, following the specific guidelines below.

Chapter 1 begins the treatment of the language of mathematics with the two most fundamental concepts, sets and propositions, and the relationship between them. Proofs in the propositional calculus are treated in some detail with many examples. The chapter concludes with the introduction of predicates and practice in translating between symbolic language and English. All this material is necessary, but more sophisticated students should be able to pass through it more quickly. Of the three Excursions, 1.3 is a more philosophical introduction to the course while 1.9 and 1.11 are direct practice in important skills.

Chapter 2 introduces the predicate calculus and applies it to properties of relations. Students learn what quantifiers are, what they mean, how to prove quantified statements, and how they are used to define several important concepts. All of this chapter is necessary, though Section 2.5 can be postponed if the regular language material of Chapter 5 is to be done later. The last two sections each contain a substantial proof. Section 2.10 proves that Hasse diagrams provide a general notation for finite partial orders, and 2.11 relates equivalence relations to partitions. The first proof is not directly used later, but gives a good early example of some real mathematics. The second proof should definitely be included. Of the three Excursions, 2.2 is a Java-based tangent while 2.4 and 2.7 provide important practice.

Chapter 3 deals with number theory without using induction. Many instructors with mathematical training tend to overestimate students' familiarity with ideas like prime numbers. Though the material is not heavily used later, this chapter can go a long way toward introducing students to mathematical culture. Sections 3.1-3.6 show why unique factorization into primes and the Chinese Remainder Theorem are true, from first principles. Sections 3.7-3.11 are more sophisticated (except for the basic cryptography in 3.11) but can be very well motivated for students wanting some understanding of, for example, public-key cryptography. The Excursions are rather diverse: 3.2 offers tricks for hand calculation of divisibility, 3.7 explores a ``hard problem'' from Hofstadter, and 3.10 applies the algebraic machinery developed for RSA to look at a question in computational complexity.

Chapter 4, on induction, is in many ways the core of the book. It shows how induction is part of the definition of the system of natural numbers, and how it works in any system with a similar definition. The relationships among inductive definition, inductive proof, and recursive algorithms are stressed throughout. The Excursions are fairly central to the argument. The last sections, 4.9-4.11, provide a very brief introduction to graph theory. They could substitute for Chapter 8 if that is to be entirely omitted, or be skipped if graph theory is to be covered elsewhere in the curriculum.

Chapter 5 applies induction and recursive definition to regular languages (5.1-5.5) and various other structures. The first five sections are necessary for Chapter 9, but the last six are each optional. 5.10 and 5.11 introduce the application of discrete mathematical reasoning to actual computer programs --- for example, 5.10 proves that any program with IFs and GOTOs can be simulated by a program of WHILEs and IF-THEN-ELSEs.

Chapter 6 covers the basic principles of combinatorics at a rather relaxed pace, so that this material can be taken slowly (one section per lecture) or quickly (two sections per lecture). One section is devoted to each of the four basic counting problems, counting sequences with or without order and with or without replacement. Excursion 6.5 revisits the sorting problem, familiar from a data structures course, and relates it to counting permutations. Excursion 6.8 gives some practical hints on generating lists of permutations and combinations with a computer. Finally, 6.9-6.11 explore the idea of combinatorial proof that is central to further work in combinatorics.
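
For reference, the answers to the four basic counting problems, in the usual notation with $n$ available items and selections of length $k$, are:
\[
\begin{array}{ll}
\mbox{ordered, with replacement:} & n^k \\
\mbox{ordered, without replacement:} & n(n-1)\cdots(n-k+1) = n!/(n-k)! \\
\mbox{unordered, without replacement:} & \binom{n}{k} = n!/(k!\,(n-k)!) \\
\mbox{unordered, with replacement:} & \binom{n+k-1}{k}
\end{array}
\]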

Chapter 7 explores three further topics in combinatorics: the analysis of recurrences (7.1-7.4), asymptotic notation (7.5-7.7), and infinite sets (7.8-7.11). The first two are important preparation for a future course in the analysis of algorithms, though different curricula will place different demands on the discrete mathematics course in these areas. The last topic is of little direct importance but introduces the diagonal argument that recurs in computability theory (for example, in our Chapter 15). Each of these three topics has an Excursion.

Chapter 8, on graph theory, can be used flexibly. Sections 8.1-8.3 cover graphs, their representation by adjacency matrices, and the theorem relating paths in a graph to matrix multiplication. These sections are very useful, though not absolutely required, for Chapter 14. Sections 8.4 and 8.5 look at further implications of the matrix/path theorem. Finally, Sections 8.6-8.11 cover the classical topics of graph theory (such as coloring and planarity), and an instructor may pick and choose freely among them.
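
In its standard formulation, the matrix/path theorem says that if $A$ is the adjacency matrix of a graph, then for every $k \geq 1$ the $(i,j)$ entry of $A^k$ counts the paths (walks) of length $k$ from vertex $i$ to vertex $j$:
\[
(A^k)_{ij} = \mbox{the number of paths of length } k \mbox{ from } i \mbox{ to } j.
\]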

Chapter 9 deals with trees primarily as a model for search algorithms, after a review of the definitions and induction techniques (9.1-9.3). Breadth-first and depth-first search are treated in two ways: as general search algorithms, as in AI (9.5), and in the special case where visited states can be marked (9.6), as in an algorithms course. The chapter concludes with two search algorithms common in AI (9.8, 9.9) and a brief look at adversary search (9.10, 9.11).

Chapter 10 is an introduction to discrete probability and lays the groundwork for Chapters 11, 12, and 13. A minimal treatment of the area could be obtained with just the sections on definitions and expected value (10.1-10.4), concluding with an Excursion computing the winning probability in Craps, a dice game. The middle of the chapter (10.5-10.7) covers variance and standard deviation, explaining how results from continuous probability may be used to solve discrete problems. Finally, the remaining sections (10.8-10.11) present some more advanced methods to bound the probability of particular events based on analysis of a discrete distribution.

Chapter 11 is a more advanced treatment of reasoning techniques seen in AI. It begins with a thorough treatment of Bayes' Theorem (11.1-11.4) and its use to update estimates of probability based on new evidence. It then considers the Naive Bayes Classifier, a simple learning algorithm used, for example, in spam filters (11.5-11.8). Finally, simulation is presented as an alternative method of estimating probabilities (11.9-11.11).
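
For reference, the update rule at the heart of Sections 11.1-11.4 is Bayes' Theorem in its standard form: for a hypothesis $H$ and observed evidence $E$ with $P(E) > 0$,
\[
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)},
\]
so a prior estimate $P(H)$ is revised to the posterior $P(H \mid E)$ once $E$ is observed.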

Chapter 12 deals with Markov chains, defining them and arguing that they generally reach a steady-state distribution (12.1-12.4). Markov decision processes are then considered as an example of how to act in a situation where the results depend both on the decisions and on random events (12.5-12.8). The chapter concludes with the classical theory of two-player simultaneous-move games, where optimal strategies are usually probabilistic (12.9-12.11).
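
In the usual row-vector convention, a steady-state distribution of a chain with transition matrix $P$ is a probability vector $\pi$ that is unchanged by one more step of the chain:
\[
\pi P = \pi, \qquad \sum_i \pi_i = 1, \qquad \pi_i \geq 0.
\]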

Chapter 13 considers information theory, a fundamental topic for computer science that is not usually covered in lower-level courses. The initial treatment in Sections 13.1-13.5 gives useful background for coding and data compression. The remaining sections are more advanced, introducing the concept of entropy and exploring the limits of representing information streams by coding.

Chapter 14 is a thorough treatment of finite-state machines, proving the Myhill-Nerode Theorem (14.2-14.3) and Kleene's Theorem (the equivalence of FSMs and regular languages, 14.5-14.10). I would include the entire chapter, except possibly for the somewhat tangential Excursions 14.4 and 14.11, in any course that culminates with this topic. Excursion 14.9 works through an example of most of the constructions in Kleene's Theorem and is fairly important.

Chapter 15 is a brief look at topics in computability and complexity theory beyond finite-state machines. Since a course based on this book might be the last course in the theory of computation for many students, an instructor might want to say a little about one or more of grammars and context-free languages (15.2-15.5), general computability (15.6-15.10), or complexity (15.11). This chapter is no substitute for a follow-on course, naturally, but it might serve to whet students' interest. Section 15.1, on two-way automata, presents a self-contained result that is somewhat neglected and not directly relevant to what follows, but it is fairly simple to do after the Myhill-Nerode Theorem and is a nice, accessible example of the idea of an abstract model, which is central to the rest of the chapter.

Exercises and Problems

Each ordinary section contains at least five Exercises and at least five Problems. The former are intended to be direct applications of concepts in the section, generally more suitable for student review than for graded homework. The Problems are more sophisticated, sometimes laying the groundwork for concepts to be encountered later. Some Exercises and some Problems involve the use of Java, and this is indicated at the beginning of each. Each Excursion contains one or more suggested Writing Exercises. Instructors should consider these carefully to decide how to use them.

Chapter Glossaries

Every new term is indicated in bold face when it is first introduced in the text. At the end of each chapter is a glossary giving the page number for each of these defined terms. The glossary provides a useful study guide for the student, since one main goal of the course is the acquisition of new vocabulary.

Author Information

I am by training a mathematician, dabbling in topology and set theory in graduate school at M.I.T. before settling on computational complexity theory. My thesis, under the direction of Mike Sipser, was a characterization of the computing power of constant-width branching programs --- I found a way in which polynomial-size constant-width programs could compute things previously believed to be beyond them. My subsequent research (with several collaborators) has uncovered surprising connections between complexity theory and some of the subjects of this book: finite-state machines and first-order logic.

In 1986 I joined the computer science faculty of the University of Massachusetts at Amherst, a large research university where the undergraduates range widely in ability. I've developed a strong interest in and affection for undergraduate teaching, particularly the task of motivating computer science students to think ``like mathematicians'' to the extent necessary for later work in their own field. I have come to know our students and their problems very well, as both a teacher and an academic advisor. Discrete mathematics has become my favorite course to teach; I have offered it several times at UMass and once each (to different student populations) at the University of Washington and at Mount Holyoke College.

This particular project originally arose from an overhaul of our undergraduate curriculum, in which we decided that our discrete mathematics course needed to come earlier in the sequence, be oriented toward computer science, and yet be more mathematically rigorous than its predecessor. With the help of a Lilly Teaching Fellowship, I spent the academic year of 1994-95 both developing this particular course and experimenting with innovative teaching methods.

More recently we have revisited the undergraduate curriculum and decided to split ``discrete mathematics'' into two courses, adding material on general search and probabilistic reasoning that had previously been deferred to the upper-level AI course. Chapters 9-13 are one result of my course development efforts in this regard.

Last modified 29 December 2008