Applied Information Theory

CMPSCI 650 • Fall 2014 • Monday and Wednesday, 9:05-10:20 • Location CS Building, Room 142


Erik Learned-Miller

Teaching Assistant:

Cheni Chadowitz
Email: same as mine, but starting with "cheni".

TA web site for this class (see the section on Teaching).


Prerequisites

Stats 515 or equivalent. Basic college-level probability. Basic programming ability in C++, Matlab, or Java.


Syllabus

Available as a PDF file.

Reading Materials

• Required text: Elements of Information Theory, Second Edition, by Thomas Cover and Joy Thomas.

• NOTE: this is the SECOND edition, not the first! If you use the first edition, all of the necessary material will be there, but the page numbers I give may not be correct. You may want to find a friend who has the correct edition so you can map page numbers.


• David MacKay's Information Theory, Inference, and Learning Algorithms (the electronic edition is free on the web)

• Minimum entropy joint alignment

• Mutual Information Alignment

Schedule
Sept. 3: Introduction and overview. History of information theory. Discrete entropy, conditional entropy. Assignment 1 posted; due Sept. 17, 2 pm (originally due the 15th). Reading: Chap. 2, Sections 2.1-2.3, Cover & Thomas.
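For concreteness, discrete entropy can be computed directly from a probability vector. A minimal Python sketch (my own code, not part of the course materials):

```python
import math

def entropy(p):
    """Discrete (Shannon) entropy in bits of a probability vector p.
    Terms with p_i = 0 contribute nothing, by the convention 0 log 0 = 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# A fair coin has exactly 1 bit of entropy; a biased coin has less,
# and a deterministic outcome has none.
print(entropy([0.5, 0.5]))   # 1.0
print(entropy([0.9, 0.1]))
print(entropy([1.0]))        # 0.0 (or -0.0)
```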
Sept. 8: Properties of entropy. Conditional entropy. KL divergence. Mutual information. Reading: Chap. 2, Sections 2.4-2.6, Cover & Thomas.
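KL divergence and mutual information admit equally short sketches; in particular, I(X;Y) is just the KL divergence between the joint distribution and the product of its marginals. Again a hedged illustration, not course code:

```python
import math

def kl(p, q):
    """KL divergence D(p||q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(joint):
    """I(X;Y) = D(p(x,y) || p(x)p(y)) for a joint p.m.f. given as a 2-D list."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(joint[i][j] * math.log2(joint[i][j] / (px[i] * py[j]))
               for i in range(len(joint)) for j in range(len(joint[0]))
               if joint[i][j] > 0)

# Independent variables have zero mutual information.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
print(kl([0.5, 0.5], [0.9, 0.1]))                        # > 0
```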
Sept. 10: Entropy as minimum average code length. KL divergence as the difference in coding efficiency between codes based on the true distribution and codes based on the wrong distribution. Chain rules of information theory. Statistical independence. Mutual information as a measure of statistical independence.
Sept. 15: Independence bound on entropy. Conditioning reduces entropy. Uniform-distribution bound on entropy. Finish material through the end of Section 2.6 in Cover & Thomas.
Sept. 17: Differential entropy. Dependence of differential entropy on the units of the probability density function. Invariance of differential entropy to translation. Continuous versions of KL divergence and mutual information. Reading: Sections 8.1, 8.4, 8.5, and 8.6, Cover & Thomas, second edition (Sections 9.1, 9.4, 9.5, and 9.6 in the first edition).
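The unit-dependence and translation-invariance of differential entropy can be seen numerically from the closed form for a Gaussian, h = (1/2) log2(2*pi*e*sigma^2) bits. A small sketch of mine:

```python
import math

def gaussian_diff_entropy(sigma):
    """Differential entropy (bits) of N(mu, sigma^2): 0.5*log2(2*pi*e*sigma^2).
    Note mu does not appear: translation leaves differential entropy unchanged."""
    return 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)

# Doubling sigma (e.g., switching from meters to half-meters as the unit)
# adds exactly one bit, unlike discrete entropy, which has no such dependence.
print(gaussian_diff_entropy(2.0) - gaussian_diff_entropy(1.0))  # one extra bit
```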
Sept. 22: Estimating the differential entropy of a one-dimensional distribution from a sample of that distribution: plug-in estimators. Non-parametric density estimates. Leave-one-out estimates of the kernel variance. Assignment 2 posted; due October 6, 2 pm.
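One simple plug-in variant evaluates a leave-one-out kernel density estimate at each sample point and averages the negative log densities. This sketch of mine fixes an arbitrary bandwidth (h = 0.3) rather than optimizing it by leave-one-out as discussed in lecture:

```python
import math
import random

def loo_kde_entropy(xs, h):
    """Leave-one-out plug-in estimate of differential entropy in nats:
    H-hat = -(1/n) * sum_i log p-hat_{-i}(x_i), where p-hat_{-i} is a
    Gaussian-kernel density estimate with bandwidth h, omitting x_i itself."""
    n = len(xs)
    total = 0.0
    for i, xi in enumerate(xs):
        dens = sum(math.exp(-((xi - xj) / h) ** 2 / 2)
                   for j, xj in enumerate(xs) if j != i)
        dens /= (n - 1) * h * math.sqrt(2 * math.pi)
        total += math.log(dens)
    return -total / n

random.seed(0)
sample = [random.gauss(0, 1) for _ in range(500)]
# The true value for N(0,1) is 0.5*log(2*pi*e) ~ 1.419 nats.
print(loo_kde_entropy(sample, h=0.3))
```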
Sept. 24: Finish non-parametric density estimates. The uniformity of the distribution of percentiles, and its relation to spacings. Reading: Wikipedia page on kernel density estimation.
Sept. 29: Spacings estimate of entropy, and Application 1: Independent Components Analysis. Reading: Nonparametric entropy estimation: an overview.
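A standard spacings estimator is Vasicek's: sort the sample and average the logs of scaled m-spacings. My sketch below uses the common default m ~ sqrt(n), which is my choice and not necessarily the lecture's:

```python
import math
import random

def spacings_entropy(xs, m=None):
    """Vasicek m-spacings estimate of differential entropy in nats:
    H-hat = (1/n) * sum_i log( (n/(2m)) * (x_(i+m) - x_(i-m)) ),
    with order-statistic indices clipped at the ends of the sorted sample."""
    n = len(xs)
    if m is None:
        m = int(round(math.sqrt(n)))
    s = sorted(xs)
    total = 0.0
    for i in range(n):
        lo = max(i - m, 0)
        hi = min(i + m, n - 1)
        total += math.log(n * (s[hi] - s[lo]) / (2 * m))
    return total / n

random.seed(1)
sample = [random.gauss(0, 1) for _ in range(2000)]
# Should land close to the true N(0,1) entropy, 0.5*log(2*pi*e) ~ 1.419 nats.
print(spacings_entropy(sample))
```

No density estimate is ever formed: only order statistics are used, which is what makes this estimator attractive as a subroutine inside ICA.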
Oct. 1: ICA in two dimensions. ICA in higher dimensions. The RADICAL ICA algorithm. Jacobi rotations. Minimizing the sum of the marginal entropy estimates. Reading: ICA Using Spacings Estimates of Entropy.
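In two dimensions the idea reduces to a one-parameter search: rotate the (whitened) data and pick the angle that minimizes the sum of spacings estimates of the two marginal entropies. This is my own simplification of the RADICAL criterion (the actual algorithm adds refinements such as data augmentation); the grid search and m = 30 are arbitrary choices:

```python
import math
import random

def marginal_entropy_sum(xs, ys, theta, m=30):
    """Sum of Vasicek m-spacings entropy estimates (nats) of the two marginals
    after rotating the 2-D data (xs, ys) by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    u = sorted(c * x - s * y for x, y in zip(xs, ys))
    v = sorted(s * x + c * y for x, y in zip(xs, ys))
    n = len(u)
    def vasicek(z):
        return sum(math.log(n * (z[min(i + m, n - 1)] - z[max(i - m, 0)]) / (2 * m))
                   for i in range(n)) / n
    return vasicek(u) + vasicek(v)

# Mix two independent uniforms with a 30-degree rotation, then recover an
# unmixing angle by minimizing the criterion over a grid of rotations.
# Since the criterion is 90-degree periodic, theta = 60 undoes the mixing.
random.seed(4)
a = [random.random() for _ in range(1000)]
b = [random.random() for _ in range(1000)]
mix = math.radians(30)
xs = [math.cos(mix) * p - math.sin(mix) * q for p, q in zip(a, b)]
ys = [math.sin(mix) * p + math.cos(mix) * q for p, q in zip(a, b)]
best = min(range(90), key=lambda d: marginal_entropy_sum(xs, ys, math.radians(d)))
print(best)  # should land near 60 degrees
```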
Oct. 6: Finish Independent Components Analysis. Start alignment by the maximization of mutual information. Optimization criteria for aligning images: L2 alignment; correlation-based alignment; when these don't work. The joint distribution of image brightness values in two images. Assignment 3 posted; due October 20, 2 pm.
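The mutual-information criterion treats the two images' brightness values as a sample from a joint distribution, estimated with a joint histogram. A toy sketch of mine, with images as flat lists and an arbitrary 8-bin quantization:

```python
import math

def histogram_mi(a, b, bins=8):
    """Mutual information (bits) between two equal-length grayscale 'images'
    a and b (flat lists of values in [0, 256)), via a joint histogram."""
    n = len(a)
    joint = [[0] * bins for _ in range(bins)]
    for x, y in zip(a, b):
        joint[int(x) * bins // 256][int(y) * bins // 256] += 1
    mi = 0.0
    for i in range(bins):
        for j in range(bins):
            if joint[i][j]:
                pxy = joint[i][j] / n
                px = sum(joint[i]) / n
                py = sum(joint[k][j] for k in range(bins)) / n
                mi += pxy * math.log2(pxy / (px * py))
    return mi

# An image is maximally informative about itself; a misaligned (shifted)
# copy shares less information, which is what alignment search exploits.
img = [(i * 37) % 256 for i in range(1000)]
shifted = img[5:] + img[:5]
print(histogram_mi(img, img), histogram_mi(img, shifted))
```

Note that, unlike L2 or correlation, nothing here assumes the two images have the same brightness mapping, which is why the criterion works across imaging modalities.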
Oct. 8: Mutual information alignment. The Prokudin-Gorsky images. Joint image alignment.
Oct. 14: Joint alignment by the minimization of "pixel stack" entropies: congealing. Lecture slides on joint alignment.
Oct. 15: The Pythagorean theorem of information theory. The best factored approximation of a joint distribution is the product of the marginals. The Chow-Liu algorithm for finding the best tree-structured approximation to a joint distribution: compute the mutual information between all pairs of variables, then run a maximum spanning tree algorithm over the random variables, using mutual information as the edge weights. Assignment 4 posted (Prokudin-Gorsky plates); due October 27, 2 pm.
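The two steps of Chow-Liu (pairwise mutual informations, then a maximum spanning tree) fit in a short sketch. This is my own illustration using empirical MI from samples and Kruskal's algorithm with union-find:

```python
import math
import random
from itertools import combinations

def pairwise_mi(samples, i, j):
    """Empirical mutual information (bits) between columns i and j of samples."""
    n = len(samples)
    joint, pi_, pj_ = {}, {}, {}
    for row in samples:
        joint[(row[i], row[j])] = joint.get((row[i], row[j]), 0) + 1
        pi_[row[i]] = pi_.get(row[i], 0) + 1
        pj_[row[j]] = pj_.get(row[j], 0) + 1
    return sum(c / n * math.log2((c / n) / (pi_[a] / n * pj_[b] / n))
               for (a, b), c in joint.items())

def chow_liu_tree(samples, k):
    """Chow-Liu: maximum spanning tree over k variables, with empirical
    mutual information as the edge weight (Kruskal with union-find)."""
    edges = sorted(((pairwise_mi(samples, i, j), i, j)
                    for i, j in combinations(range(k), 2)), reverse=True)
    parent = list(range(k))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# X2 is a copy of X0 while X1 is independent noise, so the strongest edge
# in the recovered tree should connect variables 0 and 2.
random.seed(2)
data = []
for _ in range(500):
    x0, x1 = random.randint(0, 1), random.randint(0, 1)
    data.append((x0, x1, x0))
tree = chow_liu_tree(data, 3)
print(tree)
```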
Oct. 20: Source coding. Compression. Optimal codes. The Asymptotic Equipartition Property. Start discussion of typicality. NOTE: material prior to this lecture will be on the mid-term. Reading: Chapter 5, Cover & Thomas.
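The AEP says that for a long i.i.d. sequence, -(1/n) log2 p(x^n) concentrates around the entropy, so "typical" sequences all have probability close to 2^{-nH}. A quick numerical check of mine for a Bernoulli source:

```python
import math
import random

def empirical_rate(bits, p):
    """-(1/n) * log2 p(x^n) for an i.i.d. Bernoulli(p) sequence; by the AEP
    this converges to the entropy H(p) as n grows."""
    n = len(bits)
    logp = sum(math.log2(p) if b else math.log2(1 - p) for b in bits)
    return -logp / n

def h(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

random.seed(3)
p = 0.2
seq = [1 if random.random() < p else 0 for _ in range(100000)]
print(empirical_rate(seq, p), h(p))  # both near 0.722 bits/symbol
```

This is exactly why compressing the source to about H(p) bits per symbol is possible: essentially all the probability sits on roughly 2^{nH} typical sequences.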
Oct. 22: Typicality and properties of the typical set. Source coding based on the typical set. Exam 1 review sheet.
Oct. 27: Kraft inequality. Upper and lower bounds on optimal expected code lengths. Using the "wrong code".
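The Kraft inequality and the upper bound on optimal code length are easy to check numerically: Shannon code lengths l_i = ceil(-log2 p_i) always satisfy Kraft and come within one bit of the entropy. A sketch of mine:

```python
import math

def kraft_sum(lengths):
    """Kraft sum for binary codeword lengths: sum_i 2^{-l_i} <= 1 is necessary
    and sufficient for a prefix code with those lengths to exist."""
    return sum(2.0 ** -l for l in lengths)

def shannon_lengths(p):
    """Shannon code lengths l_i = ceil(-log2 p_i); expected length is
    within 1 bit of the entropy H(p)."""
    return [math.ceil(-math.log2(pi)) for pi in p]

# For a dyadic distribution the lengths are exact and Kraft holds with equality.
p = [0.5, 0.25, 0.125, 0.125]
lengths = shannon_lengths(p)
print(lengths, kraft_sum(lengths))  # [1, 2, 3, 3] 1.0
```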
Oct. 29: IN-CLASS MIDTERM.
Nov. 3: Finish source coding. Using the wrong code. Minimum description length as another route to understanding probability distributions. Unsupervised parsing of natural language streams. Start channel coding.
Nov. 5: Continue channel coding. Binary symmetric channel. Channel capacity. Channel coding theorem. Erasure channel. Computing the capacity of a channel. Assignment 5: due Monday, November 24, at 2 pm. Reading: Chapter 7, Sections 7.1-7.6, Cover & Thomas.
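The two standard channels have closed-form capacities: C = 1 - H(p) for the binary symmetric channel with crossover probability p, and C = 1 - alpha for the binary erasure channel with erasure probability alpha. A quick sketch of mine:

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity of the binary symmetric channel with crossover probability p."""
    return 1.0 - h2(p)

def erasure_capacity(alpha):
    """Capacity of the binary erasure channel with erasure probability alpha."""
    return 1.0 - alpha

print(bsc_capacity(0.11))      # about 0.5 bits per channel use
print(bsc_capacity(0.5))       # 0: the output is independent of the input
print(erasure_capacity(0.25))  # 0.75
```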
Nov. 10: Rate of a code. (M, n) codes and (2^{nR}, n) sequences of codes.
Nov. 17: Joint typicality.
Nov. 19: The Channel Coding Theorem, Part 1.
Nov. 24: The Channel Coding Theorem, Part 2. (Note: I will not cover the converse in class.)
Dec. 1: Hamming codes. Linear codes and algebra over the binary field. Assignment 6: due Friday, Dec. 5, 11:59 pm.
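The (7,4) Hamming code is small enough to write out in full: three parity bits protect four data bits, and the 3-bit syndrome, read as a binary number, gives the position of any single flipped bit. A minimal sketch of mine with all arithmetic mod 2:

```python
def hamming74_encode(d):
    """d = [d1, d2, d3, d4] -> 7-bit codeword [p1, p2, d1, p3, d2, d3, d4],
    with parity bits placed at the power-of-two positions 1, 2, and 4."""
    d1, d2, d3, d4 = d
    p1 = (d1 + d2 + d4) % 2   # covers positions 1, 3, 5, 7
    p2 = (d1 + d3 + d4) % 2   # covers positions 2, 3, 6, 7
    p3 = (d2 + d3 + d4) % 2   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Return a corrected copy of the 7-bit word c; the syndrome (s3 s2 s1),
    read as a binary number, is the 1-based position of a single flipped bit."""
    c = list(c)
    s1 = (c[0] + c[2] + c[4] + c[6]) % 2
    s2 = (c[1] + c[2] + c[5] + c[6]) % 2
    s3 = (c[3] + c[4] + c[5] + c[6]) % 2
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        c[pos - 1] ^= 1
    return c

word = hamming74_encode([1, 0, 1, 1])
corrupted = list(word)
corrupted[4] ^= 1                            # flip one bit in transit
print(hamming74_correct(corrupted) == word)  # True
```

Any single-bit error is corrected, at a rate of 4/7; this is the classic example of a practical linear code falling short of, but approaching, channel capacity.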
Dec. 3: Finish practical channel coding. Review for the final.