17: Email Investigations


Midterm exam 2 is next class Thursday. Unlike the first midterm, there will be a separate exam for the 590F students, differentiated by color.

Email investigations

It turns out that despite being an aged method of communication, email is still quite relevant to individuals and society. It is one of the original open protocols of the Internet, and many Old People like me still use it as our primary form of professional communication.

Given how email has been in the news lately (Wikileaks, Clinton, Weiner, etc.) I thought we could spend a little time talking about what you can and can't learn from emails.

Email on the Internet

Email is actually a surprisingly complicated set of protocols. We're going to talk about the high level stuff today then look at some examples of emails to show what you can and can't learn from them.

There are three major components to consider: user agents (which are the client-side part of email and analogous to web browsers, though not quite), mail servers, and the SMTP protocol.

(on board)

Mail servers are at the core of email infrastructure. Every recipient has a mailbox located at a mail server. To send a message, a sender creates a message in their user agent. The user agent relays it to the sender's mail server; the mail server relays it to the recipient's mail server, where it is deposited in the recipient's mailbox. The recipient eventually accesses their mailbox with their own user agent and retrieves and/or reads the message.

SMTP (Simple Mail Transfer Protocol) is the application-layer protocol spoken among mail servers. It runs atop TCP, and these days, is often wrapped in TLS. Sometimes but not always your user agent also speaks SMTP to send messages to (though not receive them from) your mail server. SMTP is old (though not as old as me), dating back to 1982, which entails certain legacy issues (for example, only 7-bit ASCII characters are allowed in messages; others must be encoded in various ways, MIME anyone?).
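To make the 7-bit limitation concrete, here is a minimal sketch using Python's standard `email.header` module. It encodes a header value containing a non-ASCII character into the ASCII-only "encoded-word" form that MIME defines (RFC 2047), then decodes it again. The helper names `encode_subject` and `decode_subject` and the sample text are made up for illustration.

```python
from email.header import Header, decode_header


def encode_subject(text: str) -> str:
    """Return an ASCII-only (RFC 2047 encoded-word) form of a header value."""
    return Header(text, "utf-8").encode()


def decode_subject(encoded: str) -> str:
    """Reverse encode_subject, reassembling the original Unicode text."""
    pieces = []
    for chunk, charset in decode_header(encoded):
        if isinstance(chunk, bytes):
            # Encoded words come back as bytes plus the charset they used.
            pieces.append(chunk.decode(charset or "ascii"))
        else:
            # Plain, unencoded text comes back as a str.
            pieces.append(chunk)
    return "".join(pieces)


original = "Caf\u00e9 menu"           # hypothetical subject with one non-ASCII character
wire_form = encode_subject(original)   # the form that would travel in the header
print(wire_form)
print(decode_subject(wire_form))
```

When you read raw headers during an investigation, you will often see the encoded form rather than the readable text, so it helps to recognize the `=?charset?...?=` shape.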

Sending messages

If you want to send a message, your user agent constructs a file in the proper format. Minimally, it consists of a set of headers and an optional message body, for example:

Subject: Re: Switching my major into CS
From: Marc Liberatore <liberato@cs.umass.edu>
Date: Mon, 3 Apr 2017 19:24:02 -0400
To: A Student <a.student@umass.edu>

RFC 5322 specifies (part of) the format of emails; RFC 5321 (which obsoletes RFC 2821) specifies (part of) SMTP.
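Because the format is just text (header lines, a blank line, then the body), it is easy to pull apart programmatically. The sketch below feeds the example headers above to Python's standard `email` package. The body line is a placeholder I added so there is something after the blank line; it is not part of the original message.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Headers copied from the example above; the body line is a made-up placeholder.
RAW = """\
Subject: Re: Switching my major into CS
From: Marc Liberatore <liberato@cs.umass.edu>
Date: Mon, 3 Apr 2017 19:24:02 -0400
To: A Student <a.student@umass.edu>

(placeholder body)
"""

msg = message_from_string(RAW)

# Split "Display Name <address>" into its two parts.
sender_name, sender_addr = parseaddr(msg["From"])

# Turn the Date: header into a timezone-aware datetime.
sent_at = parsedate_to_datetime(msg["Date"])

# Everything after the first blank line is the body.
body = msg.get_payload()

print(sender_name, sender_addr)
print(sent_at.isoformat())
print(msg["Subject"])
```

Note that the parser happily accepts whatever the headers say; it does nothing to check that they are true.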

Your user agent contacts your mail server and authenticates (if required: in the olden days it was not(!!!)); the mail server then accepts the message into a queue. The server may add header lines to the email, though generally it will not modify existing header lines.

Your mail server will then contact the destination mail server, which if all goes well will accept the message, again possibly adding its own new header lines before placing it in the user's mailbox. Notably, certain headers might be purged at this step, like the BCC: header.
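The most common header that servers add is a `Received:` trace line, which each server that handles the message puts at the top. The sketch below simulates two hops on a raw message string. It is an illustration only: the hostnames (all under the reserved `.example` domain), the timestamps, and the helper `stamp_received` are hypothetical, and real `Received:` lines carry more detail than this.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import format_datetime


def stamp_received(raw: str, from_host: str, by_host: str, when: datetime) -> str:
    """Prepend a simplified Received: line, as a handling server would."""
    line = f"Received: from {from_host} by {by_host}; {format_datetime(when)}\n"
    return line + raw


# A bare message as the user agent might hand it off (hypothetical content).
raw = "Subject: hello\nFrom: sender@sender.example\nTo: rcpt@recipient.example\n\nbody\n"

t0 = datetime(2017, 4, 3, 23, 24, 5, tzinfo=timezone.utc)

# Hop 1: the sender's mail server accepts the message from the user agent.
raw = stamp_received(raw, "laptop.sender.example", "mail.sender.example", t0)
# Hop 2: the recipient's mail server accepts it from the sender's server.
raw = stamp_received(raw, "mail.sender.example", "mx.recipient.example",
                     t0 + timedelta(seconds=2))

# Read top to bottom, the trace lines run from newest hop to oldest.
hops = message_from_string(raw).get_all("Received")
for hop in hops:
    print(hop)
```

Reading these lines from the bottom up gives the path the message claims to have taken. Only the lines added by servers you trust are worth much, since anything below them could have been written by the sender.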

(Why does a destination mail server accept mail from any mailer on the Internet? Usually they actually don't: they filter based on white/blacklists or DNS records, throttle senders, and so on.)

Finally, the recipient will (eventually) open their mailbox using their user agent, and view or download the message.

User agents generally fall into one of two categories these days. Either they're a traditional mail client (like mutt, Apple Mail, or Mozilla Thunderbird) that uses SMTP (to send) + IMAP (to read/manipulate mailbox) or they are a web application where the communication with the mail server happens on the backend (the web server talks to the mail server, over either SMTP + IMAP or custom protocols).

Eventually, your email ends up on Hillary Clinton's private server and is the subject of much media hullabaloo (just kidding).

What can we learn from emails?

Like any electronic evidence, we need to consider the chain of custody and the source. A single email devoid of context doesn't tell us much, as we have no guarantee of its provenance. An email stored on a server with a history of logs and the like is more reliable, though again, consider the source. Email is a simple, text-based format with no built-in integrity checks or cryptographic signatures. Forging them is trivial, and just because an incriminating email is found among legitimate ones does not mean it is legitimate. For example, I'd trust, say, anonymously "found" email troves uploaded to Wikileaks far less than emails on a server I administered and knew to be relatively secure and auditable.

That said, what can we learn?

Most of what we can learn is derived from the headers of emails. At all times you should be aware of the provenance of the emails, and have a sense of which mail servers, user agents, and users' hands an email passed through. That said, what can you learn if you take things at face value?

Some fields are well-defined and under user control. For example, the To: and CC: fields are usually pretty reliable. From: needn't be true, though most legitimate mail servers will not allow users to spoof the From: address. Date: usually corresponds to when the message was generated. Other fields are server- or mail-agent-specific.
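As a rough sketch of taking things at face value, the function below separates the user-controlled fields from everything else in a raw message. The function name `summarize`, the sample message, and in particular its `Received:` and `X-Mailer:` lines are invented for illustration; only the `email` package calls are real.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Fields described above as well-defined and under user control.
USER_FIELDS = ("from", "to", "cc", "date", "subject")

# Hypothetical sample; the Received: and X-Mailer: lines are made up.
SAMPLE = """\
Received: from mail.sender.example by mx.recipient.example; Mon, 3 Apr 2017 19:24:05 -0400
Subject: Re: Switching my major into CS
From: Marc Liberatore <liberato@cs.umass.edu>
Date: Mon, 3 Apr 2017 19:24:02 -0400
To: A Student <a.student@umass.edu>
X-Mailer: ExampleMail 1.0

body
"""


def summarize(raw: str) -> dict:
    """Collect the face-value claims a message makes about itself."""
    msg = message_from_string(raw)
    # Gather every address listed in To: and CC:, ignoring display names.
    listed = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = [addr for _name, addr in getaddresses(listed)]
    # Anything else is server- or mail-agent-specific.
    other = [name for name in msg.keys() if name.lower() not in USER_FIELDS]
    return {
        "from": msg["From"],
        "recipients": recipients,
        "date": parsedate_to_datetime(msg["Date"]) if msg["Date"] else None,
        "other_headers": other,
    }


summary = summarize(SAMPLE)
for key, value in summary.items():
    print(key, "=", value)
```

Everything in the result is a claim, not a verified fact; how much weight it deserves depends on where the message was found.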

Let's look at a few examples from my own emails.

(on projector)

Now let's look at a few more "interesting" emails:


Exam review


  • NITS
  • Partitions
  • FAT layout
  • FAT allocation / deletion
  • NTFS (high level only, big guns come on final)
  • Email

Expect the most detailed questions on topics we've already had homeworks on; expect fewer (or no) questions, and only higher-level ones, on topics we have not had homeworks on.