16: Email, THD
DRAFT
Announcements
Exam 2 in class next Thursday! Please be on time. All topics since the last exam are fair game (along with whatever that pulls in from before). Please answer the Piazza poll re: multiple exams, especially if you have multiple exams! I am trying to figure out how best to accommodate people that are overloaded.
I’m updating the lecture notes / schedule to more accurately reflect what we’ve been covering over the last two weeks. This should be done tonight / tomorrow morning.
I’m going to move the A10 deadline. What was I thinking?!?!?
Email investigations
New topic!
It turns out that despite being an aged method of communication, email is still quite relevant to individuals and society. It is one of the original open protocols of the Internet, and many Old People like me still use it as our primary form of professional communication.
Given how email has been in the news over the last couple of years (Wikileaks, Clinton, Weiner, etc.), I thought we could spend a little time talking about what you can and can’t learn from emails.
Email on the Internet
Email is actually a surprisingly complicated set of protocols. We’re going to talk about the high-level stuff today, then look at some example emails to show what you can and can’t learn from them.
There are three major components to consider: user agents (which are the client-side part of email and analogous to web browsers, though not quite), mail servers, and the SMTP protocol.
(on board)
Mail servers are at the core of email infrastructure. Every recipient has a mailbox located at a mail server. To send a message, a sender creates a message in their user agent. The user agent relays it to the sender’s mail server; the mail server relays it to the recipient’s mail server, where it is deposited in the recipient’s mailbox. The recipient eventually accesses their mailbox with their own user agent and retrieves and/or reads the message.
SMTP (Simple Mail Transfer Protocol) is the application-layer protocol spoken among mail servers. It runs atop TCP and, these days, is often wrapped in TLS. Sometimes, but not always, your user agent also speaks SMTP to send messages to (though not receive messages from) your mail server. SMTP is old (though not as old as me), dating back to 1982, which entails certain legacy issues (for example, only 7-bit ASCII characters are allowed in messages; others must be encoded in various ways. MIME, anyone?).
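To make the 7-bit ASCII restriction concrete, here is a minimal sketch using Python's standard `email.header` and `quopri` modules. The subject line and body text are made-up examples, and quoted-printable is only one of several encodings a real user agent might choose.

```python
import quopri
from email.header import Header, decode_header, make_header


def encode_header_value(text: str) -> str:
    """Encode a header value as RFC 2047 encoded-words (pure ASCII)."""
    return Header(text, charset="utf-8").encode()


def decode_header_value(wire: str) -> str:
    """Reverse encode_header_value, recovering the original Unicode text."""
    return str(make_header(decode_header(wire)))


# Hypothetical subject line containing a non-ASCII character.
subject = "Café hours for CS advising"
wire_subject = encode_header_value(subject)

# Bodies get a similar treatment; quoted-printable is one common choice.
body = "Meet at the café at noon.\n".encode("utf-8")
wire_body = quopri.encodestring(body)

print(wire_subject)               # ASCII-only form that travels over SMTP
print(wire_body.decode("ascii"))  # ASCII-only body with the "é" escaped
```

The point for an investigator is that what you see in a raw message may be an encoded form; decode it before drawing conclusions about its content.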
Sending messages
If you want to send a message, your user agent constructs a file in the proper format. Minimally, it consists of a set of headers and an optional message body, for example:
Subject: Re: Switching my major into CS
From: Marc Liberatore <liberato@cs.umass.edu>
Date: Mon, 3 Apr 2017 19:24:02 -0400
To: A Student <a.student@umass.edu>
RFC 5322 specifies (part of) the format of emails; RFC 5321 (which obsoletes RFC 2821) specifies (part of) SMTP.
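As a rough illustration of the "headers plus optional body" format, the sketch below builds the example message above with Python's standard `email` package, serializes it, and parses it back. The body text is a placeholder I made up; the header values are the ones from the example.

```python
from email import message_from_string
from email.message import EmailMessage

# Build the message the way a user agent might.
msg = EmailMessage()
msg["Subject"] = "Re: Switching my major into CS"
msg["From"] = "Marc Liberatore <liberato@cs.umass.edu>"
msg["Date"] = "Mon, 3 Apr 2017 19:24:02 -0400"
msg["To"] = "A Student <a.student@umass.edu>"
msg.set_content("Placeholder body text (hypothetical).")

# Serialize to the text form that is handed to the mail server.
wire_text = msg.as_string()

# Headers and body are separated by the first blank line.
header_block, _, body_block = wire_text.partition("\n\n")
print(header_block)

# Anyone holding the text can parse it back into header fields.
parsed = message_from_string(wire_text)
print(parsed["From"], "->", parsed["To"])
```

Note that the library adds a few headers of its own (such as `MIME-Version` and `Content-Type`), so different user agents can produce slightly different header sets for the same message.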
Your user agent contacts your mail server and authenticates (if required: in the olden days it was not(!!!)); the mail server then accepts the message into a queue. The server may add header lines to the email, though generally it will not modify existing header lines.
Your mail server will then contact the destination mail server, which if all goes well will accept the message, again possibly adding its own new header lines before placing it in the user’s mailbox. Notably, certain headers might be purged at this step, like the BCC: header.
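The hop-by-hop handling just described can be modeled in a few lines. This is a simplified sketch, not how any particular mail server is implemented: the host names, addresses, and message are hypothetical, the `Received:` line is much terser than real ones, and exactly where `Bcc:` gets removed varies between setups.

```python
from datetime import datetime, timezone
from email import message_from_string
from email.utils import format_datetime


def relay(raw: str, from_host: str, by_host: str, when: datetime) -> str:
    """Model one hop: the receiving server prepends its own trace header."""
    trace = f"Received: from {from_host} by {by_host}; {format_datetime(when)}\n"
    return trace + raw


def strip_bcc(raw: str) -> str:
    """Drop any Bcc: header so recipients cannot see blind-copied addresses."""
    msg = message_from_string(raw)
    del msg["Bcc"]  # no error if the header is absent
    return msg.as_string()


# Hypothetical message as the user agent might submit it.
submitted = (
    "Subject: Lunch?\n"
    "From: Alice <alice@sender.example>\n"
    "To: Bob <bob@recipient.example>\n"
    "Bcc: Carol <carol@elsewhere.example>\n"
    "\n"
    "Noon at the usual place?\n"
)

t0 = datetime(2017, 4, 3, 23, 24, 2, tzinfo=timezone.utc)
queued = relay(submitted, "laptop.sender.example", "mail.sender.example", t0)
received = relay(queued, "mail.sender.example", "mx.recipient.example", t0)
delivered = strip_bcc(received)
print(delivered)
```

Because each server adds its line at the top, reading the `Received:` headers from bottom to top reconstructs the path from sender to recipient.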
(Why does a destination mail server accept mail from any mailer on the Internet? Usually it doesn’t: servers filter based on whitelists/blacklists or DNS records, throttle senders, and so on.)
Finally, the recipient will (eventually) open their mailbox using their user agent, and view or download the message.
User agents generally fall into one of two categories these days. Either they’re a traditional mail client (like mutt, Apple Mail, or Mozilla Thunderbird) that uses SMTP (to send) + IMAP (to read/manipulate mailbox) or they are a web application where the communication with the mail server happens on the backend (the web server talks to the mail server, over either SMTP + IMAP or custom protocols).
Eventually, your email ends up on Hillary Clinton’s private server and is the subject of much media hullabaloo (just kidding).
What can we learn from emails?
Like any electronic evidence, we need to consider the chain of custody and the source. A single email devoid of context doesn’t tell us much, as we have no guarantee of its provenance. An email stored on a server with a history of logs and the like is more reliable, though again, consider the source. Email is a simple, text-based format with no built-in integrity checks or cryptographic signatures. Forging them is trivial, and just because an incriminating email is found among legitimate ones does not mean it is legitimate. For example, I’d trust, say, anonymously “found” email troves uploaded to Wikileaks far less than emails on a server I administered and knew to be relatively secure and auditable.
That said, what can we learn?
Most of what we can learn is derived from the headers of emails. At all times you should be aware of the provenance of the emails, and have a sense of which mail servers’, user agents’, and users’ hands an email passed through. That said, what can you learn if you take things at face value?
Some fields are well-defined and under user control. For example, the To: and CC: fields are usually pretty reliable. From: needn’t be true, though most legitimate mail servers will not allow users to spoof the From: address. Date: usually corresponds to when the message was generated. Other fields are server- or mail-agent-specific.
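Here is a small sketch of pulling those fields out of a raw message with Python's standard `email` package. The `From:`, `Date:`, and `To:` values are from the earlier example; the two `Received:` lines are invented for illustration and are simpler than what real servers write. Everything extracted here is taken at face value, so it is only as trustworthy as the message's provenance.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr, parsedate_to_datetime

# Sample message; the Received: lines are hypothetical.
SAMPLE = """\
Received: from mail.sender.example by mx.recipient.example; Mon, 3 Apr 2017 19:24:05 -0400
Received: from laptop.sender.example by mail.sender.example; Mon, 3 Apr 2017 19:24:03 -0400
Subject: Re: Switching my major into CS
From: Marc Liberatore <liberato@cs.umass.edu>
Date: Mon, 3 Apr 2017 19:24:02 -0400
To: A Student <a.student@umass.edu>

(body omitted)
"""


def summarize(raw: str) -> dict:
    """Extract the claimed sender, recipients, date, and relay path."""
    msg = message_from_string(raw)
    recipients = msg.get_all("To", []) + msg.get_all("Cc", [])
    date_raw = msg["Date"]
    return {
        "from": parseaddr(msg.get("From", ""))[1],
        "to": [addr for _, addr in getaddresses(recipients)],
        "date": parsedate_to_datetime(date_raw) if date_raw else None,
        # Servers prepend Received: lines, so reverse to get oldest hop first.
        "hops": list(reversed(msg.get_all("Received", []))),
    }


info = summarize(SAMPLE)
print(info["from"], info["to"], info["date"])
for hop in info["hops"]:
    print("hop:", hop)
```

Comparing the `Date:` value against the timestamps in the `Received:` lines is one simple consistency check; a large gap does not prove forgery, but it is worth explaining.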
Let’s look at a few examples from my own emails.
(on projector)
Now let’s look at a few more “interesting” emails:
(Note that if you are concerned about sketchy websites you might not want to click on the following)
https://wikileaks.org/dnc-emails/
The Trojan Horse Defense
The Trojan horse defense (THD) is a technological variant of the SODDI (“some other dude did it”) defense, in which the defendant in a computer crime case claims that their computer was under the control of malware when “it” committed a crime.
A notable case occurred on September 20, 2001 (near a significant date in US history), when Aaron Caffrey allegedly launched a DoS attack on a web server at the Port of Houston. He claimed the evidence
was planted on his machine by attackers who used an unspecified Trojan [horse program] to gain control of his PC and launch the assault.
A forensic examination of Caffrey’s PC found attack tools but no trace of Trojan infection.
He was acquitted after only a few hours of deliberation (after essentially acting as his own expert witness), once the jury was convinced that
a [T]rojan horse armed with a ‘wiping tool’ was responsible, enabling the computer to launch the DoS attack, edit the system’s log files, and then delete all traces of the trojan — despite prosecution claims that no such technology existed.
Several other cases:
- UK defendant Julian Green acquitted in CP case, based upon expert testimony that 11 trojans were on his computer. (2003)
- UK prosecutors dismissed charges against Karl Schofield after finding a trojan on his computer. (2003)
- In the US, Eugene Green (accountant) successfully claimed that a virus caused the $630k underreporting of his own income that prosecutors charged him with. Acquitted (despite the fact the virus apparently never modified returns he prepared for others).
Trojans present a unique challenge to prosecution in the US, as the “beyond a reasonable doubt” standard can be difficult to meet, especially if the defense can raise the possibility of a computer being remote-controlled by SOD (some other dude).
SODDI is often rebutted by motive and lack of alternative suspects (but seriously, Green?), but sometimes this can be difficult when the crime is less personalized.
Legal issues for the THD
First, note it’s potentially a real thing: as evidence, look at botnets and extortionware (“your files are encrypted, send BTC to this address to get the decryption key”).
The THD can work as follows:
- The defendant can claim no knowledge of the crime. Reasonable doubt is raised because SODDI.
- The defendant can claim he may have committed the actions (actus reus) of the crime, but did so unknowingly (that is, without the required mens rea). To convict for many crimes, the prosecution must show both. (Green the accountant used this version of the THD.)
- In either case, the defense usually also claims technological naivete, so that the traces of trojans, etc., found by expert witnesses support their claims.
The prosecution’s response will generally be as follows:
First, if possible, establish the defendant’s technological expertise, to cast doubt on the possibility of infection. The effectiveness of this approach varies. More sophistication (+ more evidence of a secure computer) along with less evidence of malware (“wiped all the logs”) makes the THD less reasonable.
Second, negate the factual basis of the defense. Thoroughly examine the drives, looking for malware. If found, determine if its capabilities permit the alleged crime. If not, look for evidence of wiping or wiping tools.
Finally, seek confessions in interrogation or before. As noted by Brenner et al., suspects often confess, and these confessions are binding (assuming the suspect has been Mirandized appropriately). Some questions can be asked that rule out (or make more difficult) the THD or SODDI: “Who else has access to this computer?” “Do you use antivirus programs?” Etc.
Forensic examinations also help in other ways. If logs and other traces show that the computer was mainly used for, say, CP collection, it becomes harder for the defendant to claim the THD: why was the computer in your house and turned on if you never used it? The same holds for well-organized collections of CP.