16: More on "Going Dark", Email

DRAFT

Announcements

Exam 2 in class on Tuesday! Please be on time. All topics since the last exam are fair game (along with whatever that pulls in from before).

From last class

The Fourth Amendment (4A) requires that law enforcement (the executive branch) go to a court (the judicial branch, so powers are separate and presumably checked) and swear out a warrant application (lying is perjury). The application must show probable cause (not proof, but “where the facts and circumstances within the officers’ knowledge, and of which they have reasonably trustworthy information, are sufficient in themselves to warrant a belief by a man of reasonable caution that a crime is being committed” (Brinegar v. US)) and must particularly describe the person/place to be searched and the things to be seized.

Mandated access

CALEA (Communications Assistance for Law Enforcement Act) requires telecom providers to be able to assist law enforcement. In other words, it is a law that mandates certain surveillance capabilities:

The Act obliges telecommunications companies to make it possible for law enforcement agencies to tap any phone conversations carried out over its networks, as well as making call detail records available. The act stipulates that it must not be possible for a person to detect that his or her conversation is being monitored by the respective government agency.

For communications that pass through centralized telecom companies, CALEA more or less does its job, at least at the network level. Switches that handle voice communications have intercept capability built-in. (And this shouldn’t be surprising. Think of the history of telecoms: https://en.wikipedia.org/wiki/Switchboard_operator)

Routers carrying IP traffic must also comply (by ruling of the FCC), as must centralized VOIP providers. IP routers “delegate the CALEA function to elements dedicated to inspecting and intercepting traffic. In such cases, hardware taps or switch/router mirror-ports are employed to deliver copies of all of a network’s data to dedicated IP probes.”

The problem

Once the capability for monitoring exists, it can be used, and not always lawfully. For example, see the “Greek Watergate” affair: https://en.wikipedia.org/wiki/Greek_wiretapping_case_2004%E2%80%9305

Bellovin et al. note that CALEA-like interfaces are problematic because they are designed specifically for surreptitious eavesdropping, unlike more typical network monitoring that logs and alerts.

They argue that broadly requiring CALEA-like interfaces on not just network-level protocols but all application-layer protocols is a recipe for disaster. Let’s go over their argument.

Because the Internet is (or can be) decentralized (any computer can act as a server), they argue that wiretapping capability would need to be widely distributed. In some sense it already is, at the IP layer, but their argument appears to be that any service provider, not just providers of Internet connectivity, would need to comply with CALEA. They assert this is “untenable.” Specifically, they point out that Internet startups are diverse and dynamic, and that forcing them to integrate a complex wiretap protocol into “quickly deployed and poorly debugged services” would be an expensive burden on small companies.

P2P, they similarly argue, doesn’t accommodate the CALEA model, as there is no centralized entity to regulate. (One wonders how true this is, given how effectively some P2P systems have been shut down in the past when the company that developed the main client software was shut down: see https://en.wikipedia.org/wiki/LimeWire for example.)

Their best argument, IMHO, is that expanding the number of CALEA-like interfaces in the network would create great insecurity. The vulnerabilities in every CALEA-compliant switch tested by the NSA show how hard it is to get the interception technology correct.

The proposed solution

Bellovin et al. argue that the FBI should instead leverage the “essentially unlimited number of security vulnerabilities” in modern computing and communication devices.

General criminal compromise of computers is wide-ranging and non-specific. Unpatched computers (more typically: services) might be remotely exploitable or not, and criminals do not typically target their attempts at compromise (depends on the criminal and goal, of course).

LE tools must be targeted. They must be likely to succeed. They must not disrupt services (of the target or of others). And they must be manageable: it should be easy to check whether the tool worked, to control it during monitoring, and to clean up when done.

Four primary components:

selection or discovery of vulnerability
installation
obtain access after installation
obtain communications

And how reliable is the data that’s gathered? Judges must believe that the tool gathers exactly and only what is specified in the warrant. Tools that undercollect might miss exculpatory evidence, and tools that overcollect will violate the warrant’s limits. (This differs from minimization, where LE must take reasonable steps to collect only communications of the subject, and only when they are committing a crime.)

We’ve seen this already with NITs, where the FBI discovered and used an exploit that it served to Tor Browser users to recover information from them. More generally, any computer connected to the Internet that might be remotely exploitable could be monitored in this way, though discovering 0-days might be hard.

Finding vulnerabilities

One could imagine a LE lab that was federally funded to find LE-grade exploits in apps and platforms (e.g., Windows or iPhone). Or LE could purchase exploits on the vulnerability market (though clearly there are ethical issues here – are they creating or encouraging an illegal market?).

Vulnerabilities remain useful until they are disclosed, patched, and the patch is widely deployed. (How long is that period? Who knows?) There is also the issue, as we’ve seen with NITs, that the defense might want the exploit (for whatever reason), and by handing it over, LE will lose access to a tool.

Relatedly, should the government disclose vulnerabilities after a set period of time (maybe even 0 time)? The ethical calculus is not clear here: having a tool to do lawful wiretapping, vs pre-empting crime that might be committed using the vulnerability by third parties.

Finally, this lab would be a nightmare to secure: what a rich target! Not unlike NSA/CIA/etc., it would almost certainly need to be air-gapped and otherwise highly secured – which would only serve to slow its progress and make it more expensive to operate.

Email investigations

New topic!

It turns out that despite being an aged method of communication, email is still quite relevant to individuals and society. It is one of the original open protocols of the Internet, and many Old People like me still use it as our primary form of professional communication.

Given how much email has been in the news over the last couple of years (Wikileaks, Clinton, Weiner, etc.), I thought we could spend a little time talking about what you can and can’t learn from emails.

Email on the Internet

Email is actually a surprisingly complicated set of protocols. We’re going to talk about the high level stuff today then look at some examples of emails to show what you can and can’t learn from them.

There are three major components to consider: user agents (which are the client-side part of email and analogous to web browsers, though not quite), mail servers, and the SMTP protocol.

(on board)

Mail servers are at the core of email infrastructure. Every recipient has a mailbox located at a mail server. To send a message, a sender creates a message in their user agent. The user agent relays it to the sender’s mail server; the mail server relays it to the recipient’s mail server, where it is deposited in the recipient’s mailbox. The recipient eventually accesses their mailbox with their own user agent and retrieves and/or reads the message.

SMTP (Simple Mail Transfer Protocol) is the application-layer protocol spoken among mail servers. It runs atop TCP and these days is often wrapped in TLS. Sometimes, but not always, your user agent also speaks SMTP to send (though not receive) messages through your mail server. SMTP is old (though not as old as me), dating back to 1982, which entails certain legacy issues: for example, only 7-bit ASCII characters are allowed in messages, so other characters must be encoded in various ways (MIME, anyone?).
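To see the 7-bit legacy in action, here is a minimal sketch using Python’s standard-library `email` package. It assumes a message whose Subject and body contain non-ASCII characters (the addresses and text are hypothetical). The header gets wrapped in a MIME encoded-word and the body is forced into quoted-printable, so every byte that would go over the wire is plain 7-bit ASCII.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def build_non_ascii_message() -> bytes:
    """Build a message with non-ASCII text and serialize it to 7-bit-safe bytes."""
    msg = EmailMessage(policy=policy.SMTP)  # SMTP policy uses CRLF line endings
    msg["From"] = "A Student <a.student@umass.edu>"  # hypothetical addresses
    msg["To"] = "Marc Liberatore <liberato@cs.umass.edu>"
    msg["Subject"] = "Résumé"  # non-ASCII header: serialized as a MIME encoded-word
    # Explicitly choose a 7-bit-safe Content-Transfer-Encoding for the body.
    msg.set_content("Voilà, my résumé.", cte="quoted-printable")
    return msg.as_bytes()


raw = build_non_ascii_message()
print(raw.decode("ascii"))  # succeeds only because every byte is ASCII

# A receiving user agent reverses the encoding to recover the original text.
parsed = BytesParser(policy=policy.default).parsebytes(raw)
print(parsed["Subject"], "/", parsed.get_content().strip())
```

Printing the raw bytes is a good way to see what an email “really” looks like on disk, before a user agent decodes it for display.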

Sending messages

If you want to send a message, your user agent constructs a file in the proper format. Minimally, it consists of a set of headers and an optional message body, for example:

Subject: Re: Switching my major into CS
From: Marc Liberatore <liberato@cs.umass.edu>
Date: Mon, 3 Apr 2017 19:24:02 -0400
To: A Student <a.student@umass.edu>

RFC 5322 specifies (part of) the format of emails; RFC 5321 (which obsoleted RFC 2821) specifies (part of) SMTP.
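The format above (header lines, then a blank line, then an optional body) is simple enough to parse mechanically. Here is a sketch using Python’s standard-library `email` parser on the example headers; the one-line body is a hypothetical addition. Note that the parser only reads what is written: it says nothing about whether any of it is true.

```python
from email import policy
from email.parser import Parser
from email.utils import parsedate_to_datetime

# The example headers from above, plus a hypothetical one-line body.
# Headers come first, then a single blank line, then the body.
RAW = (
    "Subject: Re: Switching my major into CS\n"
    "From: Marc Liberatore <liberato@cs.umass.edu>\n"
    "Date: Mon, 3 Apr 2017 19:24:02 -0400\n"
    "To: A Student <a.student@umass.edu>\n"
    "\n"
    "Hypothetical body text.\n"
)

msg = Parser(policy=policy.default).parsestr(RAW)
sender = msg["From"].addresses[0]  # structured Address object
sent_at = parsedate_to_datetime(str(msg["Date"]))  # timezone-aware datetime

print(msg["Subject"])
print(sender.display_name, sender.addr_spec)
print(sent_at.isoformat())  # 2017-04-03T19:24:02-04:00
print(msg.get_content().strip())
```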

Your user agent contacts your mail server and authenticates (if required: in the olden days it was not(!!!)), and the server accepts the message into a queue. The server may add header lines to the email, though generally it will not modify existing header lines.

Your mail server will then contact the destination mail server, which, if all goes well, will accept the message, again possibly adding its own header lines before placing it in the user’s mailbox. Notably, certain headers might be purged at this step, like the BCC: header.
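One header that servers do add is the trace header: RFC 5321 requires each SMTP server that receives a message to insert a Received: line at the top of it. Here is a toy model of that, assuming a hypothetical two-hop path. The hostnames and the simplified Received: format are illustrative only (real servers add IP addresses, protocols, queue IDs, and more), and removing Bcc: before the first hop is just one place it can happen.

```python
from email import policy
from email.parser import Parser


def strip_bcc(raw: str) -> str:
    """Drop any Bcc: header so recipients can't see blind-copied addresses."""
    msg = Parser(policy=policy.default).parsestr(raw)
    del msg["Bcc"]  # deletes all occurrences; no error if absent
    return msg.as_string()


def relay(raw: str, from_host: str, by_host: str, when: str) -> str:
    """Model one SMTP hop: the receiving server prepends a trace header."""
    return f"Received: from {from_host} by {by_host}; {when}\n" + raw


# Hypothetical message and hosts (example.* domains are reserved for examples).
original = (
    "From: Marc Liberatore <liberato@cs.umass.edu>\n"
    "To: A Student <a.student@umass.edu>\n"
    "Bcc: someone.else@example.com\n"
    "Subject: Re: Switching my major into CS\n"
    "\n"
    "Body.\n"
)

hop1 = relay(strip_bcc(original), "laptop.example.net", "smtp.example.com",
             "Mon, 3 Apr 2017 19:24:03 -0400")
hop2 = relay(hop1, "smtp.example.com", "mx.example.org",
             "Mon, 3 Apr 2017 23:24:05 +0000")

final = Parser(policy=policy.default).parsestr(hop2)
for trace in final.get_all("Received"):
    print(trace)  # newest hop first, because each hop prepends
```

The key takeaway is the ordering: reading Received: headers from the bottom up walks the message’s path forward in time.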

(Why does a destination mail server accept mail from any mailer on the Internet? Actually, it usually doesn’t: servers filter based on white/blacklists or DNS records, throttle senders, and so on.)

Finally, the recipient will (eventually) open their mailbox using their user agent, and view or download the message.

User agents generally fall into one of two categories these days. Either they’re a traditional mail client (like mutt, Apple Mail, or Mozilla Thunderbird) that uses SMTP (to send) + IMAP (to read/manipulate mailbox) or they are a web application where the communication with the mail server happens on the backend (the web server talks to the mail server, over either SMTP + IMAP or custom protocols).

Eventually, your email ends up on Hillary Clinton’s private server and is the subject of much media hullabaloo (just kidding).

What can we learn from emails?

Like any electronic evidence, emails require us to consider the chain of custody and the source. A single email devoid of context doesn’t tell us much, as we have no guarantee of its provenance. An email stored on a server with a history of logs and the like is more reliable, though again, consider the source. Email is a simple, text-based format with no built-in integrity checks or cryptographic signatures. Forging an email is trivial, and just because an incriminating email is found among legitimate ones does not mean it is legitimate. For example, I’d trust anonymously “found” email troves uploaded to Wikileaks far less than emails on a server I administered and knew to be relatively secure and auditable.
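Since the format has no integrity protection of its own, one common way investigators support chain of custody is to record a cryptographic hash of each message’s exact bytes at collection time. Here is a minimal sketch with Python’s standard `hashlib`; the function names and the sample message are hypothetical. Note the limit: a matching hash shows only that a copy is unchanged since it was fingerprinted, not that the message was genuine in the first place.

```python
import hashlib


def fingerprint(raw: bytes) -> str:
    """SHA-256 digest of the exact bytes collected (headers and body)."""
    return hashlib.sha256(raw).hexdigest()


def unchanged_since_collection(raw: bytes, recorded_digest: str) -> bool:
    """True only if these bytes are identical to what was fingerprinted."""
    return fingerprint(raw) == recorded_digest


# Hypothetical message as collected from a server.
collected = (
    b"From: Marc Liberatore <liberato@cs.umass.edu>\r\n"
    b"To: A Student <a.student@umass.edu>\r\n"
    b"Date: Mon, 3 Apr 2017 19:24:02 -0400\r\n"
    b"Subject: Re: Switching my major into CS\r\n"
    b"\r\n"
    b"Body.\r\n"
)
recorded = fingerprint(collected)  # written into the custody log at collection

# A later copy in which someone has edited a single header.
tampered = collected.replace(b"19:24:02", b"09:24:02")

print(unchanged_since_collection(collected, recorded))  # True
print(unchanged_since_collection(tampered, recorded))   # False
```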

That said, what can we learn?

Most of what we can learn is derived from the headers of emails. At all times you should be aware of the provenance of the emails, and have a sense of whose hands (mail servers, user agents, and users) an email passed through. That said, what can you learn if you take things at face value?

Some fields are well-defined and under user control. For example, the To: and CC: fields are usually pretty reliable. From: needn’t be true, though most legitimate mail servers will not allow users to spoof the From: address. Date: usually corresponds to when the message was generated. Other fields are server- or mail-agent-specific.
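A typical first pass over headers is to line up the user-controlled Date: against the server-added Received: timestamps. Here is a sketch of that triage, assuming each Received: line ends in “; <date>” as RFC 5321 specifies. Real headers are messier, and Received: lines below the first server you trust can be forged just like anything else. The message and hostnames are hypothetical, and `hop_timeline` is an illustrative helper, not a standard function.

```python
from email import policy
from email.parser import Parser
from email.utils import parsedate_to_datetime

RAW = (
    "Received: from smtp.example.com by mx.example.org; Mon, 3 Apr 2017 23:24:05 +0000\n"
    "Received: from laptop.example.net by smtp.example.com; Mon, 3 Apr 2017 19:24:03 -0400\n"
    "From: Marc Liberatore <liberato@cs.umass.edu>\n"
    "To: A Student <a.student@umass.edu>\n"
    "Date: Mon, 3 Apr 2017 19:24:02 -0400\n"
    "Subject: Re: Switching my major into CS\n"
    "\n"
    "Body.\n"
)


def hop_timeline(raw: str):
    """Return (timestamp, trace text) pairs for each Received: header, oldest first."""
    msg = Parser(policy=policy.default).parsestr(raw)
    hops = []
    # Headers are prepended per hop, so reverse to get chronological order.
    for header in reversed(msg.get_all("Received", [])):
        trace, _, stamp = str(header).rpartition(";")
        hops.append((parsedate_to_datetime(stamp.strip()), trace.strip()))
    return hops


msg = Parser(policy=policy.default).parsestr(RAW)
sent = parsedate_to_datetime(str(msg["Date"]))  # claimed by the sender's agent
hops = hop_timeline(RAW)
for when, trace in hops:
    delay = (when - sent).total_seconds()  # aware datetimes compare across zones
    print(f"{when.isoformat()}  +{delay:.0f}s  {trace}")
```

Large or negative gaps between Date: and the first Received: are not proof of anything (clocks drift, messages sit in outboxes), but they are worth a closer look.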

Let’s look at a few examples from my own emails.

(on projector)

Now let’s look at a few more “interesting” emails:

(Note that if you are concerned about sketchy websites you might not want to click on the following)

https://wikileaks.org/dnc-emails/

Exam review

Topics:

NITs
Partitions
FAT layout
FAT allocation / deletion
NTFS (high level only; the big guns come on the final)
CALEA
Email

(Everything since NITs is on the table. I expect fairly deep knowledge of FAT since you’ve had to parse it extensively. I expect surface knowledge of NTFS since we spent three lectures on it, but the assignment is not yet due.)

What does Tor hide from investigators? How do NITs work (four parts)? Implicated legal doctrine?

What does an MBR look like on disk? What about the disk itself, with, say, four partitions?

Same for GPT – show all parts!

FAT16: What are the parts? Describe what they’re used for.

What is the layout of a FAT16 partition?

Given a FAT and dirents, what happens when a file is deleted?

Given a FAT and dirents, what happens when a file is created (assume allocation is first-available, not best-fit)?

NTFS: What are the important details? What’s the most important file(s)? What’s the basic unit of allocation? What do MFT entries look like? What’s a runlist, and what does it describe?

What is CALEA and what does it require? Why does it no longer solve the government-must-be-able-to-wiretap problem? What solution do Bellovin et al. propose?