Our next unit is going to be on network forensics. We’ll start with a high-level overview of the parts of the networking stack you’ll need to understand to follow the papers we’re going to read. Then we’ll talk about a few sorts of network investigations and forensics.
The networking stack
application / transport / network / link / physical
This is a logical division into layers. Some of the lines are blurry (HTTP/3 runs over QUIC, a UDP-based transport that absorbs jobs, such as stream multiplexing and encryption, that used to sit in other layers).
What happens at each layer?
The application layer is what your local program is doing: it assumes some kind of underlying transport (on the Internet: typically either the reliable, in-order TCP, or the lossy, best-effort UDP) and uses that to talk to another program. The particular way it “talks” is the application-layer protocol. HTTP is perhaps the most well-known app-layer protocol, but there are others (FTP, ssh, NTP, etc.).
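To make "application-layer protocol" concrete, here is a minimal Python sketch of the bytes an HTTP/1.1 client hands to its TCP connection, and of how it might read the start of a reply. The host name and the canned response are made up for illustration, and nothing touches the network; a real client handles many more cases.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Return the bytes an HTTP/1.1 client would write to its TCP socket."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw response into (status code, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


# Hypothetical host and hand-written response, for illustration only.
request = build_get_request("example.com")
canned = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 2\r\n\r\nhi"
status, headers, body = parse_response(canned)
```

Everything in `request` is plain text, which is exactly why an unencrypted HTTP exchange is so easy to read in a packet capture.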
(See HTTP spec.)
Now’s a good time to take a look at this “on the wire”. Let’s look at Wireshark.
(View a sample HTTP capture; see if eduroam cooperates with a live capture.)
This is what an observer “on the wire” can see. Many application protocols now run over TLS (HTTPS, for example) to prevent snooping by a third party (demo if possible). But of course if you are one of the parties, you can give the keys to Wireshark or other tools (show mitmproxy) to be able to sniff and/or manipulate the traffic.
UDP is very minimal – more or less just a port number to identify the app on the machine in question (which is identified at the network layer by an IP address). There are a few other fields, but we’ll talk about them only if we need to later.
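For a sense of just how minimal: the whole UDP header is eight bytes, four 16-bit fields in network byte order (source port, destination port, length, checksum). The sketch below packs and unpacks one by hand; the port numbers and payload are arbitrary examples, and the checksum is left at zero rather than computed.

```python
import struct

# Four unsigned 16-bit fields, big-endian ("network") byte order.
UDP_HEADER = struct.Struct("!HHHH")


def parse_udp(datagram: bytes):
    """Return (src_port, dst_port, length, checksum, payload)."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    # The length field counts the 8 header bytes plus the payload.
    payload = datagram[UDP_HEADER.size:length]
    return src_port, dst_port, length, checksum, payload


# Build a toy datagram: arbitrary ports, checksum left at 0 (not computed).
payload = b"hello"
datagram = UDP_HEADER.pack(50000, 53, UDP_HEADER.size + len(payload), 0) + payload
parsed = parse_udp(datagram)
```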
TCP is not minimal, though most of what it does is in the kernel in a state machine on either side of the connection. Really interesting, but not really part of a typical forensics class, especially one without an explicit networking prereq, so we’re gonna skip it.
(Again in Wireshark / OS X)
The main thing you need to know, of course, is that to a first approximation, every interface (that is, each networking interface, typically a wired or wireless ethernet card) connected to the Internet has a unique IP address. This serves to identify the device uniquely on the Internet, so that when you want to send data to a particular machine, you know what to call it.
IPv4 is a 32-bit address space, organized hierarchically. That is, addresses consist of a prefix number of bits, identifying a network, and then the remainder of bits, identifying a particular machine (really: interface) in that network.
Why hierarchical? Again, long story, but basically, to make routing at Internet-scale feasible. Routing tables are prefix-driven.
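A small Python sketch of what "prefix-driven" means in practice. The usual rule is longest-prefix match: among all table entries whose prefix contains the destination, pick the most specific one. The table and the next-hop names below are invented for illustration, and real routers use far more efficient data structures than a linear scan.

```python
import ipaddress

# Hypothetical routing table: prefix -> next-hop name.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "default-gateway",
    ipaddress.ip_network("10.0.0.0/8"): "core-router",
    ipaddress.ip_network("10.1.2.0/24"): "lab-switch",
}


def next_hop(address: str) -> str:
    """Pick the matching route with the longest prefix."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in ROUTES if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


hops = [next_hop(a) for a in ("10.1.2.7", "10.9.9.9", "192.0.2.1")]
```

Because whole blocks of addresses collapse into a single prefix entry, the table can stay far smaller than the number of machines it reaches.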
A few special addresses: hosts in a /24 cannot end in .0 or .255 (.0 names the network itself and .255 is the “broadcast address”), and there are some other conventions (on /24s, .1 is usually the local router).
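Python's standard `ipaddress` module makes the prefix/host split and the reserved addresses easy to see. The /24 below is a documentation-only range chosen for illustration, and the specific host address is arbitrary.

```python
import ipaddress

net = ipaddress.ip_network("192.0.2.0/24")  # example range, not a real site

network_address = net.network_address      # the ".0" address
broadcast_address = net.broadcast_address  # the ".255" address
hosts = list(net.hosts())                  # everything in between

# Splitting one address into its network bits and its host bits.
addr = ipaddress.ip_address("192.0.2.77")
prefix_bits = ipaddress.ip_address(int(addr) & int(net.netmask))
host_bits = int(addr) & int(net.hostmask)
```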
(Again in Wireshark / OS X; also
On a particular medium, you need to be able to control multiple simultaneous transmitters/receivers (or just coordinate the one, if there is only one). Ethernet has a link-layer address (MAC address) to (generally) uniquely ID each ethernet card and disambiguate them to a router / switch. MAC addresses are not hierarchically assigned; they are “fixed” (though overridable in software). MAC addresses are not visible past the border of the local network.
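A rough Python sketch of what you can read off a MAC address. This goes slightly beyond the notes: the first three octets are a manufacturer prefix (the OUI), which says who built the card rather than where it sits on any network, and two flag bits in the first octet mark multicast and "locally administered" addresses, the latter being a common sign of a software override. Both sample addresses are made up.

```python
def describe_mac(mac: str) -> dict:
    """Pull the vendor prefix and the two flag bits out of a 48-bit MAC."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    first = octets[0]
    return {
        "oui": ":".join(f"{b:02x}" for b in octets[:3]),
        "multicast": bool(first & 0x01),             # lowest bit of octet 0
        "locally_administered": bool(first & 0x02),  # second-lowest bit
    }


burned_in = describe_mac("00:1a:2b:3c:4d:5e")   # made-up address
overridden = describe_mac("02:00:00:00:00:01")  # made-up address
```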
(Again in Wireshark / OS X; also
Like, c’mon. This is not a EE class. Electrons on wires, radio waves, etc.
(Pull out oscilloscope. Just kidding.)
What can we learn?
So that’s our lightning-fast review of networking. What can we learn?
Depends upon the protocol and the investigation/forensic model.
One such model: Imagine a corporate or gov’t IT model where you are concerned about data being exfiltrated. You could inspect all outgoing packets (what about data volume? what if they’re encrypted?). You could look at all outgoing IP addresses (who makes the white/black list?). You could install monitoring on all machines and prohibit unregistered computers from using your network.
Or maybe you are doing copyright enforcement (or some other kind of enforcement) on BitTorrent. What does the protocol actually leak?
Forensic investigation of Gnutella and BitTorrent
Why Gnutella and BitTorrent? Popular venues at the time.
Some files of interest (FOIs) are already known; how can you learn about more? By downloading and checking. You could also model filenames (which are shockingly descriptive).
Gnutella is decentralized: p2p connections are bootstrapped by a central server (or hardcoded IPs in general), and every IP a client ever observes is then cached for future use.
Users can search by keyword and browse others’ shared libraries. Browses include filenames, sizes, and hashes. Searches by hash were being phased out as we wrote this paper. (Why? Copyright enforcement!) File downloads are by hash, though.
GUIDs and IPs: either could be changed in various ways, but typically users did not bother.
Pushes: used to circumvent firewalls or NAT.
(we didn’t get to BitTorrent, I’ll cover this next lecture)
No built-in searches; it’s just for file distribution.
Collections of files are identified by .torrents. At a minimum, this information includes file names, sizes, and SHA-1 hash values for power-of-two-sized pieces of the concatenated file set, plus URLs for trackers.
A .torrent is identified by its infohash: the SHA-1 hash of fixed fields within the torrent that identify the files being distributed (the filenames, sizes, and piece sizes and hashes).
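Here is a minimal Python sketch of how an infohash comes about for a single-file torrent. The "fixed fields" live in the torrent's `info` dictionary, which is serialized with bencoding and then hashed with SHA-1. The file name and contents are toy values, and the encoder handles only the types needed here; it is not a complete bencode implementation.

```python
import hashlib


def bencode(value) -> bytes:
    """Encode ints, strings/bytes, lists, and dicts in bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def piece_hashes(data: bytes, piece_length: int) -> bytes:
    """Concatenate the 20-byte SHA-1 digest of each fixed-size piece."""
    return b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )


# Toy single-file torrent; name and contents are invented.
data = b"A" * 40000
piece_length = 16384  # a power of two
info = {
    "name": "example.bin",
    "length": len(data),
    "piece length": piece_length,
    "pieces": piece_hashes(data, piece_length),
}
infohash = hashlib.sha1(bencode(info)).hexdigest()
```

Because tracker URLs sit outside the `info` dictionary, two .torrents that list different trackers for the same files still share an infohash.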
The peer tells the tracker that it wants to download this “infohash”, and gets back (a subset of) the other peers currently involved with distributing the same torrent. IPs and GUIDs are used, but BitTorrent GUIDs are very transient.
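A sketch of what that request to the tracker can look like over HTTP, in Python. The parameter names follow the standard tracker announce request; the tracker URL, the peer ID, and the infohash bytes are placeholders. Reading `peer_id` as the "GUID" in these notes is my interpretation, so treat that mapping as an assumption. The URL is only built here, not fetched.

```python
from urllib.parse import urlencode


def announce_url(tracker: str, infohash: bytes, peer_id: bytes,
                 port: int, left: int) -> str:
    """Build (but do not send) an HTTP tracker announce URL."""
    params = {
        "info_hash": infohash,  # raw 20 bytes, percent-encoded below
        "peer_id": peer_id,     # 20 bytes chosen by the client
        "port": port,           # where this peer accepts connections
        "uploaded": 0,
        "downloaded": 0,
        "left": left,           # bytes still needed
    }
    return tracker + "?" + urlencode(params)


# Placeholder values, for illustration only.
url = announce_url(
    "http://tracker.example/announce",
    infohash=bytes(range(20)),
    peer_id=b"-XX0001-123456789012",
    port=6881,
    left=40000,
)
```

Note what the tracker learns from this one request: the requesting IP, a client-chosen identifier, a listening port, and exactly which torrent the peer is after.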
To down/upload a file, the peer then connects to other peers. They exchange a list of pieces each possesses, then request pieces from each other (and send them). Tit-for-tat, so you can only “leech” if the other peer doesn’t have better options – otherwise you get “choked”.
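The "list of pieces each possesses" is typically sent as a bitfield: one bit per piece, with the high bit of the first byte standing for piece 0. A minimal Python sketch of decoding two bitfields and working out what to request follows; the bitfields are made-up examples, and real clients also apply a piece-selection policy that is not shown.

```python
def pieces_from_bitfield(bitfield: bytes, num_pieces: int) -> set:
    """Return the set of piece indices whose bit is set."""
    have = set()
    for index in range(num_pieces):
        byte = bitfield[index // 8]
        if byte & (0x80 >> (index % 8)):  # high bit first
            have.add(index)
    return have


# Made-up bitfields for a five-piece torrent.
mine = pieces_from_bitfield(bytes([0b10100000]), 5)
theirs = pieces_from_bitfield(bytes([0b11110000]), 5)

# Pieces the other peer has that we still lack.
wanted = sorted(theirs - mine)
```

From an investigator's point of view, a peer's bitfield is a direct claim about how much of the file set it holds.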
Other details. DHT: a distributed way to avoid the reliance on trackers (used by so-called “magnet links”). Peer exchange: during download, peers can share IPs of other peers that have the same file.
- Identify FOIs, obtain relevant hashes.
- Use p2p system to find candidates (IPs sharing these files).
- Narrow candidates on some basis (location, etc.).
- Attempt to verify possession or distribution, ideally through direct connection.
- Log during all steps, of course. IPs and GUIDs. (Pros/cons of each.)
- Subpoena ISP for physical address / account holder.
- Obtain warrant.
- Execute warrant, search computers, etc.
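As a rough illustration of the logging step, here is a Python sketch of one log record per observation. The field names and the record layout are my own invention, not taken from any real investigative tool. The one design choice worth noting is the UTC timestamp: ISPs may reassign addresses over time, so an IP is only useful to them alongside the moment it was seen.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """One thing seen during the investigation (hypothetical layout)."""
    step: str         # e.g. "candidate-search" or "direct-connect"
    ip: str
    port: int
    guid: str         # client-reported identifier, if any
    file_hash: str
    observed_at: str  # ISO 8601, always UTC


def observe(step, ip, port, guid, file_hash, now=None) -> Observation:
    """Create a record, stamping it with the current UTC time by default."""
    now = now or datetime.now(timezone.utc)
    return Observation(step, ip, port, guid, file_hash, now.isoformat())


def to_log_line(obs: Observation) -> str:
    """Serialize one record as a single JSON line."""
    return json.dumps(asdict(obs), sort_keys=True)


# Example with invented values and a fixed time.
when = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
line = to_log_line(observe("direct-connect", "192.0.2.10", 6881,
                           "guid-placeholder", "hash-placeholder", now=when))
```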
Investigators are bound by law: Fourth Amendment (4A) requirements; fruit of the poisonous tree. Leads/hearsay vs. evidence. Wiretapping has a higher standard than searches!
Why do we prefer single-source downloads? Legal reasons: you can only send someone the whole file if you possess the whole file.