13: BitTorrent Redux

Last class we did a whiteboard presentation of how you might investigate BitTorrent; today we’ll do a more practical demonstration with Wireshark and Python.

.torrent files

First, let’s look at the structure of a .torrent file.

Everything in a torrent file is, more or less, encoded in a format called “bencoding”. There are many libraries to bencode and bdecode data; I have the “official” one installed for Python 3.

Bencoding can represent four types: strings (raw byte strings), integers, lists, and dictionaries. That suffices for both .torrent files and parts of the on-the-wire BitTorrent protocol.
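To make the four types concrete, here is a minimal bdecoder sketch in plain Python. It is not the library used in class (`bdecode` and `_decode` are our own names), and it skips validation the spec implies, such as rejecting leading zeros or unsorted dictionary keys. It returns strings as raw `bytes`, since values like `pieces` are binary; some libraries decode them to `str` instead.

```python
def bdecode(data: bytes):
    """Decode one complete bencoded value (minimal sketch; little validation)."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data: bytes, i: int):
    """Decode the value starting at offset i; return (value, offset just past it)."""
    c = data[i:i + 1]
    if c == b"i":                        # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                        # list: l<value>...<value>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                        # dict: d<key><value>...e (keys are strings)
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            value, i = _decode(data, i)
            result[key] = value
        return result, i + 1
    if c.isdigit():                      # string: <length>:<raw bytes>
        colon = data.index(b":", i)
        start = colon + 1
        end = start + int(data[i:colon])
        return data[start:end], end
    raise ValueError(f"unexpected byte {c!r} at offset {i}")
```

For example, `bdecode(b"d3:cow3:moo4:spam4:eggse")` returns `{b'cow': b'moo', b'spam': b'eggs'}`.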

A .torrent file (in the BitTorrent spec, a “metainfo” file, but we’ll just call it a torrent file) is just a bencoded dictionary.

There are only two required keys: announce, whose value is the URL of the tracker, and info, a dictionary describing the content this torrent refers to. There are also some optional keys (see the spec).
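Going the other way, a torrent file is just the bencoding of a metainfo dictionary. The sketch below is a minimal encoder (`bencode` here is our own helper, not a library API) applied to a tiny single-file metainfo dictionary; the tracker URL and file contents are made up for illustration. The spec requires dictionary keys to appear in sorted order, which is why `length` is emitted before `name` inside `info`.

```python
import hashlib


def bencode(value) -> bytes:
    """Encode ints, strings, lists, and dicts (minimal sketch; our own helper)."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are strings and must be emitted in sorted (raw byte) order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


# Hypothetical single-file torrent: made-up tracker URL and an 11-byte file.
content = b"hello world"
metainfo = {
    "announce": "http://tracker.example.com/announce",
    "info": {
        "name": "hello.txt",
        "length": len(content),
        "piece length": 16384,
        "pieces": hashlib.sha1(content).digest(),  # one piece, so one 20-byte hash
    },
}
torrent_bytes = bencode(metainfo)
```

Here `torrent_bytes` begins `d8:announce35:http://tracker.example.com/announce4:infod6:lengthi11e...`: the whole file really is one bencoded dictionary.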

info dictionary

In the simplest case, the torrent describes a single file, and the info dictionary has four mandatory key/value pairs:

name: the filename
length: length of the file in bytes (integer)
piece length: number of bytes in each piece (the last piece may be shorter)
pieces: a byte string consisting of the concatenation of the 20-byte SHA-1 hashes of all pieces, one per piece, in order
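These fields are tied together by simple arithmetic: with n = ceil(length / piece length) pieces, `pieces` must be exactly 20 × n bytes long, and only the last piece may be shorter than `piece length`. A quick sketch of that bookkeeping (the helper names are our own):

```python
HASH_LEN = 20  # SHA-1 digest size in bytes


def piece_count(length: int, piece_length: int) -> int:
    """Number of pieces: ceil(length / piece_length)."""
    return -(-length // piece_length)


def last_piece_size(length: int, piece_length: int) -> int:
    """Size of the final piece, which may be shorter than piece_length."""
    n = piece_count(length, piece_length)
    return length - (n - 1) * piece_length if n else 0


def piece_hashes(pieces: bytes, length: int, piece_length: int):
    """Split `pieces` into 20-byte hashes, checking it is consistent with the file length."""
    n = piece_count(length, piece_length)
    if len(pieces) != HASH_LEN * n:
        raise ValueError(f"expected {HASH_LEN * n} bytes of hashes, got {len(pieces)}")
    return [pieces[i:i + HASH_LEN] for i in range(0, len(pieces), HASH_LEN)]
```

For instance, a hypothetical 2,000,000,000-byte file with 512 KiB (524,288-byte) pieces needs 3,815 pieces, the last holding 365,568 bytes, so its `pieces` string is 3,815 × 20 = 76,300 bytes long.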

If the torrent instead describes multiple files, then instead of the length key there is a key called files, whose value is a list of dictionaries, one per file. Each dictionary in this list has two keys: path (a list of path components, the last of which is the filename) and length (the file’s length in bytes); name then gives the name of the top-level directory. The pieces are cut from the concatenation of all of these files in the listed order, so a single piece can span the end of one file and the start of the next.
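Because pieces are cut from the concatenated byte stream, it helps to know where each file starts in that stream and which pieces it touches. A sketch (the file list and piece length below are hypothetical; `file_layout` is our own helper):

```python
def file_layout(files, piece_length):
    """For each (path, length), return (path, start, end, first_piece, last_piece).

    start/end are byte offsets into the concatenation of all files (end exclusive);
    first_piece/last_piece are the indices of the pieces the file touches
    (None for an empty file, which touches no piece).
    """
    layout = []
    offset = 0
    for path, length in files:
        start, end = offset, offset + length
        if length > 0:
            first, last = start // piece_length, (end - 1) // piece_length
        else:
            first = last = None
        layout.append(("/".join(path), start, end, first, last))
        offset = end
    return layout


# Hypothetical torrent: each path is a list of components, as in the files list.
files = [(["a.txt"], 10_000), (["b", "c.bin"], 40_000), (["d.txt"], 5_000)]
for row in file_layout(files, 16_384):
    print(row)
```

With these numbers, piece 0 is shared by a.txt and b/c.bin and piece 3 by b/c.bin and d.txt, so only pieces 1 and 2 depend on b/c.bin alone.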

Some examples:

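# Interactive Python 3 session: bare expressions print their values at the prompt
import binascii
import bencode  # the "official" bencode library mentioned above

# Read the raw (binary) contents of the torrent file
with open('/Users/liberato/Downloads/ubuntu-18.10-desktop-amd64.iso.torrent', 'rb') as f:
    d = f.read()

bencode.bdecode(d)                  # the whole file decodes to a single dictionary
bencode.bdecode(d).keys()           # top-level keys, including announce and info
info = bencode.bdecode(d)['info']   # the info dictionary describing the content
info.keys()
info['name']                        # filename
info['length']                      # file length in bytes
info['piece length']                # bytes per piece
pieces = info['pieces']             # concatenated 20-byte SHA-1 piece hashes
pieces[0:20]                        # hash of the first piece (raw bytes)
binascii.hexlify(pieces[0:20])      # the same hash, as hex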

So, assuming you can get a bdecoder working, it’s pretty easy to examine torrent files.

Let’s verify that the piece hashes match the downloaded file.
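A sketch of checking a single-file torrent against the downloaded data, using only `hashlib`: read the file one piece at a time, SHA-1 each chunk, and compare against the corresponding 20 bytes of `pieces`. `verify_pieces` is our own helper name; it takes any binary file-like object so the demo can run on made-up in-memory data.

```python
import hashlib
import io


def verify_pieces(stream, piece_length: int, pieces: bytes):
    """Return indices of pieces whose SHA-1 does not match (empty list: all good).

    `stream` is any binary file-like object, e.g. open(path, "rb").
    """
    bad = []
    for index in range(len(pieces) // 20):
        chunk = stream.read(piece_length)            # the last piece may be shorter
        expected = pieces[20 * index:20 * index + 20]
        if hashlib.sha1(chunk).digest() != expected:
            bad.append(index)
    return bad


# Self-contained demo with made-up data and 4-byte pieces.
data = b"abcdefghij"
hashes = b"".join(hashlib.sha1(data[i:i + 4]).digest() for i in range(0, len(data), 4))
print(verify_pieces(io.BytesIO(data), 4, hashes))          # → []
print(verify_pieces(io.BytesIO(b"abcdXfghij"), 4, hashes))  # → [1]
```

For the Ubuntu torrent you would pass `open(iso_path, 'rb')` together with `info['piece length']` and `info['pieces']`, where `iso_path` is a placeholder for wherever you saved the ISO (depending on your bdecode library, the keys may need to be bytes, e.g. `info[b'pieces']`).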

Now, recall how we handle multi-file torrents. Given a large library of known files of interest and a torrent under consideration, how might we find “pieces of interest”? (Rehash from last lecture.)
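One possible approach, sketched here (not necessarily the exact method from last lecture): for each file in the torrent whose length matches a known file, find the pieces lying entirely inside that file, hash the corresponding slices of the known file, and compare against the torrent’s hashes. Files smaller than a piece may contain no whole piece, so this check can miss them. `known_contents` (a list of file contents) is a hypothetical stand-in for a real library of known files; in practice you would precompute and store hashes.

```python
import hashlib


def contained_pieces(start, end, piece_length, total_length):
    """Indices of pieces lying entirely inside the byte range [start, end)."""
    result = []
    p = -(-start // piece_length)                 # first piece starting at or after `start`
    while p * piece_length < end:
        piece_end = min((p + 1) * piece_length, total_length)  # last piece may be short
        if piece_end > end:
            break
        result.append(p)
        p += 1
    return result


def find_known_pieces(files, piece_length, pieces, known_contents):
    """Return (path, piece_index) for each piece that matches a known file's bytes.

    files: (path, length) pairs in torrent order.
    known_contents: contents of known files of interest (hypothetical stand-in).
    """
    by_length = {}
    for content in known_contents:
        by_length.setdefault(len(content), []).append(content)
    total = sum(length for _, length in files)
    hits = []
    offset = 0
    for path, length in files:
        start, end = offset, offset + length
        offset = end
        for content in by_length.get(length, []):    # only same-length files can match
            for p in contained_pieces(start, end, piece_length, total):
                lo = p * piece_length - start                     # piece start within file
                hi = min((p + 1) * piece_length, total) - start   # piece end within file
                if hashlib.sha1(content[lo:hi]).digest() == pieces[20 * p:20 * p + 20]:
                    hits.append((path, p))
    return hits
```

The design choice here is to index known files by length first, since a file’s length is visible in the torrent’s files list and cheaply rules out most candidates before any hashing.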

On the wire

(Note: I’ve disabled a bunch of stuff on purpose here to avoid getting into the protocol weeds; the BitTorrent spec has accumulated lots of options as it evolved over the years, and some of them are interesting, but this is not a networking class, so I’m going to skip the details.)

How does a user get a list of peers? One way is through a tracker. Here’s an example, and how we parse it out of the capture:
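The request side of that interaction is an HTTP GET whose query string identifies the torrent and the client. A sketch of building it, assuming the basic parameters from the original spec plus `compact=1`: `info_hash` is the SHA-1 of the bencoded info value exactly as it appears in the .torrent file (re-encoding a decoded dict only reproduces it if the original was canonical). The tracker URL, info bytes, and peer id below are made up, and `announce_url` is our own helper.

```python
import hashlib
from urllib.parse import quote, urlencode


def announce_url(announce: str, info_bytes: bytes, peer_id: bytes, port: int, left: int) -> str:
    """Build an HTTP tracker announce URL (sketch of the basic GET parameters).

    info_bytes must be the raw bencoded `info` value from the .torrent file.
    """
    params = {
        "info_hash": hashlib.sha1(info_bytes).digest(),  # 20 raw bytes, percent-encoded below
        "peer_id": peer_id,                              # 20 bytes chosen by the client
        "port": port,                                    # port this client listens on
        "uploaded": 0,
        "downloaded": 0,
        "left": left,                                    # bytes still needed
        "compact": 1,                                    # ask for the compact peer list
    }
    # quote (not quote_plus) so every unsafe byte, including space, becomes %XX.
    query = urlencode(params, quote_via=quote)
    return announce + ("&" if "?" in announce else "?") + query


# Hypothetical values: made-up tracker, info dict, and peer id.
url = announce_url("http://tracker.example.com/announce", b"d4:name1:xe",
                   b"-PY0001-123456789012", 6881, 1)
print(url)
```

In a capture, this is the GET line to look for; the `info_hash` parameter is what ties the request back to a particular torrent file.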

(Demo on the Ubuntu torrent: show the HTTP interaction, including the GET request and the response, and how to parse the response: inflate the deflate-compressed body, bdecode it, and examine the result.)
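The response-parsing steps can be sketched in Python too. Since some servers send raw deflate data where HTTP expects zlib-wrapped data, the decompression helper tries both. The peer parsing assumes the tracker returned the compact format (a `peers` byte string with 6 bytes per IPv4 peer), which clients request with `compact=1`; trackers may instead return a list of dictionaries with `peer id`, `ip`, and `port` keys. Both helper names are our own.

```python
import ipaddress
import zlib


def decompress_body(body: bytes, content_encoding: str) -> bytes:
    """Undo HTTP Content-Encoding before bdecoding a tracker response."""
    if content_encoding == "gzip":
        return zlib.decompress(body, 16 + zlib.MAX_WBITS)     # gzip wrapper
    if content_encoding == "deflate":
        try:
            return zlib.decompress(body)                      # zlib-wrapped deflate
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)     # raw deflate, no header
    return body                                               # not compressed


def parse_compact_peers(peers: bytes):
    """Decode a compact peer list: 6 bytes per peer (IPv4 address, then port)."""
    if len(peers) % 6:
        raise ValueError("compact peer list length is not a multiple of 6")
    result = []
    for i in range(0, len(peers), 6):
        ip = str(ipaddress.IPv4Address(peers[i:i + 4]))
        port = int.from_bytes(peers[i + 4:i + 6], "big")      # network byte order
        result.append((ip, port))
    return result


print(parse_compact_peers(bytes([127, 0, 0, 1, 0x1A, 0xE1])))  # → [('127.0.0.1', 6881)]
```

After decompressing and bdecoding, the `peers` value from the response dictionary is what you would hand to `parse_compact_peers`.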

How do you get a piece? You talk to a peer. There’s a handshake, and then an exchange of length-prefixed messages.
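The framing of that exchange, per the original BitTorrent spec and ignoring the many later extensions, can be sketched as follows: the handshake is a fixed 68-byte message, and every later message carries a 4-byte big-endian length prefix, a 1-byte message id, and a payload. The helper names are our own, the 16 KiB request size is just a common choice, and this code only builds and splits byte strings; it does no networking.

```python
import struct

PSTR = b"BitTorrent protocol"
MESSAGE_NAMES = {0: "choke", 1: "unchoke", 2: "interested", 3: "not interested",
                 4: "have", 5: "bitfield", 6: "request", 7: "piece", 8: "cancel"}


def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """<pstrlen><pstr><8 reserved bytes><info_hash><peer_id>: 68 bytes in total."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes each")
    return bytes([len(PSTR)]) + PSTR + bytes(8) + info_hash + peer_id


def parse_handshake(data: bytes):
    """Split a handshake into (pstr, reserved, info_hash, peer_id)."""
    n = data[0]
    return data[1:1 + n], data[1 + n:9 + n], data[9 + n:29 + n], data[29 + n:49 + n]


def build_message(msg_id: int, payload: bytes = b"") -> bytes:
    """<4-byte big-endian length><1-byte id><payload>; length counts id + payload."""
    return struct.pack(">IB", 1 + len(payload), msg_id) + payload


def split_messages(buffer: bytes):
    """Split post-handshake bytes into (id, payload) pairs plus any leftover bytes.

    A zero length prefix is a keep-alive, reported here with id None.
    """
    messages, i = [], 0
    while i + 4 <= len(buffer):
        (length,) = struct.unpack_from(">I", buffer, i)
        if i + 4 + length > len(buffer):
            break                                  # incomplete: wait for more bytes
        if length == 0:
            messages.append((None, b""))
        else:
            messages.append((buffer[i + 4], buffer[i + 5:i + 4 + length]))
        i += 4 + length
    return messages, buffer[i:]


# Example: ask for the first 16 KiB block of piece 0 (request: id 6, payload index/begin/length).
request = build_message(6, struct.pack(">III", 0, 0, 16384))
for msg_id, payload in split_messages(request)[0]:
    print(MESSAGE_NAMES.get(msg_id, "keep-alive"), payload.hex())
```

Reading a capture works the same way in reverse: skip the 68-byte handshake in each direction, then walk the stream one length prefix at a time.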

Moving on to OneSwarm

(didn’t get here, more next class)