Today we’re going to do an introduction to Freenet, another p2p filesharing network that, like OneSwarm, was designed with privacy in mind. It makes different tradeoffs than OneSwarm, choosing to prioritize anonymity and censorship resistance over performance. Later, we will talk about forensic techniques to investigate users of Freenet, and I think getting an introduction first will help you follow it better.
Note that some of the below is a simplification of how Freenet works, and that I’m eliding some details. It is correct in broad strokes, but you have to read the source to be 100% certain of all the details of how the current client works.
Freenet is implemented as an adaptive peer-to-peer network of nodes that query one another to store and retrieve data files, which are named by location-independent keys. Each node maintains its own local datastore which it makes available to the network for reading and writing, as well as a dynamic routing table containing addresses of other nodes and the keys that they are thought to hold.
The system can be regarded as a cooperative distributed filesystem incorporating location independence and transparent lazy replication. Freenet enables users to share unused disk space.
The basic model is that requests for keys are passed along from node to node through a chain of proxy requests in which each node makes a local decision about where to send the request next.
Depending on the key requested, routes will vary. The routing algorithms for storing and retrieving data are designed to adaptively adjust routes over time to provide efficient performance while using only local, rather than global, knowledge. Nodes only have knowledge of their immediate upstream and downstream neighbors (and the neighbors of those nodes) in the proxy chain, to maintain privacy.
Each request is given a hops-to-live limit, analogous to IP’s time-to-live, which is decremented at each node to prevent infinite chains. Each request is also assigned a pseudo-unique random identifier, so that nodes can prevent loops by rejecting requests they have seen before. When this happens, the immediately-preceding node simply chooses a different node to forward to. This process continues until the request is either satisfied or exceeds its hops-to-live limit. Then the success or failure result is passed back up the chain to the sending node.
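To make the forwarding rule concrete, here is a minimal sketch of a node that rejects request IDs it has already seen and gives up when hops-to-live runs out. The class and method names (Node, handle, new_request_id) are made up for illustration, and trying peers in list order stands in for the real routing decision; this is not how the actual client is structured.

```python
import random


def new_request_id() -> int:
    # Pseudo-unique random identifier (64 bits is an assumption).
    return random.getrandbits(64)


class Node:
    def __init__(self, name, store=None):
        self.name = name
        self.store = dict(store or {})  # local datastore: key -> data
        self.peers = []                 # directly connected nodes
        self.seen = set()               # request IDs already handled here

    def handle(self, request_id, key, htl):
        """Return the data for key, or None if the request fails."""
        if request_id in self.seen:
            return None                 # loop detected: reject the request
        self.seen.add(request_id)
        if key in self.store:
            return self.store[key]      # satisfied locally
        if htl <= 0:
            return None                 # hops-to-live exhausted
        for peer in self.peers:         # simplification: try peers in order
            result = peer.handle(request_id, key, htl - 1)
            if result is not None:
                return result           # success is passed back up the chain
            # On rejection or failure, fall through and try a different peer.
        return None


# Tiny network: a <-> b, b -> c, and only c holds the data.
a, b, c = Node("a"), Node("b"), Node("c", {"k": b"data"})
a.peers = [b]
b.peers = [a, c]
found = a.handle(new_request_id(), "k", 5)
```

In this toy network, b first tries to forward back to a, which rejects the request as a loop, and b then moves on to c. That is the "choose a different node" behavior described above.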
There are several kinds of keys used to name files in Freenet. Two of importance:
A content-hash key (CHK) is derived by directly hashing the contents of the corresponding file. This gives every file a pseudo-unique file key. Files may also be encrypted with a randomly generated encryption key. To allow others to retrieve the file, the user publishes the content-hash key together with the decryption key. Note that the decryption key is never stored with the file but is only published with the file key (so users can plausibly claim not to know what’s on their own drive).
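A rough sketch of the idea, under loud assumptions: SHA-256 stands in for whatever hash the client uses, the "cipher" is a toy XOR keystream that is NOT secure and only keeps the example dependency-free, and the file key is taken to be the hash of the stored (encrypted) bytes. The function names are hypothetical.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 over key || counter. Illustration only.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_crypt(key: bytes, data: bytes) -> bytes:
    # XOR with the keystream; applying it twice restores the input.
    return bytes(x ^ y for x, y in zip(data, keystream(key, len(data))))


def make_chk(plaintext: bytes):
    enc_key = secrets.token_bytes(32)        # randomly generated encryption key
    stored = toy_crypt(enc_key, plaintext)   # what a node would hold on disk
    file_key = hashlib.sha256(stored).hexdigest()  # assumed: hash of stored bytes
    return file_key, enc_key, stored


file_key, enc_key, stored = make_chk(b"hello freenet")
# A storing node sees only (file_key, stored); the publisher hands out
# (file_key, enc_key), and only someone holding both can read the content.
recovered = toy_crypt(enc_key, stored)
```

The point of the split is visible in the return value: the node can check that the stored bytes match the file key, but without enc_key it cannot read them.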
Content-hash keys can also be used for splitting files into multiple parts. For large files, splitting can be desirable because of storage and bandwidth limitations. Splitting even medium-sized files into the standard-sized parts – 32 kilobytes – also has advantages in combating traffic analysis. This is easily accomplished by inserting each part separately under a content-hash key, and creating an indirect file (or multiple levels of indirect files) to point to the individual parts.
In practice, this splitting is done by creating a small “manifest” CHK that lists the CHKs of the many parts of a file. This is very much like a torrent and its pieces: the file is divided into parts (more on this later), and each is given its own key; the list of keys is the manifest, and the manifest itself is keyed, too. Retrieving the manifest key lets you retrieve the associated “blocks” of the file and reassemble it.
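Here is a small sketch of the manifest idea. It ignores encryption, padding, and check blocks, uses SHA-256 as a stand-in hash, and uses a plain dict in place of the network; the manifest format (one key per line) and the names insert_file and fetch_file are invented for illustration.

```python
import hashlib

BLOCK_SIZE = 32 * 1024  # the 32-kilobyte part size mentioned above


def chk(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def insert_file(data: bytes, store: dict) -> str:
    """Insert each part under its own key, then insert a manifest of keys."""
    part_keys = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        key = chk(block)
        store[key] = block
        part_keys.append(key)
    manifest = "\n".join(part_keys).encode("ascii")
    manifest_key = chk(manifest)     # the manifest itself is keyed, too
    store[manifest_key] = manifest
    return manifest_key


def fetch_file(manifest_key: str, store: dict) -> bytes:
    """Fetch the manifest, then fetch and reassemble the listed blocks."""
    manifest = store[manifest_key].decode("ascii")
    keys = manifest.split("\n") if manifest else []
    return b"".join(store[k] for k in keys)


store = {}
data = b"".join(i.to_bytes(4, "big") for i in range(20000))  # 80,000 bytes
manifest_key = insert_file(data, store)
restored = fetch_file(manifest_key, store)
```

Only the manifest key needs to be published; everything else is reachable from it.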
A signed-subspace key (SSK) enables personal namespaces. A user creates a namespace by randomly generating a public/private key pair which will serve to identify her namespace. To insert a file, she chooses a short descriptive text string. The public namespace key and the descriptive string are hashed independently, XOR’ed together, and then hashed again to yield the file key.
As with the keyword-signed key, the private half of the asymmetric key pair is used to sign the file. This signature, generated from a random key pair, is more secure than the signatures used for keyword-signed keys. The file is also encrypted by the descriptive string as before.
To allow others to retrieve the file, the user publishes the descriptive string together with her subspace’s public key. Storing data requires the private key, however, so only the owner of a subspace can add files to it.
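The SSK file-key derivation described above (hash each input independently, XOR, hash again) can be sketched in a few lines. SHA-256 and the placeholder public-key bytes are assumptions for illustration; signing and encryption are left out entirely.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def ssk_file_key(public_key: bytes, description: str) -> bytes:
    key_hash = h(public_key)                      # hash of the namespace key
    text_hash = h(description.encode("utf-8"))    # hash of the descriptive string
    xored = bytes(x ^ y for x, y in zip(key_hash, text_hash))
    return h(xored)                               # hash again to get the file key


# Placeholder bytes standing in for a real public key.
fake_public_key = b"not-a-real-public-key"
key_a = ssk_file_key(fake_public_key, "my-site/index.html")
key_b = ssk_file_key(fake_public_key, "my-site/about.html")
```

Anyone who knows the public key and the descriptive string can recompute the file key, which is why publishing those two pieces is enough for retrieval.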
The address of an SSK site looks something like this:
Here’s an example of an address of a real SSK site in Freenet. It should be a Freenet stress-testing page:
GB3wuHmtxN2wLc7g4y1ZVydkK6sOT-DuOsUo-eHK35w is the hash of the public key. This part is all that is required to uniquely identify the file (but not decrypt it), so nodes need only store this bit. The actual public key is stored (unencrypted) with the (encrypted) data. c63EzO7uBEN0piUbHPkMcJYW7i7cOvG42CM3YDduXDs is the document decryption key. This is only known to clients and not to the nodes storing the data, so nodes cannot decrypt the data they store without the full address.
(In practice, SSK links have largely been superseded by USK links, which are based on SSKs but try to always retrieve the most up-to-date version of the site. This detail is not necessary to fully understand right now.)
How to find keys?
They are (oddly-formatted) URLs, so you can find them in various forums and search engines. Also, once you have an entry point into Freenet (for example, a key that points to an HTML page stored in Freenet) you can follow the links within that page, etc. There are various tools for building wiki-like sites, bulletin-boards, etc., all hosted within Freenet.
Recall each node has an “address.” This is just a point on the interval [0, 1]. Each node knows about its own address, as well as the addresses of the nodes it is directly (TCP) connected to, as well as the addresses of the nodes they are connected to. It has a “view” of the network, where each node it is connected to is responsible for some portion of the unit circle.
Each node has a local data store where it keeps key/value pairs in storage (and occasionally evicts them if full, LRU or otherwise configurable).
So: Each key is a hash value which you can also convert to a point on this interval. When a user wants to retrieve the value corresponding to a key, it looks into its storage. If the key is there, great. If not, it sends a request to the node “responsible for” the part of the address space the key resides in (that is, the node with the address closest to that key’s address). And it awaits a response.
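As a sketch of "closest address wins": the code below maps a key to a point in [0, 1) and picks the peer whose address is nearest, measuring distance around the circle so that 0.95 and 0.02 count as close. The hash choice, the use of the first 6 digest bytes, and the function names are all assumptions for illustration.

```python
import hashlib


def key_location(key: bytes) -> float:
    # Map a key to [0, 1) using the first 48 bits of its hash.
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def circular_distance(a: float, b: float) -> float:
    # Distance around the unit circle, so the interval wraps at 1.0.
    d = abs(a - b)
    return min(d, 1.0 - d)


def closest_peer(peers: dict, target: float) -> str:
    """peers maps a peer name to its address; return the nearest peer's name."""
    return min(peers, key=lambda name: circular_distance(peers[name], target))


peers = {"a": 0.10, "b": 0.50, "c": 0.95}
next_hop = closest_peer(peers, 0.02)   # "c": 0.07 away via wraparound, vs 0.08 for "a"
location = key_location(b"some key")
```

Each node makes this choice using only the addresses it can see locally, which is what lets routing work without any global table.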
What happens when a node gets a request? It does the same thing, checking its storage, and returning the result if available, or propagating it if not.
When does the propagation stop? When the hops-to-live (HTL) value, decremented at each hop, counts down to 0. HTL typically starts at 18.
But doesn’t this mean you can tell who’s asking for a file by checking if the HTL is 18? No. Because the true originator randomly decides (p=0.5) whether to start the HTL at 18 or 17. And, whenever a node gets a request with an HTL of 18, it likewise randomly decides whether to decrement the count. So the most you can know is that a peer sending you a request with an HTL of 16 or less definitely (well, mostly-definitely, since the max HTL can be changed in the source) didn’t originate it. At 17 or more, you just don’t know.
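The randomized HTL rule is easy to sketch. The originator's p=0.5 comes from the description above; using 0.5 for the "hold at 18" decision at forwarding nodes is an assumption (the text only says "likewise"), as are the function names.

```python
import random

MAX_HTL = 18


def initial_htl(rng: random.Random) -> int:
    # The originator starts at 18 or 17 with equal probability.
    return MAX_HTL if rng.random() < 0.5 else MAX_HTL - 1


def next_htl(htl: int, rng: random.Random, hold_probability: float = 0.5) -> int:
    # At the maximum, a node sometimes forwards without decrementing.
    # hold_probability = 0.5 is an assumed value, not taken from the client.
    if htl >= MAX_HTL and rng.random() < hold_probability:
        return htl
    return max(htl - 1, 0)


rng = random.Random(0)
starts = {initial_htl(rng) for _ in range(1000)}
forwarded_from_max = {next_htl(MAX_HTL, rng) for _ in range(1000)}
```

Both sets come out as {17, 18}, so seeing 18 or 17 on the wire does not tell a neighbor whether the sender originated the request or merely forwarded it.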
When a node receives a reply with a value, whether it is a proxy or the originator, it stores the result in its cache. So popular files are replicated more widely throughout the network, which reduces load (and increases robustness) accordingly.
Insertion is similar to retrieval, but instead of “GETing” the data, you “PUT” it – the puts are routed toward the node most responsible for the address of the key, and nodes along the way insert the key/value pair into their storage.
Notice that since the address keys do not contain the decryption key, node operators do not in general know (and cannot even decrypt) the data they are storing.
Earlier I mentioned that large files are broken up into blocks listed in a manifest. One thing you should know is that the blocks contain some redundancy, unlike BitTorrent pieces. That is, typically for a file that would fit into n blocks, 2n + 1 blocks are inserted in total: n data blocks and n + 1 “check blocks” generated by a forward error correction algorithm. Any n of the blocks (check or regular) suffice to reconstruct the file. This provides some redundancy in case values are evicted from caches. Also, Freenet randomly re-inserts blocks it successfully retrieves to help keep them live. (Some small percentage – I think about 0.5%.)
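To get a feel for how much the check blocks help, here is a back-of-the-envelope calculation. It assumes each of the 2n + 1 blocks is still retrievable independently with the same probability p, which is a simplifying assumption and not a measured property of the network; the function name is made up.

```python
from math import comb


def recovery_probability(n: int, p: float) -> float:
    """Chance that at least n of the 2n + 1 inserted blocks are retrievable."""
    total = 2 * n + 1
    return sum(
        comb(total, k) * p**k * (1 - p) ** (total - k)
        for k in range(n, total + 1)
    )


# With n = 1 there are 3 blocks and any 1 suffices, so the file is lost
# only if all three are gone: 1 - 0.5**3 = 0.875 at p = 0.5.
small_case = recovery_probability(1, 0.5)

# Without redundancy, all n blocks would have to survive: p**n.
with_fec = recovery_probability(10, 0.8)
without_fec = 0.8**10
```

Under these assumptions the redundant insert is far more likely to remain recoverable than the same file stored as exactly n blocks.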
So that’s Freenet. At a glance it appears to be pretty strong against forensics, but as you’ll see next class, there are statistical methods to discriminate between a connected node being the downloader and it being a proxy for the downloader.