This project involves building a content distribution network (CDN) in three phases. The CDN consists of three types of entities: Web servers that generate content, third-party content distribution nodes (edge servers), and clients that access content.
Part I
The goal of this project is to design a managed content distribution service that assists an online retailer in distributing content. An example of such an application is a Web site like CNN or iTunes selling videos to clients through an Akamai-like CDN (references below). This project is basically about implementing three-phase commit (3PC), but we will build a long-winded story to help design under-specified components better and think we are doing something cool and useful.
You are free to use any programming language (C, C++, Java, C# etc.) and any abstractions such as sockets, RPC, RMI, threads, events, and TCP that you might need. The goal of this project is to go through the process of designing a nontrivial system with a high-level specification and implementing modules in a reusable manner. We know the protocols we use are provably correct on paper, but the transition to a "correct" and efficient implementation is not always easy.
Akamai references: white paper, paper, slides
The system has the following four components.
1. A Web server that periodically produces music files with unique names and different sizes. When a new file is generated the server sends the file to each edge server. When a client contacts the server to purchase a file, the server redirects the client to a randomly chosen edge server. Note: "server" means the Web server that is distinct from the edge servers.
2. An edge server accepts files from the server and serves the files to clients that are redirected to it. However, the edge server has to first commit the purchase in coordination with all other edge servers and the server. It does so by obtaining a purchase order number from the server and running an atomic commit protocol.
3. A client always contacts the server first and gets redirected to an edge server. The edge server provides the client with a purchase order number and the client waits until the purchase is committed or aborted. If it's aborted, the client retries the purchase. If a client does not hear back from its edge server for a really long time, it complains to the Web server, which redirects it to another random edge server. The client furnishes its purchase order number and ascertains whether the purchase got committed or aborted.
4. A naming service that the client contacts to obtain the identity of the server.
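The commit step in component 2 can be sketched as follows. This is only a minimal, in-process illustration of the three phases of 3PC, assuming no failures and no timeouts; the `Participant` class, the `Vote` enum, and `run_three_phase_commit` are hypothetical names, not part of the assignment. A real implementation must add timeouts, logging, and recovery.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical in-process stand-in for an edge server or the Web server."""

    def __init__(self, name, vote=Vote.YES):
        self.name = name
        self.vote = vote
        self.state = "INIT"

    def can_commit(self, order_no):
        # Phase 1: vote on the purchase identified by order_no.
        self.state = "READY" if self.vote is Vote.YES else "ABORTED"
        return self.vote

    def pre_commit(self, order_no):
        # Phase 2: acknowledge that everybody voted yes.
        self.state = "PRECOMMITTED"
        return True

    def do_commit(self, order_no):
        # Phase 3: make the purchase durable.
        self.state = "COMMITTED"

    def abort(self, order_no):
        self.state = "ABORTED"


def run_three_phase_commit(order_no, participants):
    """Coordinator logic for one purchase; returns "COMMITTED" or "ABORTED".

    Assumption: calls are synchronous and never fail. Timeouts and
    coordinator or participant crashes are deliberately left out.
    """
    # Phase 1 (canCommit): a single NO vote aborts the purchase.
    votes = [p.can_commit(order_no) for p in participants]
    if any(v is not Vote.YES for v in votes):
        for p in participants:
            p.abort(order_no)
        return "ABORTED"

    # Phase 2 (preCommit): wait for every acknowledgement.
    acks = [p.pre_commit(order_no) for p in participants]
    if not all(acks):
        for p in participants:
            p.abort(order_no)
        return "ABORTED"

    # Phase 3 (doCommit): tell everybody to commit.
    for p in participants:
        p.do_commit(order_no)
    return "COMMITTED"
```

In the actual system each method call would be a message to a remote node, and the interesting design work lies in what each node does when an expected message does not arrive.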
Assume that the server and the edge servers are susceptible to failure. Links are susceptible to failure as well. You should not assume message transmission time is bounded. Your system should be able to simulate arbitrary delays, e.g., if you are using threads for the three components, you may add an arbitrary delay on each outgoing message to simulate an imperfect network. We will assume that the server always recovers after not too long a time.
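One possible way to simulate an imperfect network with threads is to delay every outgoing message by a random amount. The sketch below assumes an exponential distribution with a configurable mean; the function name `send_with_delay` and the `deliver` callback are hypothetical, and any other distribution would work just as well.

```python
import random
import threading


def send_with_delay(deliver, message, mean_delay_s, rng=random):
    """Call deliver(message) after a random delay and return (delay, timer).

    The delay is drawn from an exponential distribution with mean
    mean_delay_s (must be > 0), so it has no fixed upper bound.
    """
    delay = rng.expovariate(1.0 / mean_delay_s)
    timer = threading.Timer(delay, deliver, args=(message,))
    timer.daemon = True  # do not keep the process alive for pending messages
    timer.start()
    return delay, timer
```

Passing a seeded `random.Random` instance as `rng` makes a test run reproducible. A dropped message (link failure) could be simulated by simply not starting the timer.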
Next, allow the failure of the server to be transparent to clients like in the primary backup model. When the (primary) server dies, one of the edge servers takes over as the primary and announces to everybody that it is the primary. Clients learn the identity of this primary server through the naming service. Note that the primary could be different from coordinators in ongoing atomic commits.
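The naming service's role in a takeover can be sketched as below. The epoch number is an assumption added for illustration (it lets the service ignore a stale announcement from an old primary); the assignment does not prescribe it, and `NamingService`, `lookup`, and `announce_primary` are hypothetical names.

```python
import threading


class NamingService:
    """Hypothetical thread-safe registry mapping "the server" to its address."""

    def __init__(self, primary):
        self._lock = threading.Lock()
        self._primary = primary
        self._epoch = 0  # assumption: increases with every takeover

    def lookup(self):
        """Return (primary, epoch) as currently known."""
        with self._lock:
            return self._primary, self._epoch

    def announce_primary(self, new_primary, epoch):
        """Record a takeover; reject announcements that are not newer."""
        with self._lock:
            if epoch <= self._epoch:
                return False
            self._primary, self._epoch = new_primary, epoch
            return True
```

How the edge servers agree on which of them takes over is a separate design decision that this sketch does not address.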
You can use threads for the above entities and test it all on a single machine or on different machines in the edlab. If you don't have an edlab account, please get one by writing to CSCF. No GUI is needed. Clients may simply read a list of purchases from a file and execute them as above. You are responsible for proper synchronization when using threads.
Evaluation
Have files of different sizes in the system, ranging from 1 kB to 100 kB. Allow for varying amounts of delay for each message in the system by choosing from some distribution with a mean that you can set. Allow servers to die and recover according to a configurable crash frequency. Set up a frequency of client requests such that the system is driven close to capacity. Show that the throughput of purchases improves in the primary-backup scenario compared to when there is a fixed primary.
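A configurable crash frequency could be driven by a precomputed schedule, for example as sketched here. The exponential up and down times, the seed parameter, and the name `crash_schedule` are assumptions made for illustration only.

```python
import random


def crash_schedule(duration_s, mean_uptime_s, mean_downtime_s, seed=None):
    """Return a list of (crash_time, recover_time) pairs within the run.

    Up and down periods alternate and are drawn from exponential
    distributions with the given means (both must be > 0). A recovery
    that would fall after the end of the run is clipped to duration_s.
    """
    rng = random.Random(seed)
    t = 0.0
    events = []
    while True:
        t += rng.expovariate(1.0 / mean_uptime_s)  # time until next crash
        if t >= duration_s:
            break
        down = rng.expovariate(1.0 / mean_downtime_s)
        events.append((t, min(t + down, duration_s)))
        t += down
    return events
```

Using the same seed for the fixed-primary and primary-backup runs subjects both configurations to the same failure pattern, which makes the throughput comparison fairer.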
Submission instructions
You should turn in the following.
1. Documented code, a script that your TA can use to start up the whole system and test it, and an accompanying README file.
2. A document describing the design of the system and all the assumptions you made.
3. A graph comparing the throughput of the system in the fixed-primary and the primary-backup models with increasing aggregate frequency of client requests.
4. A description of test cases you used to convince yourself that your system works correctly.
Grading
Correctness of program: 40%
Code documentation: 15%
Design document: 20%
Thoroughness of test cases and performance results: 25%