01: Intro and Carving

  1. (10 points) Suppose you were tasked with acquiring an image of a large hard disk drive (not an SSD, but a spinning magnetic platter). Select a 10 TB or larger drive (for example, search for “10 tb hard drive” on Amazon). How long would it take to linearly read this drive? Cite the drive name and the specific technical details (interface speed, sustained read speed, and any other relevant factors) you used to derive your answer.
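As a rough illustration of the arithmetic involved, the sketch below estimates a linear read time from a capacity and two throughput figures. The function name and all numbers are placeholders chosen for the example, not values from any real datasheet; it also assumes the sustained rate holds across the whole platter, which real drives generally do not guarantee.

```python
def linear_read_hours(capacity_bytes: int,
                      sustained_bytes_per_s: float,
                      interface_bytes_per_s: float) -> float:
    """Estimate hours needed to read a whole drive front to back.

    The slower of the media's sustained rate and the interface rate
    is treated as the bottleneck (a simplifying assumption).
    """
    effective_rate = min(sustained_bytes_per_s, interface_bytes_per_s)
    return capacity_bytes / effective_rate / 3600


# Placeholder figures only: 10 TB (decimal), 200 MB/s sustained, 600 MB/s interface.
hours = linear_read_hours(10 * 10**12, 200 * 10**6, 600 * 10**6)
print(f"{hours:.1f} hours")
```

Substitute the figures from the datasheet of the drive you select, and state which one turned out to be the limiting factor.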

  2. (15 points) When recovering files from disk images, an investigator may choose to parse filesystem metadata, or to linearly scan an entire disk and attempt to carve files of interest. Compare and contrast the two procedures along at least three distinct lines. Your answer should make it clear under what circumstances an investigator might choose one or the other approach.

  3. (15 points) Suppose you are attempting to carve contiguously allocated files of interest from a 1 TB disk image, by exhaustively considering all possible byte strings, and feeding them into a validator.

    • How many possible strings are there?
    • Suppose files of interest always begin at sector boundaries. If the sector size is 4,096 bytes, how many possible strings must be considered?
    • Additionally, suppose that the file format being considered tolerates trailing data. That is, the files can be opened, and the validator will accept them, even if arbitrary data is appended. How many possible strings must be considered?
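One way to check your reasoning is to count candidates by brute force on a toy image and compare against whatever closed-form expression you derive. The sketch below does this under one reading of the three cases: a candidate is a non-empty contiguous byte range, and "tolerates trailing data" means a single candidate per start offset, running to the end of the image. The function name and parameters are made up for illustration, and enumeration like this is only feasible at toy sizes.

```python
def count_candidates(image_size: int,
                     start_step: int = 1,
                     run_to_end: bool = False) -> int:
    """Count contiguous byte ranges that a validator would be fed.

    image_size -- size of the (toy) image in bytes
    start_step -- spacing of allowed start offsets (1 = any byte offset)
    run_to_end -- if True, each start yields one candidate that extends
                  to the end of the image (trailing data is tolerated)
    """
    total = 0
    for start in range(0, image_size, start_step):
        if run_to_end:
            total += 1                    # only one candidate per start
        else:
            total += image_size - start   # one candidate per possible length
    return total


# Toy image: 16 bytes with 4-byte "sectors".
print(count_candidates(16))                                  # any start, any length
print(count_candidates(16, start_step=4))                    # sector-aligned starts
print(count_candidates(16, start_step=4, run_to_end=True))   # trailing data tolerated
```

For the 16-byte toy image this prints 136, 40, and 4; your written answer should give the general expressions and evaluate them for the 1 TB image.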
  4. (40 points) One of the image files that was examined in (Garfinkel, 2007) is located at http://old.dfrws.org/2006/challenge/dfrws-2006-challenge.zip. Write a program that does one of two things to this (or any similar) file:

    • (easier) Carves out all contiguous JPEGs and strips trailing junk data correctly. Be careful – you cannot necessarily just match FF D8 and FF D9 pairs, as JPEGs can contain embedded JPEGs.
    • (harder) Carves out all JPEGs, both those that are contiguous and those with gaps.

    You should submit your code, and a brief writeup explaining your algorithm and approach, any trouble you ran into (and how you overcame it), and whether you think you were successful in extracting all the JPEGs (and why).
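As a starting point for the easier option, the sketch below shows one possible way to find where a contiguous JPEG ends by walking its marker segments rather than pairing FF D8 with the next FF D9. Because length-prefixed segments (such as an APP1 segment holding a thumbnail) are skipped whole, an embedded JPEG inside them does not end the outer file early. The function name `jpeg_end` is hypothetical. This is a structural check only: it does not decode the image, does not handle fragmented files, and the sample bytes at the bottom are synthetic rather than a viewable JPEG.

```python
from typing import Optional


def jpeg_end(data: bytes, start: int) -> Optional[int]:
    """Return the offset just past the EOI marker of the JPEG that
    begins at `start`, or None if the marker structure breaks down."""
    if data[start:start + 2] != b"\xff\xd8":        # must begin with SOI
        return None
    n = len(data)
    pos = start + 2
    while pos + 1 < n:
        if data[pos] != 0xFF:
            return None                              # expected a marker here
        marker = data[pos + 1]
        if marker == 0xFF:                           # fill byte before a marker
            pos += 1
            continue
        if marker == 0xD9:                           # EOI: end of this JPEG
            return pos + 2
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                                 # standalone marker, no length
            continue
        if marker in (0x00, 0xD8) or pos + 4 > n:
            return None                              # not valid at this point
        seg_len = int.from_bytes(data[pos + 2:pos + 4], "big")
        if seg_len < 2:
            return None
        pos += 2 + seg_len                           # skip the whole segment
        if marker == 0xDA:                           # SOS: entropy-coded data follows
            while pos + 1 < n:
                nxt = data[pos + 1]
                # FF 00 is a stuffed byte, FF FF is fill, FF D0..D7 are
                # restart markers; anything else after FF is a real marker.
                if (data[pos] == 0xFF and nxt not in (0x00, 0xFF)
                        and not 0xD0 <= nxt <= 0xD7):
                    break
                pos += 1
    return None                                      # ran off the end of the data


# Synthetic sample: SOI, APP1 holding an embedded SOI/EOI pair,
# a minimal SOS header, scan data with a stuffed FF 00, EOI, then junk.
sample = (b"\xff\xd8"
          b"\xff\xe1\x00\x06\xff\xd8\xff\xd9"
          b"\xff\xda\x00\x02"
          b"\x12\xff\x00\x34"
          b"\xff\xd9"
          b"JUNK")
end = jpeg_end(sample, 0)
print(end, sample[:end].endswith(b"\xff\xd9"))
```

On the synthetic sample this reports offset 20, so the four trailing junk bytes are stripped and the embedded FF D9 inside the APP1 segment is not mistaken for the end of the file. A full carver would still need to decide where to look for start offsets and how to confirm that each candidate actually decodes.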

To be clear, your entire submission should consist of your written answers to each question, along with your code. We expect you to put everything into a reasonable archive format (.zip, .tar.gz) and upload it through Moodle.

Reminder: Group work is permitted (so long as you clearly indicate group members). But if you work in a group, I will generally expect a higher level of performance on the work, particularly on the more difficult questions, such as (4) in this assignment.