COMPSCI 590K: Advanced Digital Forensics Systems | Spring 2020

Assignment 01: Intro and Carving

This assignment is due by 9pm on Tuesday, February 11th. It must be submitted through Moodle.

  1. (10 points) Suppose you were tasked with acquiring an image of a large hard disk drive. Specifically, you are imaging the 10 TB Western Digital Red NAS Hard Drive. How long would it take to linearly read every bit of this drive (hours and minutes)? You will probably want to consult the actual specification sheet, rather than use marketing copy, for values in your calculation. Other things to watch out for: bits vs bytes, and what’s an actual TB (see Footnote 2 in the English-language tech-specs: 1000^4 != 1024^4).
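The arithmetic in this question can be sketched as a small helper. This is a minimal sketch, not a worked answer: the function name `linear_read_time` is hypothetical, and the capacity and transfer rate in the usage line are round illustrative numbers, not values from any specification sheet. Substitute the sustained transfer rate and capacity you find in the actual spec, after converting both to bytes.

```python
TB = 1000 ** 4   # decimal terabyte, as drive vendors count it
TiB = 1024 ** 4  # binary tebibyte, roughly 10% larger


def linear_read_time(capacity_bytes: int, rate_bytes_per_s: float) -> tuple:
    """Return (hours, minutes) needed to read capacity_bytes at a constant rate.

    Assumes one uninterrupted sequential read at a fixed sustained rate,
    which is an idealization of real drive behavior.
    """
    seconds = capacity_bytes / rate_bytes_per_s
    total_minutes = round(seconds / 60)
    return divmod(total_minutes, 60)


# Illustrative values only: a 1 TB drive read at 100 MB/s.
hours, minutes = linear_read_time(1 * TB, 100 * 1000 ** 2)
print(f"{hours} h {minutes} min")
```

If a spec sheet quotes a rate in bits per second, divide by 8 before passing it in.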

  2. (15 points) When recovering files from disk images, an investigator may choose to parse filesystem metadata, or to linearly scan an entire disk and attempt to carve files of interest. Why might an investigator choose one approach or the other, depending upon the circumstances? Your answer should explain the pros and cons of each, giving at least three distinct reasons.

  3. (15 points) Suppose you are attempting to carve contiguously allocated files of interest from a 1 TB disk image, by exhaustively considering all possible byte strings, and feeding them into a validator.

    • How many possible strings are there?
    • Suppose files of interest always begin at sector boundaries. If the sector size is 4,096 bytes, how many possible strings must be considered?
    • Additionally, suppose that the file format being considered tolerates trailing data. That is, the files can be opened and the validator will accept them even if arbitrary data is appended. How many possible strings must be considered?
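The three counts above can be sanity-checked on a toy image before scaling up. The sketch below is one possible reading of the question, under stated assumptions: a "string" is a non-empty contiguous run of bytes, the image is `n` bytes long, and with trailing data tolerated only the start offset matters because each candidate can run to the end of the image. The function names are hypothetical.

```python
def count_all(n: int) -> int:
    """Non-empty contiguous byte strings in an n-byte image: n(n+1)/2."""
    return n * (n + 1) // 2


def count_sector_aligned(n: int, sector: int) -> int:
    """Strings that start on a sector boundary and end at any later byte.

    A start at offset s admits n - s possible end points; starts are
    0, sector, 2*sector, ... below n. Closed form avoids a long loop.
    """
    k = -(-n // sector)  # number of start offsets, i.e. ceil(n / sector)
    return k * n - sector * k * (k - 1) // 2


def count_aligned_trailing_ok(n: int, sector: int) -> int:
    """With trailing data tolerated, one candidate per start offset."""
    return -(-n // sector)


# Toy image: 8 bytes with 4-byte "sectors".
print(count_all(8), count_sector_aligned(8, 4), count_aligned_trailing_ok(8, 4))
```

Comparing the closed form against a brute-force enumeration on small inputs is a cheap way to catch off-by-one errors before plugging in the real image and sector sizes.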
  4. (40 points) One of the image files that was examined in (Garfinkel, 2007) is located at http://old.dfrws.org/2006/challenge/dfrws-2006-challenge.zip. Write a program that does one of two things to this (or any similar) file:

    • (easier) Carves out all valid, contiguous JPEGs and strips trailing junk data correctly. Be careful – you cannot necessarily just match FF D8 and FF D9 pairs, as JPEGs can contain embedded JPEGs.
    • (harder) Carves out the above, as well as some or all fragmented JPEGs. In addition to looking for the byte patterns known to start and end JPEGs, you might also want to use some sort of JPEG validator – it’s fine to use a third party tool (for example, an image display or manipulation library) to check whether a candidate JPEG is valid.
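One way to avoid being fooled by an embedded JPEG is to walk the marker segments instead of searching for the first FF D9. The sketch below illustrates that idea and is not a complete solution: it assumes candidates start on 512-byte boundaries (an assumption about the image, adjustable through `sector`), the names `jpeg_end` and `carve_jpegs` are hypothetical, and it checks structure only. A fragmented file can still pass this walk, so decoding each candidate with an image library remains necessary.

```python
from typing import List, Optional, Tuple


def _skip_scan(buf: bytes, pos: int) -> Optional[int]:
    """Skip entropy-coded data; return the offset of the next real marker."""
    n = len(buf)
    while True:
        pos = buf.find(b"\xff", pos)
        if pos < 0 or pos + 1 >= n:
            return None
        nxt = buf[pos + 1]
        if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
            pos += 2          # stuffed byte or restart marker: still scan data
        elif nxt == 0xFF:
            pos += 1          # fill byte
        else:
            return pos


def jpeg_end(buf: bytes, start: int) -> Optional[int]:
    """Return the offset just past EOI for a JPEG at `start`, else None."""
    if buf[start:start + 2] != b"\xff\xd8":
        return None
    pos, n = start + 2, len(buf)
    while pos + 2 <= n:
        if buf[pos] != 0xFF:
            return None
        marker = buf[pos + 1]
        if marker == 0xFF:                        # fill byte before a marker
            pos += 1
            continue
        if marker == 0xD9:                        # EOI at the top level
            return pos + 2
        if marker in (0x00, 0xD8):                # not valid at the top level
            return None
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                              # markers without a length
            continue
        if pos + 4 > n:
            return None
        length = int.from_bytes(buf[pos + 2:pos + 4], "big")
        if length < 2:
            return None
        pos += 2 + length     # jumps over APPn payloads, thumbnails included
        if marker == 0xDA:                        # SOS: scan data follows
            nxt = _skip_scan(buf, pos)
            if nxt is None:
                return None
            pos = nxt
    return None


def carve_jpegs(image: bytes, sector: int = 512) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of structurally complete JPEGs."""
    hits = []
    for start in range(0, len(image), sector):
        end = jpeg_end(image, start)
        if end is not None:
            hits.append((start, end))
    return hits
```

Because segment lengths are honored, a thumbnail stored inside an APPn segment is skipped as payload, and the returned end offset excludes any trailing junk after the top-level EOI.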

To be clear, your entire submission should consist of your written answers to each question, along with your code. We expect you to put it all into a reasonable archive format (.zip, .tar.gz) and upload it through Moodle.

Reminder: Group work is permitted (so long as you clearly indicate group members). But if you work in groups, we will generally expect a higher level of performance on the work, particularly on the more difficult questions, such as (4.) in this assignment.
