08: fsstat FAT16

Estimated time to complete: Three to four hours

The FAT16 filesystem that we've been covering in lecture is one of the simpler filesystems in wide use. Parsing its general details is a good warmup for full examination and recovery of files from this filesystem, which we'll be doing soon.

The fsstat program, part of The Sleuth Kit, does the sort of parsing we're interested in, taking a FAT16 filesystem as input and returning output of the form:

FILE SYSTEM INFORMATION
--------------------------------------------
File System Type: FAT16

OEM Name: MSDOS5.0
Volume ID: 0x68729702
Volume Label (Boot Sector): MARC'S USB 
Volume Label (Root Directory):
File System Type Label: FAT16   

Sectors before file system: 32

File System Layout (in sectors)
Total Range: 0 - 3913695
* Reserved: 0 - 0
** Boot Sector: 0
* FAT 0: 1 - 239
* FAT 1: 240 - 478
* Data Area: 479 - 3913695
** Root Directory: 479 - 510
** Cluster Area: 511 - 3913662
** Non-clustered: 3913663 - 3913695

METADATA INFORMATION
--------------------------------------------
Range: 2 - 62611478
Root Directory: 2

CONTENT INFORMATION
--------------------------------------------
Sector Size: 512
Cluster Size: 32768
Total Cluster Range: 2 - 61144

FAT CONTENTS (in sectors)
--------------------------------------------
511-574 (64) -> EOF
575-638 (64) -> EOF
639-702 (64) -> EOF
...etc...

You're going to recover most of this same information directly in a Python program.

What to do

Implement an fsstat_fat16() function.

fsstat_fat16 takes one mandatory argument, an open file-like object containing a valid FAT16 filesystem. The next argument is optional, and denotes the sector size of the filesystem in bytes (default: 512). The last argument is also optional, and denotes the offset (in sectors) from the start of the file-like object to the start of the FAT16 filesystem (default: 0).
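As a rough illustration of how those three arguments fit together, here is a minimal sketch that seeks to the start of the filesystem and unpacks a few boot sector fields with struct. The helper name read_boot_sector_fields and the dictionary keys are made up for this example; the byte offsets are the standard FAT12/16 boot sector layout described in Carrier, so check them against the text before relying on them.

```python
import struct


def read_boot_sector_fields(fat16_file, sector_size=512, offset=0):
    """Unpack a subset of the FAT16 boot sector (hypothetical helper)."""
    # The filesystem begins `offset` sectors into the file-like object.
    fat16_file.seek(offset * sector_size)
    boot = fat16_file.read(sector_size)

    # '<' means little-endian with no padding between fields.
    (bytes_per_sector, sectors_per_cluster, reserved_sectors, num_fats,
     max_root_entries, total_sectors_16, _media,
     sectors_per_fat) = struct.unpack_from('<HBHBHHBH', boot, 11)
    sectors_before, total_sectors_32 = struct.unpack_from('<II', boot, 28)
    (volume_id,) = struct.unpack_from('<I', boot, 39)

    return {
        'oem_name': boot[3:11].decode('ascii', errors='replace'),
        'bytes_per_sector': bytes_per_sector,
        'sectors_per_cluster': sectors_per_cluster,
        'reserved_sectors': reserved_sectors,
        'num_fats': num_fats,
        'max_root_entries': max_root_entries,
        # The 16-bit count is zero when the total does not fit in 16 bits.
        'total_sectors': total_sectors_16 or total_sectors_32,
        'sectors_per_fat': sectors_per_fat,
        'sectors_before': sectors_before,
        'volume_id': volume_id,
        'volume_label': boot[43:54].decode('ascii', errors='replace'),
        'fs_type_label': boot[54:62].decode('ascii', errors='replace'),
    }
```

For an image whose volume starts two sectors in, the call would be read_boot_sector_fields(f, offset=2). The string fields are returned unstripped, so trailing spaces are still present.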

For ease of debugging and autograding, fsstat_fat16 will return a list of strings. The list should be composed almost exactly of the lines of output that fsstat would produce when run on the same filesystem (ignoring leading and trailing whitespace) with the following exceptions. Omit the line:

Volume Label (Root Directory):

and the lines:

METADATA INFORMATION
--------------------------------------------
Range: 2 - 62611478
Root Directory: 2

(including the empty line after Root Directory) as they deal with data in the directory entries, which you don't need to handle (yet).
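If you capture real fsstat output to build your own expected results, the omissions above can be applied mechanically. The following is only a sketch: expected_lines is a hypothetical helper written for this example, and it assumes the METADATA INFORMATION section always ends at the first blank line after its header.

```python
def expected_lines(fsstat_output):
    """Strip each line and drop the lines the assignment says to omit."""
    kept = []
    skipping_metadata = False
    for line in fsstat_output.splitlines():
        line = line.strip()
        if line.startswith('Volume Label (Root Directory):'):
            continue
        if line == 'METADATA INFORMATION':
            skipping_metadata = True
            continue
        if skipping_metadata:
            # The section ends at the blank line after "Root Directory";
            # that blank line is dropped as well.
            if line == '':
                skipping_metadata = False
            continue
        kept.append(line)
    return kept
```

Everything outside those two omissions, including the other blank lines, passes through unchanged apart from the whitespace stripping.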

You should be able to generate your output by examining only the boot sector and the first FAT (listed as FAT 0 in the fsstat output above).
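To make that concrete for the layout section, here is a sketch of how the sector ranges could be derived from a handful of boot sector values. The function name fat16_layout and its dictionary keys are invented for this example. The arithmetic reproduces the sample output above when fed values inferred from that output (1 reserved sector, 2 FATs of 239 sectors, 512 root directory entries, 64 sectors per cluster, 3913696 total sectors), but treat it as something to check against Carrier rather than as a reference implementation.

```python
def fat16_layout(total_sectors, reserved_sectors, num_fats, sectors_per_fat,
                 max_root_entries, sectors_per_cluster, sector_size=512):
    """Return inclusive (first, last) sector ranges for each region."""
    layout = {'total': (0, total_sectors - 1),
              'reserved': (0, reserved_sectors - 1)}
    start = reserved_sectors
    for i in range(num_fats):
        layout['fat %d' % i] = (start, start + sectors_per_fat - 1)
        start += sectors_per_fat
    layout['data'] = (start, total_sectors - 1)
    # Each root directory entry is 32 bytes; round up to whole sectors.
    root_sectors = (max_root_entries * 32 + sector_size - 1) // sector_size
    layout['root'] = (start, start + root_sectors - 1)
    cluster_start = start + root_sectors
    # Only whole clusters count; leftover sectors are "non-clustered".
    num_clusters = (total_sectors - cluster_start) // sectors_per_cluster
    cluster_end = cluster_start + num_clusters * sectors_per_cluster - 1
    layout['cluster'] = (cluster_start, cluster_end)
    if cluster_end < total_sectors - 1:
        layout['non-clustered'] = (cluster_end + 1, total_sectors - 1)
    # Clusters are numbered from 2, so the last one is num_clusters + 1.
    layout['cluster range'] = (2, num_clusters + 1)
    return layout
```

Returning numbers rather than formatted strings keeps the arithmetic separate from the output formatting, which should make it easier to reuse later.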

Most of the values in the output of fsstat should be self-explanatory given lecture material and Carrier's text (though please feel free to ask if you don't know what a value is or where it comes from!). The thing that is perhaps least clear is the FAT CONTENTS section, which the man page helpfully(?) notes is "...the FAT table ... in a condensed format."

What is that format? Tighten your shorts, pilgrim, here we go.

First: The FAT has a sequence of entries, 16 bits (2 bytes) per cluster. But fsstat works in sectors, so you have to translate between the two. The cluster area is the range of sectors where the clusters (which are numbered starting from 2, not 0 or 1) reside on disk. So in the image above, the first cluster (cluster 2) starts at sector 511.
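The translation in both directions is a couple of lines of arithmetic. This sketch uses hypothetical function names, with the cluster area start and cluster size passed in explicitly rather than read from the boot sector.

```python
def cluster_to_sector(cluster, cluster_area_start, sectors_per_cluster):
    """Return the first sector of `cluster` (clusters are numbered from 2)."""
    if cluster < 2:
        raise ValueError('FAT clusters are numbered starting from 2')
    return cluster_area_start + (cluster - 2) * sectors_per_cluster


def sector_to_cluster(sector, cluster_area_start, sectors_per_cluster):
    """Return the number of the cluster that contains `sector`."""
    if sector < cluster_area_start:
        raise ValueError('sector lies before the cluster area')
    return 2 + (sector - cluster_area_start) // sectors_per_cluster


# Values from the sample output above: the cluster area starts at sector 511,
# and a 32768-byte cluster holds 64 sectors of 512 bytes.
first = cluster_to_sector(2, 511, 64)
last = cluster_to_sector(61144, 511, 64) + 64 - 1
print(first, last)  # 511 3913662
```

The two printed values match the Cluster Area line of the sample output, which is a handy sanity check on the off-by-two cluster numbering.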

Now, let's look at the start of the FAT on this disk (the offsets are from the start of the FAT):

00000000  f8 ff ff 7f ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000020  ff ff ff ff ff ff ff ff  ff ff 00 00 00 00 ff ff  |................|
00000030  ff ff ff ff ff ff ff ff  ff ff 00 00 ff ff ff ff  |................|
00000040  00 00 00 00 00 00 ff ff  ff ff ff ff ff ff 28 00  |..............(.|
00000050  29 00 2a 00 ff ff 2c 00  2d 00 2e 00 ff ff 30 00  |).*...,.-.....0.|

The first four bytes are the entries for the (nonexistent) clusters 0 and 1 -- this data is used for other purposes, as noted in Carrier. The next two bytes (at 0x4) are the entry for cluster 2: ff ff, indicating that this cluster is the last cluster in whichever file it is part of. This is reflected in the following line of the FAT CONTENTS:

511-574 (64) -> EOF

The file located at cluster 2 occupies 32KB (one cluster); it starts at sector 511 (the start of the cluster area, since it's cluster 2) and runs through sector 574 (inclusive), for a total of 64 sectors; there is no more to this file.

Further down, let's look at the entry for cluster 39, located at 0x4e. The value there is 28 00, indicating the next cluster for this file is cluster 40 (28 00 is the little-endian encoding of 0x0028, or 40 in decimal) -- that's the next entry in the table, at 0x50. Cluster 40's entry is 29 00, indicating that the next cluster is 41 (at 0x52); cluster 41's entry is 2a 00, indicating that the next cluster is 42 (at 0x54); cluster 42's entry is ff ff, indicating that it's the last cluster in the file. Cluster 39 starts at sector 2879: 511 + (37 * 64), that is, the first sector of the cluster area (511) plus the 37 clusters between cluster 2 and cluster 39, at 64 sectors per cluster. And if we look through the fsstat output for the entry starting at 2879, we see:

2879-3134 (256) -> EOF

that the entry here is exactly four clusters (256 sectors) long, just as we expect.
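The same walk can be done in code. This sketch transcribes the bytes from the hex dump above, unpacks them as little-endian 16-bit values, and follows the chain that starts at cluster 39. The names FAT_START, EOF_MIN, and follow_chain are made up for the example, and the end-of-file threshold 0xFFF8 is the usual FAT16 convention; confirm it against Carrier.

```python
import struct

# The first 0x60 bytes of the FAT, copied from the hex dump above.
FAT_START = bytes.fromhex(
    'f8 ff ff 7f ff ff ff ff ff ff ff ff ff ff ff ff '
    'ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff '
    'ff ff ff ff ff ff ff ff ff ff 00 00 00 00 ff ff '
    'ff ff ff ff ff ff ff ff ff ff 00 00 ff ff ff ff '
    '00 00 00 00 00 00 ff ff ff ff ff ff ff ff 28 00 '
    '29 00 2a 00 ff ff 2c 00 2d 00 2e 00 ff ff 30 00'
)

EOF_MIN = 0xFFF8  # entries at or above this value mark the end of a file

# One unsigned 16-bit little-endian value per cluster, indexed by cluster.
entries = struct.unpack('<%dH' % (len(FAT_START) // 2), FAT_START)


def follow_chain(entries, start):
    """Return the clusters in the chain beginning at `start`."""
    chain = [start]
    nxt = entries[start]
    # Stop at end-of-file, at anything outside the table, or if the chain
    # grows longer than the table (which would indicate a cycle).
    while 2 <= nxt < min(EOF_MIN, len(entries)) and len(chain) < len(entries):
        chain.append(nxt)
        nxt = entries[nxt]
    return chain


print(follow_chain(entries, 39))  # [39, 40, 41, 42]
```

Four clusters at 64 sectors each gives the 256 sectors reported by fsstat.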

The final case is when an entry in the FAT refers to a file that's not entirely contiguous and in-order on disk (also known as "fragmented"). In this case, the value to the right of the arrow is the first sector of the next fragment of the file. For example, further down in the fsstat output for this image are the following entries:

6719-6846 (128) -> 6975
...more...
6975-7038 (64) -> EOF

This indicates a run of clusters (at sectors 6719-6846) that continues at a non-contiguous point (at the cluster corresponding to sector 6975) rather than ending here.

Let's look at the corresponding areas of the FAT. Sector 6719 is cluster 99 (2 + (6719 - 511) / 64), because sector 511 is the start of the first cluster, which is number 2, and each 32KB cluster is 64 512-byte sectors long. Each entry is 2 bytes, so cluster 99's entry is at offset 0xc6 (99 * 2 = 198) from the start of the FAT:

000000c0  ff ff 00 00 ff ff 64 00  67 00 66 00 ff ff ff ff  |......d.g.f.....|

and contains the value 64 00 (which equals 100 in decimal), so the next cluster immediately follows. The entry for cluster 100 (at 0xc8) contains the value 67 00 (which equals 103), which does not immediately follow. So clusters 99 and 100 (which are located on sectors 6719-6846 and span 128 sectors) compose part of a file, which continues at sector 6975 (which is cluster 103). And if we look at cluster 103's entry, at 0xce, we see it's ff ff: cluster 103 is the last cluster in the file.
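Putting the three cases together, one way to produce the condensed lines is a single pass over the FAT that extends a run while each entry points at the very next cluster. This is a sketch under stated assumptions, not a description of how fsstat is implemented: condensed_runs is a hypothetical name, entries is assumed to be a sequence of entry values indexed by cluster number, free clusters are assumed to hold 0, and the bad-cluster marker is not treated specially.

```python
EOF_MIN = 0xFFF8  # entries at or above this value mark the end of a file


def condensed_runs(entries, cluster_area_start, sectors_per_cluster):
    """Format allocated cluster runs as 'first-last (count) -> next' lines."""
    def first_sector(cluster):
        return cluster_area_start + (cluster - 2) * sectors_per_cluster

    lines = []
    run_start = None
    for cluster in range(2, len(entries)):
        value = entries[cluster]
        if value == 0:
            # Free cluster: it belongs to no run.
            run_start = None
            continue
        if run_start is None:
            run_start = cluster
        if value == cluster + 1:
            # The run continues in the very next cluster.
            continue
        first = first_sector(run_start)
        last = first_sector(cluster) + sectors_per_cluster - 1
        target = 'EOF' if value >= EOF_MIN else str(first_sector(value))
        lines.append('%d-%d (%d) -> %s'
                     % (first, last, last - first + 1, target))
        run_start = None
    return lines
```

Fed the entries for clusters 99 through 103 from the dump above (100, 103, 102, ff ff, ff ff) with a cluster area starting at sector 511 and 64 sectors per cluster, this yields the 6719-6846 (128) -> 6975 and 6975-7038 (64) -> EOF lines, plus a 128-sector run for clusters 101 and 102 in between.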

What to submit

Submit a single Python file named fsstat_fat16.py. This file must define the function above. It should look like the following, but with an actual implementation rather than a placeholder.

import struct


def fsstat_fat16(fat16_file, sector_size=512, offset=0):
    result = ['FILE SYSTEM INFORMATION',
              '--------------------------------------------',
              'File System Type: FAT16',
              '']

    # then do a few things, .append()ing to result as needed

    return result

(Spoiler alert!) You're almost certainly going to want to reuse some of your code in the next assignment when you reimplement parts of fls and icat. Keep this in mind when writing your code now.

Tests

(Update 2017-03-22: spiff.zip contains a single image spiff.dd. This image, unlike the others, does not start with the volume in sector 0; instead there is an MBR there, then a blank sector, then the start of the FAT partition. So you'll need to invoke fsstat_fat16 with an offset = 2. You can invoke fsstat -o 2 spiff.dd to analyze it with TSK. This test case was going to be a Gradescope-only test. You can thank the fine students who came to office hours and convinced me to make it public.)

(Update 2017-03-17: fragmented.zip contains a single image fragmented.dd, which contains a file spread across non-contiguous clusters. The sector size of this image is 1,024 bytes.)

I've put together a test case based upon the adams.dd image we've been using in class. The Gradescope tests, which will be up in a bit, will test your code a little more thoroughly. But at this point I expect you to be able to develop your own test cases -- worst case, you can acquire a smallish USB key, format it as FAT16, then borrow a Mac or Linux machine to dd it and use TSK's fsstat to analyze it.

The code and data are here: fsstat_fat16-student-tests.zip

In particular, if you capture the output of fsstat to a file, for example:

fsstat adams.dd > adams.dd.fsstat.out

The resulting file can be the "expected" output of a test case, as in test_fsstat_fat16.py:

import unittest

import fsstat_fat16
import tsk_helper


class TestFSStatFat16(unittest.TestCase):
    def testAdams(self):
        with open('adams.dd.fsstat.out') as f:
            expected = tsk_helper.strip_all(tsk_helper.get_fsstat_output(f))
        with open('adams.dd', 'rb') as f:
            actual = tsk_helper.strip_all(fsstat_fat16.fsstat_fat16(f))
        self.assertEqual(expected, actual)