08: fsstat FAT16
Estimated time to complete: Three to four hours
The FAT16 filesystem that we’ve been covering in lecture is one of the simpler filesystems in wide use. Parsing its general details is a good warmup for full examination and recovery of files from this filesystem; you’ll do (most of) that in the next assignment.
The fsstat program, part of The Sleuth Kit, does the sort of parsing we’re interested in, taking a FAT16 filesystem as input and returning output of the form:
FILE SYSTEM INFORMATION
--------------------------------------------
File System Type: FAT16
OEM Name: MSDOS5.0
Volume ID: 0x68729702
Volume Label (Boot Sector): MARC'S USB
Volume Label (Root Directory):
File System Type Label: FAT16
Sectors before file system: 32
File System Layout (in sectors)
Total Range: 0 - 3913695
* Reserved: 0 - 0
** Boot Sector: 0
* FAT 0: 1 - 239
* FAT 1: 240 - 478
* Data Area: 479 - 3913695
** Root Directory: 479 - 510
** Cluster Area: 511 - 3913662
** Non-clustered: 3913663 - 3913695
METADATA INFORMATION
--------------------------------------------
Range: 2 - 62611478
Root Directory: 2
CONTENT INFORMATION
--------------------------------------------
Sector Size: 512
Cluster Size: 32768
Total Cluster Range: 2 - 61144
FAT CONTENTS (in sectors)
--------------------------------------------
511-574 (64) -> EOF
575-638 (64) -> EOF
639-702 (64) -> EOF
...etc...
You’re going to recover most of this same information directly in a Python program.
What to do
Implement an fsstat_fat16() function.
fsstat_fat16 takes one mandatory argument, an open file-like object containing a valid FAT16 filesystem. The next argument is optional, and denotes the sector size of the filesystem in bytes (default: 512). The last argument is optional, and denotes the offset (in sectors) from the start of the file-like object to the start of the FAT16 filesystem (default: 0).
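As a minimal sketch of how the sector size and offset arguments interact, the helper below reads whole sectors relative to the start of the filesystem. The name read_sectors and the in-memory demo image are hypothetical, for illustration only; they are not part of the assignment's required interface.

```python
import io


def read_sectors(fat16_file, first_sector, count=1, sector_size=512, offset=0):
    """Read `count` sectors, numbering sectors from the start of the filesystem.

    `offset` is the number of sectors that precede the filesystem in the file.
    (Hypothetical helper, not required by the assignment.)
    """
    fat16_file.seek((offset + first_sector) * sector_size)
    data = fat16_file.read(count * sector_size)
    if len(data) != count * sector_size:
        raise ValueError('short read: image is smaller than expected')
    return data


# Fake four-sector image: an "MBR", a blank sector, then the filesystem.
image = io.BytesIO(b'M' * 512 + b'\x00' * 512 + b'B' * 512 + b'F' * 512)
boot = read_sectors(image, 0, offset=2)          # sector 0 of the filesystem
next_sector = read_sectors(image, 1, offset=2)   # sector 1 of the filesystem
```

Keeping all seeking in one place like this means the rest of the code can think purely in filesystem-relative sector numbers, which is how fsstat reports them.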
For ease of debugging and autograding, fsstat_fat16 will return a list of strings. The list should be composed almost exactly of the lines of output that fsstat would produce when run on the same filesystem (ignoring leading and trailing whitespace), with the following exceptions. Omit the line:
Volume Label (Root Directory):
and the lines:
METADATA INFORMATION
--------------------------------------------
Range: 2 - 62611478
Root Directory: 2
(including the empty line after Root Directory) as they deal with data in the directory entries, which you don’t need to handle for this assignment. You’ll deal with them in the next one, but I want to keep this stuff separate for now. Why? So you can generate your output examining only the boot sector and the first file allocation table.
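To illustrate the boot-sector half of that, here is a hedged sketch that unpacks the standard FAT16 boot-sector fields with struct. The function name parse_boot_sector, the dictionary keys, and the synthetic demo sector (filled in with values from the sample output above) are all assumptions made for this example. Decoding the text fields as ASCII and stripping whitespace is also an assumption.

```python
import struct


def parse_boot_sector(boot):
    """Unpack the FAT16 boot-sector fields that fsstat reports (a sketch)."""
    # Bytes 3-35: OEM name through the 32-bit total sector count.
    (oem, _bytes_per_sector, sectors_per_cluster, reserved_sectors, num_fats,
     root_entries, total16, _media, sectors_per_fat, _sectors_per_track,
     _heads, hidden_sectors, total32) = struct.unpack_from(
         '<8sHBHBHHBHHHII', boot, 3)
    # Bytes 36-61: drive number through the file system type label.
    (_drive, _reserved, _signature, volume_id, label,
     fs_type) = struct.unpack_from('<BBBI11s8s', boot, 36)
    return {
        'oem_name': oem.decode('ascii').strip(),
        'volume_id': volume_id,
        'volume_label': label.decode('ascii').strip(),
        'fs_type_label': fs_type.decode('ascii').strip(),
        'sectors_before': hidden_sectors,
        'sectors_per_cluster': sectors_per_cluster,
        'reserved_sectors': reserved_sectors,
        'num_fats': num_fats,
        'root_entries': root_entries,
        'sectors_per_fat': sectors_per_fat,
        # The 16-bit count is zero when the volume is too large for it.
        'total_sectors': total16 or total32,
    }


# Synthetic boot sector built from the values in the sample fsstat output.
demo = bytearray(512)
struct.pack_into('<8sHBHBHHBHHHII', demo, 3, b'MSDOS5.0', 512, 64, 1, 2,
                 512, 0, 0xf8, 239, 63, 255, 32, 3913696)
struct.pack_into('<BBBI11s8s', demo, 36, 0x80, 0, 0x29, 0x68729702,
                 b"MARC'S USB ", b'FAT16   ')
fields = parse_boot_sector(bytes(demo))
summary = ['OEM Name: %s' % fields['oem_name'],
           'Volume ID: 0x%x' % fields['volume_id'],
           'Volume Label (Boot Sector): %s' % fields['volume_label']]
```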
Most of the values in the output of fsstat should be self-explanatory given lecture material and Carrier’s text (though please feel free to ask if you don’t know what a value is or where it comes from!). The thing that is perhaps least clear is the FAT CONTENTS section, which the man page helpfully(?) notes is “…the FAT table … in a condensed format.”
What is that format? We talked about it in lecture (and you can look at the notes for the same discussion for adams.dd), but here’s a refresher using another image.
First: The FAT has a sequence of entries, 16 bits (2 bytes) per cluster. But fsstat works in sectors, so you have to translate between the two. The cluster area is the range of sectors where the clusters (which are numbered starting from 2, not 0 or 1) are stored on disk. So in the image above, the first cluster (cluster 2) starts at sector 511.
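The translation can be sketched as follows, checked against the numbers in the sample output above. The function names and dictionary keys are hypothetical. The parameter values (1 reserved sector, 2 FATs of 239 sectors, 512 root directory entries of 32 bytes each, 64 sectors per cluster, 3913696 total sectors) are inferred from the sample File System Layout section rather than stated directly in it.

```python
def fat16_layout(reserved, num_fats, sectors_per_fat, root_entries,
                 sectors_per_cluster, total_sectors, sector_size=512):
    """Compute the sector ranges fsstat lists under File System Layout."""
    root_start = reserved + num_fats * sectors_per_fat
    # Each root directory entry is 32 bytes; round up to whole sectors.
    root_sectors = (root_entries * 32 + sector_size - 1) // sector_size
    cluster_start = root_start + root_sectors
    cluster_count = (total_sectors - cluster_start) // sectors_per_cluster
    return {
        'fat_starts': [reserved + i * sectors_per_fat
                       for i in range(num_fats)],
        'root_start': root_start,
        'root_end': cluster_start - 1,
        'cluster_start': cluster_start,
        'cluster_end': cluster_start + cluster_count * sectors_per_cluster - 1,
        'last_cluster': cluster_count + 1,  # clusters are numbered from 2
    }


def cluster_to_sector(cluster, cluster_start, sectors_per_cluster):
    """First sector of a cluster; cluster 2 sits at the start of the area."""
    return cluster_start + (cluster - 2) * sectors_per_cluster


def sector_to_cluster(sector, cluster_start, sectors_per_cluster):
    """Cluster that contains a given sector of the cluster area."""
    return 2 + (sector - cluster_start) // sectors_per_cluster


layout = fat16_layout(1, 2, 239, 512, 64, 3913696)
first = cluster_to_sector(2, layout['cluster_start'], 64)
```

Any sectors left between cluster_end and the end of the volume are too few to form a whole cluster, which is what the Non-clustered line in the sample output reports.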
Now, let’s look at the start of the FAT on this disk (the offsets are from the start of the FAT):
00000000 f8 ff ff 7f ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000010 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000020 ff ff ff ff ff ff ff ff ff ff 00 00 00 00 ff ff |................|
00000030 ff ff ff ff ff ff ff ff ff ff 00 00 ff ff ff ff |................|
00000040 00 00 00 00 00 00 ff ff ff ff ff ff ff ff 28 00 |..............(.|
00000050 29 00 2a 00 ff ff 2c 00 2d 00 2e 00 ff ff 30 00 |).*...,.-.....0.|
The first four bytes are the entries for the (nonexistent) clusters 0 and 1; this data is used for other purposes, as noted in Carrier. The next two bytes (at offset 0x4) are ff ff, indicating that cluster 2 is the last cluster in whichever file it is part of. This is reflected in the following line of the FAT CONTENTS:
511-574 (64) -> EOF
The file located at cluster 2 occupies 32KB (one cluster); it starts at sector 511 (the start of the cluster area, since it’s cluster 2) and runs through sector 574 (inclusive), for a total of 64 sectors; there is no more to this file.
Further down, let’s look at the entry for cluster 39, located at offset 0x4e. The value there is 28 00, indicating the next cluster for this file is cluster 40 (28 00 is the little-endian encoding of 0x0028, which is 40 in decimal); that’s the next entry in the table, at 0x50. Cluster 40’s entry is 29 00, indicating that the next cluster is cluster 41 (at 0x52); cluster 41’s entry is 2a 00, indicating that the next cluster is cluster 42 (at 0x54); cluster 42’s entry is ff ff, indicating that it’s the last cluster in the file. And if we look in the list of entries from fsstat for the entry starting at sector 2879 (511 + 37 * 64: the first sector of the cluster area, 511, plus the 37 clusters that precede cluster 39 at 64 sectors per cluster), we see:
2879-3134 (256) -> EOF
This entry is exactly four clusters (256 sectors) long, just as we expect.
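The walk through clusters 39 to 42 can be reproduced with a short sketch that decodes the hexdump above into 16-bit little-endian entries. The follow_chain helper is hypothetical. Treating values of 0xfff8 and above as end-of-file markers is the usual FAT16 convention, but check it against Carrier before relying on it.

```python
import struct

# The first 0x60 bytes of the FAT, copied from the hexdump above.
fat_bytes = bytes.fromhex(
    'f8 ff ff 7f ff ff ff ff ff ff ff ff ff ff ff ff '
    'ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff '
    'ff ff ff ff ff ff ff ff ff ff 00 00 00 00 ff ff '
    'ff ff ff ff ff ff ff ff ff ff 00 00 ff ff ff ff '
    '00 00 00 00 00 00 ff ff ff ff ff ff ff ff 28 00 '
    '29 00 2a 00 ff ff 2c 00 2d 00 2e 00 ff ff 30 00')

# One unsigned 16-bit little-endian value per cluster; entries[n] is cluster n.
entries = struct.unpack('<%dH' % (len(fat_bytes) // 2), fat_bytes)


def follow_chain(entries, start):
    """Return the list of clusters in the chain that begins at `start`."""
    chain = [start]
    while entries[chain[-1]] < 0xfff8:       # assumed end-of-file threshold
        nxt = entries[chain[-1]]
        if nxt < 2 or nxt in chain:          # free entry or loop: give up
            raise ValueError('corrupt chain at cluster %d' % chain[-1])
        chain.append(nxt)
    return chain


chain = follow_chain(entries, 39)
sectors_in_file = len(chain) * 64            # 64 sectors per cluster here
```

Here chain comes out as clusters 39 through 42, which is the four-cluster, 256-sector run that fsstat prints.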
The final case is when an entry in the FAT refers to a file that’s not entirely contiguous and in-order on disk (also known as “fragmented”). In this case, the value to the right of the arrow refers to the first sector of the next fragment of the file. For example, further down in the fsstat output for this image are the following entries:
6719-6846 (128) -> 6975
...more...
6975-7038 (64) -> EOF
This indicates a run of clusters (at sectors 6719-6846) that continues at a non-contiguous point (at the cluster corresponding to sector 6975) rather than ending here.
Let’s look at the corresponding areas of the FAT. Sector 6719 is cluster 99 (2 + (6719 - 511) / 64), because sector 511 is the first cluster, which is number 2; 32KB clusters are 64 512B sectors long. Cluster 99 is at offset 0xc6 from the start of the FAT:
000000c0 ff ff 00 00 ff ff 64 00 67 00 66 00 ff ff ff ff |......d.g.f.....|
and contains the value 64 00 (which equals 100 in decimal), so the next cluster follows immediately. The entry for cluster 100 (at 0xc8) contains the value 67 00 (which equals 103), which does not immediately follow. So clusters 99 and 100 (which are located on sectors 6719-6846 and span 128 sectors) compose part of a file, which continues on sector 6975 (which is cluster 103). And if we look at cluster 103’s entry, at 0xce, we see it’s ff ff, marking the last cluster in the file.
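Putting the three cases together, the condensed format can be sketched as a single linear scan over the FAT that merges consecutive clusters into runs. This is one plausible reading of the format, not TSK's actual code. The name fat_contents is hypothetical, 0xfff8 is assumed as the end-of-file threshold, and bad-cluster markers and a run that reaches the very last cluster are not handled.

```python
import struct


def fat_contents(entries, cluster_start, sectors_per_cluster, last_cluster):
    """Condense FAT entries (indexed by cluster number) into run lines."""
    def sector_of(cluster):
        return cluster_start + (cluster - 2) * sectors_per_cluster

    lines = []
    run_start = None
    for cluster in range(2, last_cluster + 1):
        nxt = entries[cluster]
        if nxt == 0:                  # unallocated cluster: no run here
            run_start = None
            continue
        if run_start is None:
            run_start = cluster
        if nxt == cluster + 1:        # run continues into the next cluster
            continue
        first = sector_of(run_start)
        last = sector_of(cluster) + sectors_per_cluster - 1
        target = 'EOF' if nxt >= 0xfff8 else str(sector_of(nxt))
        lines.append('%d-%d (%d) -> %s' % (first, last, last - first + 1,
                                           target))
        run_start = None
    return lines


# Clusters 96-103 from the hexdump row at 0xc0; everything else left free.
entries = [0] * 104
entries[96:104] = struct.unpack(
    '<8H', bytes.fromhex('ff ff 00 00 ff ff 64 00 67 00 66 00 ff ff ff ff'))
lines = fat_contents(entries, 511, 64, 103)
```

On this slice the scan yields five lines, including the two entries quoted above for sectors 6719-6846 and 6975-7038.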
What to submit
Submit a single Python file named fsstat_fat16.py. This file must define the function above. It should look like the following, but with an actual implementation rather than a placeholder. (Note: If Gradescope won’t process your program, make sure it doesn’t produce excessive output (for example, via print); also make sure it works locally using the unit test file we provide. If you’re still having trouble, let us know on Piazza.)
import struct

def fsstat_fat16(fat16_file, sector_size=512, offset=0):
    result = ['FILE SYSTEM INFORMATION',
              '--------------------------------------------',
              'File System Type: FAT16',
              '']
    # then do a few things, .append()ing to result as needed
    return result
(Spoiler alert!) You’re almost certainly going to want to reuse some of your code in the next assignment when you re-implement parts of fls and icat. Keep this in mind when writing your code now.
Tests
I’ve put together a test case based upon the adams.dd image we’ve been using in class. The Gradescope tests will test your code a little more thoroughly. But at this point I expect you to be able to develop your own test cases. Worst case, you can set up a virtual Linux machine, create an image with dd and mkfs.fat like we did in class, and then use TSK’s fsstat to analyze it. Or ask a friend to help do so. Or do so and then post it to Piazza for everyone (if you’re feeling like a mensch).
The code and data are here: fsstat_fat16-student-tests.zip
In particular, you can capture the output of fsstat to a file, for example:
fsstat adams.dd > adams.dd.fsstat.out
The resulting file can serve as the “expected” output of a test case, as in test_fsstat_fat16.py:
class TestFSStatFat16(unittest.TestCase):
    def testAdams(self):
        with open('adams.dd.fsstat.out') as f:
            expected = tsk_helper.strip_all(tsk_helper.get_fsstat_output(f))
        with open('adams.dd', 'rb') as f:
            actual = tsk_helper.strip_all(fsstat_fat16.fsstat_fat16(f))
        self.assertEqual(expected, actual)
The autograder references two other images. I’ve included them below (but you’ll need to add them to the test scaffolding yourself). Do not hardcode your answers to this assignment, particularly the FAT CONTENTS. We will be checking for this behavior and will treat it as deliberate academic dishonesty if you do so.
fragmented.zip contains a single image, fragmented.dd, which contains a file spread across non-contiguous clusters. The sector size of this image is 1,024 bytes.
spiff.zip contains a single image, spiff.dd. This image, unlike the others, does not start with the volume in sector 0; instead there is an MBR there, then a blank sector, then the start of the FAT partition. So you’ll need to invoke fsstat_fat16 with offset=2. You can invoke fsstat -o 2 spiff.dd to analyze it with TSK.