12: Demonstration: Parsing FAT

Announcements

A08 / A09 posted.

Midterm exam grading is still underway.

Recovering a deleted file

Earlier, we found an entry corresponding to a deleted file:

dd if=adams.dd bs=512 skip=75 count=2 | hexdump -C
2+0 records in
2+0 records out
1024 bytes transferred in 0.000023 secs (44278013 bytes/sec)
00000000  2e 20 20 20 20 20 20 20  20 20 20 10 00 00 4e 5c  |.          ...N\|
00000010  a1 38 a1 38 00 00 4e 5c  a1 38 03 00 00 00 00 00  |.8.8..N\.8......|
00000020  2e 2e 20 20 20 20 20 20  20 20 20 10 00 00 4e 5c  |..         ...N\|
00000030  a1 38 a1 38 00 00 4e 5c  a1 38 00 00 00 00 00 00  |.8.8..N\.8......|
00000040  e5 4d 47 5f 33 30 32 37  4a 50 47 20 00 00 c4 79  |.MG_3027JPG ...y|
00000050  e1 38 e1 38 00 00 c4 79  e1 38 04 00 8c a0 1c 00  |.8.8...y.8......|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000400

The file was named “?MG_3027.JPG”. Notice that we lose the first character of a deleted file’s name: the filesystem overwrites the first byte of the directory entry with the deletion marker 0xE5. The entry also records that the file used to start at cluster 4.
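As a quick check, here’s that 32-byte entry (the bytes at 0x40–0x5f in the dump above) decoded by hand with struct; the offsets are the standard FAT directory-entry fields we’ll use in the code at the end:

import struct

# the 32 bytes at offset 0x40 of the hexdump above
entry = bytes.fromhex('e54d475f333032374a5047200000c479'
                      'e138e1380000c479e13804008ca01c00')

print(hex(entry[0]))                         # 0xe5: the deletion marker
print(entry[1:8], entry[8:11])               # b'MG_3027' b'JPG'
print(struct.unpack('<H', entry[26:28])[0])  # 4: first cluster
print(struct.unpack('<L', entry[28:32])[0])  # 1876108: file size in bytes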

Sleuthkit also sees this file (-r lists everything, recursively; the * marks a deleted entry):

fls -r adams.dd
r/r 3:  ADAMS       (Volume Label Entry)
d/d 5:  images
+ r/r * 549:    _MG_3027.JPG
r/r 7:  Designs.doc
v/v 163171: $MBR
v/v 163172: $FAT1
v/v 163173: $FAT2
d/d 163174: $OrphanFiles

or if we want to go entry-by-entry:

fls adams.dd
r/r 3:  ADAMS       (Volume Label Entry)
d/d 5:  images
r/r 7:  Designs.doc
v/v 163171: $MBR
v/v 163172: $FAT1
v/v 163173: $FAT2
d/d 163174: $OrphanFiles

fls adams.dd 5
r/r * 549:  _MG_3027.JPG

Where does this metadata address of 549 come from? It’s clearly not a cluster number or whatnot, right?

Many filesystems have a concept of “inodes”: unique metadata addresses that files and directories share. Not FAT. So TSK generates unique metadata addresses for FAT instead. The root directory entry is given the value 2. Each sector of the disk, starting at the beginning of the data area, could hypothetically contain 16 directory entries (512 bytes per sector / 32 bytes per entry), so we number those slots starting from 3. This means that, say, the 512 entries in the root directory are numbered 3–514. There will be gaps, of course, since most sectors don’t actually hold directory entries.

Recall that our cluster area started at sector 73, and the directory entries we extracted for this “IMAGES” directory were at sector 75. If sectors 73 and 74 were full of directory entries, they’d hold 32 entries between them (16 per sector). And our deleted file is the 3rd entry in the next sector.

514 + 32 + 3 = 549, the metadata address. Boom.
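Here’s a rough sketch of that numbering in Python (the function name, the defaults, and the 1-based entry number are mine, not TSK’s; its real bookkeeping is more involved):

def tsk_fat_meta_addr(sector, entry_number, data_area_start=73, max_root_entries=512):
    # the root directory itself is address 2, and its entries run from 3
    # up to 2 + max_root_entries; each data-area sector then contributes
    # 16 more slots (512 bytes / 32 bytes per entry)
    return (2 + max_root_entries) + (sector - data_area_start) * 16 + entry_number

print(tsk_fat_meta_addr(75, 3))  # 549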

OK, how long was this file before it was deleted? The last four bytes of the directory entry, 8c a0 1c 00, show it was 1,876,108 bytes long, which would have required 1833 1 KB clusters to store.
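In Python, the same arithmetic:

size = 0x001ca08c             # the bytes 8c a0 1c 00, read little-endian
print(size)                   # 1876108
print((size + 1023) // 1024)  # 1833: 1 KB clusters needed (ceiling division)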

Interestingly, that’s exactly how many clusters are currently marked as unallocated between its old starting cluster (4) and the next cluster allocated on the disk (1837). I wonder if those bytes look like a JPEG? Remember, cluster 4 starts two clusters past the start of the cluster area, and with 1 KB clusters that’s four 512-byte sectors: 73 + 4 = 77.

dd if=adams.dd of=IMG_3027.JPG bs=512 skip=77 count=3666
hexdump -Cv IMG_3027.JPG|less

Those headers look familiar to you at all?

This is (almost) what icat (remember that?) from the second lecture does. icat is a little smarter. For example, it will truncate the file to the file size listed in the directory entry.
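A minimal Python sketch of that, using the numbers from above (unlike real icat, this just assumes the clusters are contiguous, which happens to hold here):

with open('adams.dd', 'rb') as f:
    f.seek(77 * 512)            # cluster 4 begins at sector 77
    data = f.read(3666 * 512)   # 1833 clusters = 3666 sectors
with open('IMG_3027.JPG', 'wb') as out:
    out.write(data[:1876108])   # truncate to the size in the directory entry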

Building and then parsing a filesystem

(This will be helpful to you when doing the next assignment, which I’ll be putting up shortly.)

This is all being done on an Ubuntu virtual machine, using Vagrant to manage it. We are just creating the filesystem; no MBR.

# create a new empty file
dd if=/dev/zero of=fat.dd bs=1M count=10

# view it
hexdump -C fat.dd

# create a FAT filesystem
mkfs.fat fat.dd 

# view it
hexdump -C fat.dd

# view it in sleuthkit
fsstat fat.dd 
fls fat.dd

Parsing it

Can we get some essentials out of this ourselves? In particular, the cluster size, the first FAT, root directory area, and cluster area? (Code at end of notes; compare with fsstat output.)

Two asides

First, you can work on sequences of bytes or directly on a file-like object:

with open('fat.dd', 'rb') as f:
    data = f.read()

x = data[i:j] 

# is equivalent to

f.seek(i)
x = f.read(j - i)

The former is maybe ergonomically easier, but it requires loading the entire file into memory, which is not always feasible.
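As a tiny example of the seek/read style, here’s how you might pull just the bytes-per-sector field out of the boot sector without loading the whole image (the same field get_sector_size below reads with a slice):

import struct

with open('fat.dd', 'rb') as f:
    f.seek(11)                                       # bytes per sector lives at offset 11
    sector_size = struct.unpack('<H', f.read(2))[0]
print(sector_size)                                   # typically 512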

Second, indexing into a sequence is different from slicing a sequence:

bytes_sequence[i]      # returns the i-th element -- for a bytes object, an int
bytes_sequence[i:i+1]  # returns a *sequence* (a length-1 bytes object) holding the i-th element

This distinction is particularly important when passing arguments to struct.unpack, which expects a bytes object as its second argument; handing it a single indexed value (an int) raises a TypeError.
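A quick demonstration:

import struct

b = b'\x01\x02'
print(struct.unpack('<H', b)[0])       # 513 (0x0201, little-endian)
print(struct.unpack('<B', b[0:1])[0])  # 1: the slice is a bytes object
# struct.unpack('<B', b[0])            # TypeError: b[0] is the int 1, not bytes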

On with the show

# mount it
mkdir mnt
sudo mount -o sync fat.dd mnt/

# view it
fls fat.dd
hexdump -C fat.dd

# add a file
nano hello.txt
sudo cp hello.txt mnt/

# view it
fls fat.dd
hexdump -C fat.dd

Can we parse this directory entry? (Code at end of notes; compare with fsstat and fls output.)

# make a 2-cluster file
dd if=/dev/urandom of=random.dat bs=2048 count=2
sudo cp random.dat mnt/

Can we parse this directory entry? (Code at end of notes; compare with fsstat and fls output.)
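For a two-cluster file, the FAT chain is short: the entry for the first cluster points at the second, and the second holds an end-of-chain marker (0xFFF8–0xFFFF). Here’s a toy FAT16 excerpt (the cluster numbers are hypothetical; the real ones depend on what mkfs.fat and the copies did) walked the same way get_cluster_numbers does in the code below:

import struct

# entries 0-1 are reserved; cluster 2 is free; cluster 3 -> 4; cluster 4 -> EOF
fat = bytes.fromhex('f8ff' 'ffff' '0000' '0400' 'ffff')

chain, cluster = [], 3
while cluster < 0xfff8:
    chain.append(cluster)
    cluster = struct.unpack('<H', fat[2 * cluster:2 * cluster + 2])[0]
print(chain)  # [3, 4]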

Code from class follows:

import struct


def as_le_unsigned(b):
    # interpret b (1, 2, 4, or 8 bytes) as a little-endian unsigned integer
    table = {1: 'B', 2: 'H', 4: 'L', 8: 'Q'}
    return struct.unpack('<' + table[len(b)], b)[0]


def get_sector_size(fs_bytes):
    # bytes per sector: 2 bytes at offset 11 of the boot sector
    return as_le_unsigned(fs_bytes[11:13])


def get_cluster_size(fs_bytes):
    # sectors per cluster (1 byte at offset 13) times the sector size
    return as_le_unsigned(fs_bytes[13:14]) * get_sector_size(fs_bytes)


def get_reserved_area_size(fs_bytes):
    # reserved sectors (2 bytes at offset 14) times the sector size
    return as_le_unsigned(fs_bytes[14:16]) * get_sector_size(fs_bytes)


def get_fat_size(fs_bytes):
    # sectors per FAT (2 bytes at offset 22) times the sector size
    return as_le_unsigned(fs_bytes[22:24]) * get_sector_size(fs_bytes)


def get_fat0(fs_bytes):
    # the first FAT immediately follows the reserved area
    start = get_reserved_area_size(fs_bytes)
    length = get_fat_size(fs_bytes)
    return fs_bytes[start:start + length]


def get_number_of_fats(fs_bytes):
    # 1 byte at offset 16 (usually 2)
    return as_le_unsigned(fs_bytes[16:17])


def get_max_root_directory_entries(fs_bytes):
    # 2 bytes at offset 17
    return as_le_unsigned(fs_bytes[17:19])


def get_root_directory_area(fs_bytes):
    # the root directory follows the FATs
    start = get_reserved_area_size(fs_bytes) + get_number_of_fats(fs_bytes) * get_fat_size(fs_bytes)
    length = get_max_root_directory_entries(fs_bytes) * 32  # 32 bytes / entry
    return fs_bytes[start:start + length]


def get_sector_count(fs_bytes):
    # the larger of the 16-bit count (offset 19) and the 32-bit count
    # (offset 32); whichever one the filesystem doesn't use is zero
    return max(as_le_unsigned(fs_bytes[19:21]), as_le_unsigned(fs_bytes[32:36]))


def get_cluster_area(fs_bytes):
    fs_size = get_sector_count(fs_bytes) * get_sector_size(fs_bytes)

    # the cluster (data) area follows the root directory and runs to the
    # last whole cluster that fits in the filesystem
    start = get_reserved_area_size(fs_bytes) + get_number_of_fats(fs_bytes) * get_fat_size(fs_bytes) \
            + get_max_root_directory_entries(fs_bytes) * 32

    number_of_clusters = (fs_size - start) // get_cluster_size(fs_bytes)
    length = number_of_clusters * get_cluster_size(fs_bytes)

    return fs_bytes[start:start + length]


def get_filename(dirent):
    # 8.3 name: 8 space-padded name bytes, then a 3-byte extension
    return dirent[0:8].decode('ascii').strip() + '.' + dirent[8:11].decode('ascii')


def get_first_cluster(dirent):
    # starting cluster number: 2 bytes at offset 26 of the entry
    return as_le_unsigned(dirent[26:28])


def get_filesize(dirent):
    # file size in bytes: 4 bytes at offset 28 of the entry
    return as_le_unsigned(dirent[28:32])


def get_cluster_numbers(first_cluster, fat_bytes):
    # follow the FAT16 chain: each 2-byte FAT entry holds the number of the
    # next cluster in the file; values of 0xFFF8 and above mark end-of-chain
    result = [first_cluster]
    offset = 2 * first_cluster
    next_cluster = as_le_unsigned(fat_bytes[offset:offset + 2])
    while next_cluster < 0xfff8:
        result.append(next_cluster)
        offset = 2 * next_cluster
        next_cluster = as_le_unsigned(fat_bytes[offset:offset + 2])
    return result

def main():
    with open('fat.dd', 'rb') as f:
        data = f.read()
    print('sector size:', get_sector_size(data))
    print('cluster size:', get_cluster_size(data))
    print('reserved area size:', get_reserved_area_size(data))
    print('FAT size:', get_fat_size(data))
    print('number of FATs:', get_number_of_fats(data))
    print('max root entries:', get_max_root_directory_entries(data))
    print('sector count:', get_sector_count(data))

    root_directory_entries = get_root_directory_area(data)

    # the fourth 32-byte entry (index 3) -- the entry examined in the demo
    dirent = root_directory_entries[32 * 3: 32 * 4]
    print('filename:', get_filename(dirent))
    print('first cluster:', get_first_cluster(dirent))
    print('file size:', get_filesize(dirent))
    print('cluster numbers:', get_cluster_numbers(get_first_cluster(dirent), get_fat0(data)))


if __name__ == '__main__':
    main()