12: Demonstration: Parsing FAT


Visual hex editors

Maybe you're not a fan of hexdump.

(Demo of hexfiend for OS X): http://ridiculousfish.com/hexfiend/

Metadata addresses in Sleuthkit

(See previous lecture's notes.)

Building a filesystem

This is all being done on an Ubuntu virtual machine, using Vagrant to manage it. We are just creating the filesystem; no MBR.

# create a new empty file
dd if=/dev/zero of=fat.dd bs=1M count=10

# view it
hexdump -C fat.dd

# create a FAT filesystem
mkfs.fat fat.dd 

# view it
hexdump -C fat.dd

# view it in sleuthkit
fsstat fat.dd 
fls fat.dd

Parsing it

Can we get some essentials out of this ourselves? In particular, the cluster size, the first FAT, root directory area, and cluster area? (Code at end of notes; compare with fsstat output.)
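
Because fsstat reports each area as a range of sectors, it can help to compute the same ranges by hand before comparing. What follows is a minimal sketch, not the code from class. The function name fat16_layout and every value packed into the synthetic boot sector (512-byte sectors, 4 reserved sectors, two 20-sector FATs, 512 root entries) are assumptions chosen for illustration, not values read from fat.dd.

```python
import struct

def fat16_layout(boot):
    """Return inclusive (first, last) sector ranges for a FAT12/16 layout."""
    # Offsets 11..18 hold: bytes per sector, sectors per cluster,
    # reserved sectors, number of FATs, max root directory entries.
    sector_size, _spc, reserved, n_fats, max_root = struct.unpack_from('<HBHBH', boot, 11)
    # Offset 22 holds the size of one FAT, in sectors.
    (fat_sectors,) = struct.unpack_from('<H', boot, 22)

    # Root directory entries are 32 bytes each; round up to whole sectors.
    root_sectors = (max_root * 32 + sector_size - 1) // sector_size
    root_start = reserved + n_fats * fat_sectors
    return {
        'reserved': (0, reserved - 1),
        'fat0': (reserved, reserved + fat_sectors - 1),
        'root': (root_start, root_start + root_sectors - 1),
        'data_start': root_start + root_sectors,
    }

# Synthetic boot sector with hypothetical values (not taken from fat.dd).
boot = bytearray(512)
struct.pack_into('<HBHBH', boot, 11, 512, 4, 4, 2, 512)
struct.pack_into('<H', boot, 22, 20)

layout = fat16_layout(boot)
print(layout)
```

To try it on the real image, pass the first 512 bytes of fat.dd instead of the synthetic buffer and compare the ranges with the layout section of the fsstat output.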

Two asides

First, you can work on sequences of bytes or directly on a file-like object:

with open('fat.dd', 'rb') as f:
    data = f.read()

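x = data[i:j]

# is equivalent to seeking to offset i and reading j - i bytes
# (f must still be open, so do this inside the with block)

f.seek(i)
x = f.read(j - i)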

Slicing is usually more convenient, but it requires loading the entire file into memory, which is not always feasible.
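
The two approaches can be checked against each other on a small scratch file. This is only a sketch: the temporary file and the offsets i and j are made up for illustration, so the example does not depend on fat.dd.

```python
import os
import tempfile

# Write a small scratch file containing the bytes 0x00..0xff.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)))
    path = tmp.name

i, j = 11, 13  # hypothetical offsets, chosen only for illustration

with open(path, 'rb') as f:
    # Approach 1: load everything, then slice.
    data = f.read()
    from_slice = data[i:j]

    # Approach 2: seek to i, then read j - i bytes.
    f.seek(i)
    from_read = f.read(j - i)

os.remove(path)
print(from_slice == from_read, len(from_slice))
```

Note that the slice data[i:j] contains j - i bytes, since the end index is exclusive.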

Second, indexing into a sequence is different from slicing a sequence:

bytes_sequence[i]  # returns the i-th element of bytes_sequence (an int in Python 3)
bytes_sequence[i:i+1]  # returns a sequence consisting of the i-th element of bytes_sequence

This distinction is particularly important when passing arguments to struct.unpack, as it expects a bytes sequence as its second argument, not a single value.
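
A short sketch of what goes wrong, assuming Python 3 (where indexing a bytes object yields an int). The two-byte value raw is made up for illustration.

```python
import struct

raw = b'\x00\x02'  # hypothetical two-byte field: 512 as a little-endian 16-bit value

# Slices are bytes objects, so struct.unpack accepts them.
value = struct.unpack('<H', raw[0:2])[0]
high_byte = struct.unpack('<B', raw[1:2])[0]

# Indexing yields an int, which struct.unpack rejects.
error = None
try:
    struct.unpack('<B', raw[1])
except TypeError as e:
    error = e

print(value, high_byte, type(error).__name__)
```

This is why the one-byte fields in the class code are read with slices such as fs_bytes[13:14] rather than fs_bytes[13].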

On with the show

(Some livecoding followed.)

# mount it; use the sync option so changes show up immediately in our disk image
mkdir mnt
sudo mount -o sync fat.dd mnt/

# view it
fls fat.dd
hexdump -C fat.dd

# add a file
nano hello.txt
sudo cp hello.txt mnt/

# view it
fls fat.dd
hexdump -C fat.dd

Can we parse this directory entry? (Code at end of notes; compare with fsstat and fls output. The code below fixes the end-of-class error.)
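
Before parsing the real entry, it can be useful to test the field offsets on an entry built by hand. This is a sketch rather than the class code: parse_short_dirent is a hypothetical helper, and the entry contents (name HELLO.TXT, first cluster 3, size 13) are invented values, not what fls or hexdump will show for your image.

```python
import struct

def parse_short_dirent(dirent):
    """Parse the name, first cluster, and size from a 32-byte short (8.3) entry."""
    name = dirent[0:8].decode('ascii').rstrip()   # space-padded base name
    ext = dirent[8:11].decode('ascii').rstrip()   # space-padded extension
    # Offset 26: low 16 bits of the first cluster; offset 28: 32-bit file size.
    first_cluster, size = struct.unpack_from('<HL', dirent, 26)
    filename = name + '.' + ext if ext else name
    return filename, first_cluster, size

# Hand-built entry with made-up values.
entry = bytearray(32)
entry[0:11] = b'HELLO   TXT'
entry[11] = 0x20                        # attribute byte: archive
struct.pack_into('<H', entry, 26, 3)    # first cluster
struct.pack_into('<L', entry, 28, 13)   # file size in bytes

print(parse_short_dirent(bytes(entry)))
```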

# make a 2-cluster file
dd if=/dev/urandom of=random.dat bs=2048 count=2
sudo cp random.dat mnt/

Can we parse this directory entry? (Code at end of notes; compare with fsstat and fls output.)
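
A file that spans more than one cluster means following its chain through the FAT. The sketch below walks a hand-built FAT16 table in which cluster 3 points to cluster 4, which ends the chain. The function name follow_chain, the loop guard, and the table contents are assumptions for illustration; the clusters assigned to random.dat in your image may differ.

```python
import struct

BAD_CLUSTER = 0xFFF7  # FAT16: 0xFFF7 marks a bad cluster, 0xFFF8..0xFFFF end a chain

def follow_chain(first_cluster, fat_bytes):
    """Return the list of cluster numbers in a FAT16 chain."""
    chain = []
    seen = set()
    cluster = first_cluster
    # Valid data clusters start at 2; stop at bad-cluster or end-of-chain marks.
    while 2 <= cluster < BAD_CLUSTER:
        if cluster in seen:
            raise ValueError('loop in FAT chain')
        seen.add(cluster)
        chain.append(cluster)
        # Each FAT16 entry is 2 bytes, so entry n lives at byte offset n * 2.
        (cluster,) = struct.unpack_from('<H', fat_bytes, cluster * 2)
    return chain

# Hand-built FAT: entries 0 and 1 are reserved, 2 is free, 3 -> 4 -> end of chain.
fat = struct.pack('<6H', 0xFFF8, 0xFFFF, 0x0000, 4, 0xFFFF, 0x0000)
print(follow_chain(3, fat))
```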

Code from class follows:

import struct

def as_le_unsigned(b):
    table = {1: 'B', 2: 'H', 4: 'L', 8: 'Q'}
    return struct.unpack('<' + table[len(b)], b)[0]

def get_sector_size(fs_bytes):
    return as_le_unsigned(fs_bytes[11:13])

def get_cluster_size(fs_bytes):
    return as_le_unsigned(fs_bytes[13:14]) * get_sector_size(fs_bytes)

def get_reserved_area_size(fs_bytes):
    return as_le_unsigned(fs_bytes[14:16]) * get_sector_size(fs_bytes)

def get_fat_size(fs_bytes):
    return as_le_unsigned(fs_bytes[22:24]) * get_sector_size(fs_bytes)

def get_fat0(fs_bytes):
    start = get_reserved_area_size(fs_bytes)
    length = get_fat_size(fs_bytes)
    return fs_bytes[start:start + length]

def get_number_of_fats(fs_bytes):
    return as_le_unsigned(fs_bytes[16:17])

def get_max_root_directory_entries(fs_bytes):
    return as_le_unsigned(fs_bytes[17:19])

def get_root_directory_area(fs_bytes):
    start = get_reserved_area_size(fs_bytes) + get_number_of_fats(fs_bytes) * get_fat_size(fs_bytes)
    length = get_max_root_directory_entries(fs_bytes) * 32  # 32 bytes / entry
    return fs_bytes[start:start + length]

def get_sector_count(fs_bytes):
    return max(as_le_unsigned(fs_bytes[19:21]), as_le_unsigned(fs_bytes[32:36]))

def get_cluster_area(fs_bytes):
    fs_size = get_sector_count(fs_bytes) * get_sector_size(fs_bytes)

    start = get_reserved_area_size(fs_bytes) + get_number_of_fats(fs_bytes) * get_fat_size(fs_bytes) \
            + get_max_root_directory_entries(fs_bytes) * 32

    number_of_clusters = (fs_size - start) // get_cluster_size(fs_bytes)
    length = number_of_clusters * get_cluster_size(fs_bytes)

    return fs_bytes[start:start + length]

def get_filename(dirent):
    # name and extension are both space-padded, so strip each of them
    return dirent[0:8].decode('ascii').strip() + '.' + dirent[8:11].decode('ascii').strip()

def get_first_cluster(dirent):
    return as_le_unsigned(dirent[26:28])

def get_filesize(dirent):
    return as_le_unsigned(dirent[28:32])

def get_cluster_numbers(cluster_number, fat_bytes, cluster_size):
    # cluster_size is unused; it is kept so existing callers still work
    if cluster_number >= as_le_unsigned(b'\xf8\xff'): # handle edge case first
        return [cluster_number]
    result = []
    while cluster_number < as_le_unsigned(b'\xf8\xff'): # should be < not >=
        result.append(cluster_number) # record this cluster before following the chain
        offset = cluster_number * 2 # offset is * 2 bytes per cluster, not cluster size in bytes
        cluster_number = as_le_unsigned(fat_bytes[offset:offset + 2])
    return result

def main():
    with open('fat.dd', 'rb') as f:
        data = f.read()
    print('sector size:', get_sector_size(data))
    print('cluster size:', get_cluster_size(data))
    print('reserved area size:', get_reserved_area_size(data))
    print('FAT size:', get_fat_size(data))
    print('number of FATs:', get_number_of_fats(data))
    print('max root entries:', get_max_root_directory_entries(data))
    print('sector count:', get_sector_count(data))

    root_directory_entries = get_root_directory_area(data)

    dirent = root_directory_entries[32 * 3: 32 * 4]
    print('filename:', get_filename(dirent))
    print('first cluster:', get_first_cluster(dirent))
    print('file size:', get_filesize(dirent))
    print('cluster numbers:', get_cluster_numbers(get_first_cluster(dirent), get_fat0(data), get_cluster_size(data)))

if __name__ == '__main__':
    main()