10: istat NTFS

Estimated time to complete: (∞ + 1) hours

In this assignment, probably the hardest one of the semester, you're going to replicate a subset of istat's functionality as applied to files stored on the NTFS file system. Seriously, don't wait until the day it's due to start this one.

(That's the bad news. The good news is that, after this, we're done with low-level parsing. And for what it's worth, I'm going to cut down on / consolidate some of the parsing assignments for next year.)

As was the case with FAT16, istat run on an NTFS filesystem examines a particular inode (in NTFS, an MFT entry number) and returns information about the file or directory represented by that MFT entry:

istat simple.ntfs 64
MFT Entry Header Values:
Entry: 64        Sequence: 1
$LogFile Sequence Number: 0
Allocated File
Links: 1

Flags: Archive
Owner ID: 0
Security ID: 0  ()
Created:    2017-03-20 14:11:20.414798700 (EDT)
File Modified:  2017-03-20 14:11:20.414892300 (EDT)
MFT Modified:   2017-03-20 14:11:20.414892300 (EDT)
Accessed:   2017-03-20 14:11:20.414798700 (EDT)

$FILE_NAME Attribute Values:
Flags: Archive
Name: testing.txt
Parent MFT Entry: 5     Sequence: 5
Allocated Size: 16      Actual Size: 0
Created:    2017-03-20 14:11:20.414798700 (EDT)
File Modified:  2017-03-20 14:11:20.414798700 (EDT)
MFT Modified:   2017-03-20 14:11:20.414798700 (EDT)
Accessed:   2017-03-20 14:11:20.414798700 (EDT)

Type: $STANDARD_INFORMATION (16-0)   Name: N/A   Resident   size: 48
Type: $FILE_NAME (48-3)   Name: N/A   Resident   size: 88
Type: $SECURITY_DESCRIPTOR (80-1)   Name: N/A   Resident   size: 80
Type: $DATA (128-2)   Name: N/A   Resident   size: 9

You're going to recover most of this same information directly in a Python program.

What to do

Implement an istat_ntfs() function.

istat_ntfs takes two mandatory arguments. The first is an open file-like object containing a valid NTFS filesystem. The second denotes the inode (metadata address) to examine. The next argument is optional, and denotes the sector size of the filesystem (default: 512). The last argument is optional, and denotes the offset (in sectors, default: 0) from the start of the file-like object to the start of the NTFS filesystem.

General approach

Parsing NTFS is like parsing FAT16; there's just more to take care of.

First, parse the boot sector to find the MFT (along with any other important parameters you'll need), as described in Table 13.18 in Carrier.
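As a rough starting point, here is a minimal sketch of pulling out the boot sector fields you will most likely need. The function name and the returned dictionary keys are my own choices for illustration, not part of the required interface; the byte offsets are the ones given for the NTFS boot sector in Carrier, so double-check them against the table as you go.

```python
import struct

def parse_boot_sector(bs):
    """Extract the fields needed to locate MFT entries (hypothetical helper)."""
    # Bytes 11-12: bytes per sector; byte 13: sectors per cluster.
    bytes_per_sector, sectors_per_cluster = struct.unpack_from('<HB', bs, 11)
    # Bytes 48-55: cluster address where the MFT starts.
    mft_start_cluster, = struct.unpack_from('<Q', bs, 48)
    # Byte 64 is signed: a positive value is a count of clusters per MFT
    # entry, a negative value n means the entry is 2**abs(n) bytes long.
    raw_entry_size, = struct.unpack_from('<b', bs, 64)

    cluster_size = bytes_per_sector * sectors_per_cluster
    if raw_entry_size > 0:
        entry_size = raw_entry_size * cluster_size
    else:
        entry_size = 2 ** abs(raw_entry_size)

    return {
        'bytes_per_sector': bytes_per_sector,
        'cluster_size': cluster_size,
        'mft_start_cluster': mft_start_cluster,
        'entry_size': entry_size,
    }
```

Returning a small dictionary keeps the later steps readable, but a tuple or a class would work just as well.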

Then, find the relevant entry in the MFT (remember, MFT entries are of fixed size). If the MFT is fragmented, that is, stored in more than one run of clusters, you may need to start by parsing enough of the MFT's entry for itself (entry 0) to find the subsequent clusters of the MFT! You can save this step for later, though: at least the easy test cases will have an entry in the first, contiguous run of clusters of the MFT.
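For the easy case, where the entry lives in the first contiguous run of the MFT, locating it is just arithmetic. The sketch below assumes you have already pulled the MFT starting cluster, cluster size, and entry size out of the boot sector; the function name and parameter names are hypothetical, and it deliberately does not handle a fragmented MFT.

```python
def read_mft_entry(f, address, mft_start_cluster, cluster_size, entry_size,
                   sector_size=512, offset=0):
    """Read one raw MFT entry, assuming it sits in the MFT's first run."""
    # The filesystem may start some number of sectors into the image.
    fs_start = offset * sector_size
    # The MFT's first cluster is given relative to the filesystem start.
    mft_start = fs_start + mft_start_cluster * cluster_size
    # Entries are fixed size, so entry N is N entry-sizes into the MFT.
    f.seek(mft_start + address * entry_size)
    return f.read(entry_size)
```

Once you handle a fragmented MFT, only the computation of the entry's byte position changes; the read itself stays the same.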

Finally, parse the entry, starting with the fields described in Table 13.1. Remember to handle the fixup array! Then you'll need to parse the attributes of the entry, as described in tables in Chapter 13. You will need to display the fields that istat does, namely for $STANDARD_INFORMATION, $FILE_NAME, and $DATA, along with the non-resident cluster list (if applicable) for $DATA. If an entry has other attributes, ignore them, including, for example, the $SECURITY_DESCRIPTOR attribute. You will need to parse them enough to skip over them, of course, but that only involves parsing the attribute header.
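To make the fixup and "skip over attributes" steps concrete, here is a minimal sketch of both. The function names are hypothetical, and the field offsets are the ones I read from the MFT entry header and attribute header tables in Carrier, so verify them yourself. It assumes the fixup count in the entry header includes the signature value itself, so an entry spanning two sectors has a count of 3.

```python
import struct

def apply_fixups(entry, sector_size=512):
    """Return a copy of the entry with the fixup values restored."""
    entry = bytearray(entry)
    # Bytes 4-5: offset of the fixup array; bytes 6-7: number of entries.
    fixup_offset, fixup_count = struct.unpack_from('<HH', entry, 4)
    signature = entry[fixup_offset:fixup_offset + 2]
    for i in range(1, fixup_count):
        end = i * sector_size
        # The last two bytes of each sector should hold the signature.
        if entry[end - 2:end] != signature:
            raise ValueError('fixup signature mismatch in sector %d' % i)
        original = entry[fixup_offset + 2 * i:fixup_offset + 2 * i + 2]
        entry[end - 2:end] = original
    return bytes(entry)

def iter_attribute_headers(entry):
    """Yield (position, type, id, is_non_resident, length) per attribute."""
    # Bytes 20-21 of the entry header: offset of the first attribute.
    pos, = struct.unpack_from('<H', entry, 20)
    while pos + 16 <= len(entry):
        attr_type, attr_len = struct.unpack_from('<II', entry, pos)
        # 0xFFFFFFFF marks the end of the attribute list.
        if attr_type == 0xFFFFFFFF or attr_len == 0:
            break
        non_resident = entry[pos + 8]
        attr_id, = struct.unpack_from('<H', entry, pos + 14)
        yield pos, attr_type, attr_id, bool(non_resident), attr_len
        # The length field is all that is needed to skip an attribute.
        pos += attr_len
```

With something like this in place, your main loop can dispatch on the type (16, 48, 128) and simply continue past anything else.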

Other notes

For ease of debugging and autograding, istat_ntfs will return a list of strings. The list should be composed exactly of the lines of output that istat would produce when run on the same filesystem and inode, ignoring leading and trailing whitespace, and subject to the notes in this assignment (timezones, additional attributes, Security ID).

As before, most of the values in the output of istat should be self-explanatory given lecture material and Carrier's text (though please feel free to ask if you don't know what a value is or where it comes from!).

We will test your function only on metadata addresses pointing to MFT entries that represent undeleted files; the $DATA for these files may be resident or nonresident.

We will not test your function on invalid values (invalid metadata addresses).

Important notes:

  • Ignore (don't include) the line in $STANDARD_INFORMATION that starts with Security ID:.
  • Ignore attributes that aren't $STANDARD_INFORMATION, $FILE_NAME, or $DATA.
  • Dealing with timestamps and timezones was the single most annoying thing I had to deal with when coding up the solution to this assignment. For now, do your best. I'll post an update on how to handle it when the tests and autograder are up (I will probably simplify it by canonicalizing the expected output to the same output that naive datetime objects can generate).

What to submit

Submit a single Python file named istat_ntfs.py. This file must define the function described above. It should look like the following, but with an actual implementation rather than a placeholder.

import datetime
import struct

def as_signed_le(bs):
    signed_format = {1: 'b', 2: 'h', 4: 'l', 8: 'q'}
    if len(bs) <= 0 or len(bs) > 8:
        raise ValueError()

    fill = b'\x00'
    if ((bs[-1] & 0x80) >> 7) == 1:
        fill = b'\xFF'

    while len(bs) not in signed_format:
        bs = bs + fill
    result = struct.unpack('<' + signed_format[len(bs)], bs)[0]
    return result

def istat_ntfs(f, address, sector_size=512, offset=0):
    return []  # placeholder: replace with your implementation

def into_localtime_string(windows_timestamp):
    """
    Convert a windows timestamp into istat-compatible output.

    Assumes your local host is in the EDT timezone.

    :param windows_timestamp: the struct.decoded 8-byte windows timestamp
    :return: an istat-compatible string representation of this time in EDT
    """
    # Integer division avoids float rounding near second boundaries.
    dt = datetime.datetime.fromtimestamp((windows_timestamp - 116444736000000000) // 10000000)
    hms = dt.strftime('%Y-%m-%d %H:%M:%S')
    fraction = windows_timestamp % 10000000
    # Zero-pad to seven digits so small fractions keep their leading zeros.
    return hms + '.' + '{:07d}'.format(fraction) + '00 (EDT)'

if __name__ == '__main__':
    import argparse

    parser = argparse.ArgumentParser(description='Display details of a meta-data structure (i.e. inode).')
    parser.add_argument('-o', type=int, default=0, metavar='imgoffset',
                        help='The offset of the file system in the image (in sectors)')
    parser.add_argument('-b', type=int, default=512, metavar='dev_sector_size',
                        help='The size (in bytes) of the device sectors')
    parser.add_argument('image', help='Path to an NTFS raw (dd) image')
    parser.add_argument('address', type=int, help='Meta-data number to display stats on')
    args = parser.parse_args()
    with open(args.image, 'rb') as f:
        result = istat_ntfs(f, args.address, args.b, args.o)
        for line in result:
            print(line)

The tests will use istat output where the timezone is set to EDT (istat -z EST5EDT), and the into_localtime_string() function above matches this assumption. There is a better way to do it, but not without more Python timezone shenanigans than I'm willing to engage in.
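If your own machine is not in the Eastern timezone, one way to get the same strings is to do the conversion with an explicit offset rather than relying on the host's local time. The following is only a sketch under a loud assumption: it hardcodes UTC-4, which is right for timestamps that fall in daylight saving time (as the samples in this assignment do) and wrong for winter dates. The function name into_edt_string is mine, not part of the required template.

```python
import datetime

# Assumption: a fixed UTC-4 offset, i.e. Eastern Daylight Time only.
EDT = datetime.timezone(datetime.timedelta(hours=-4), 'EDT')
WINDOWS_EPOCH = datetime.datetime(1601, 1, 1, tzinfo=datetime.timezone.utc)

def into_edt_string(windows_timestamp):
    """Format a Windows timestamp (100 ns ticks since 1601) like istat."""
    # Split into whole seconds and the leftover 100-nanosecond ticks.
    seconds, fraction = divmod(windows_timestamp, 10000000)
    utc = WINDOWS_EPOCH + datetime.timedelta(seconds=seconds)
    local = utc.astimezone(EDT)
    # Seven zero-padded tick digits plus '00' gives nine fractional digits.
    return '{}.{:07d}00 ({})'.format(
        local.strftime('%Y-%m-%d %H:%M:%S'), fraction, local.tzname())
```

Because the offset is explicit, this version produces the same output no matter where it runs, at the cost of ignoring the switch between EST and EDT.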


Tests in the usual format are available here: istat_ntfs_tests.zip

You can also take a look at Carrier's Digital Forensics Tool Testing Images, several of which are NTFS images. Here's another sample output from test image #3 (NTFS Keyword Search Test #1), showing a non-resident $DATA attribute:

MFT Entry Header Values:
Entry: 35        Sequence: 1
$LogFile Sequence Number: 1078953
Allocated File
Links: 1

Flags: Archive
Owner ID: 0
Security ID: 258  (S-1-5-21-1757981266-484763869-1060284298-1003)
Created:    2003-10-23 13:18:09.779432000 (EDT)
File Modified:  2003-10-23 13:20:38.493272000 (EDT)
MFT Modified:   2003-10-23 13:20:38.493272000 (EDT)
Accessed:   2003-10-23 13:20:38.493272000 (EDT)

$FILE_NAME Attribute Values:
Flags: Archive
Name: file-n-3.dat
Parent MFT Entry: 5     Sequence: 5
Allocated Size: 0       Actual Size: 0
Created:    2003-10-23 13:18:09.779432000 (EDT)
File Modified:  2003-10-23 13:18:09.779432000 (EDT)
MFT Modified:   2003-10-23 13:18:09.779432000 (EDT)
Accessed:   2003-10-23 13:18:09.779432000 (EDT)

Type: $STANDARD_INFORMATION (16-0)   Name: N/A   Resident   size: 72
Type: $FILE_NAME (48-2)   Name: N/A   Resident   size: 90
Type: $DATA (128-3)   Name: N/A   Non-Resident   size: 2600  init_size: 2600
8056 8057 8058 8072 8073 8074 
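The cluster list on the last line comes from decoding the run list of the non-resident $DATA attribute. Here is a minimal sketch of that decoding. The function name is hypothetical, int.from_bytes with signed=True stands in for the as_signed_le helper from the template, and sparse runs (an offset field of zero length) are not handled. The bytes in the usage note are ones I constructed to reproduce the list above; they were not read from the actual test image.

```python
def decode_run_list(runlist):
    """Expand an NTFS run list into a flat list of cluster addresses."""
    clusters = []
    pos = 0
    run_start = 0
    # A header byte of 0x00 terminates the run list.
    while pos < len(runlist) and runlist[pos] != 0:
        header = runlist[pos]
        length_size = header & 0x0F   # low nibble: size of the length field
        offset_size = header >> 4     # high nibble: size of the offset field
        pos += 1
        run_length = int.from_bytes(runlist[pos:pos + length_size], 'little')
        pos += length_size
        # The offset is signed and relative to the previous run's start.
        run_offset = int.from_bytes(runlist[pos:pos + offset_size],
                                    'little', signed=True)
        pos += offset_size
        run_start += run_offset
        clusters.extend(range(run_start, run_start + run_length))
    return clusters
```

For example, decode_run_list(bytes.fromhex('21 03 78 1F 11 03 10 00')) describes two runs of three clusters each: the first starts at cluster 0x1F78 (8056), and the second starts 0x10 (16) clusters later at 8072.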

fls and NTFS

You may find it helpful to use fls to get metadata addresses to generate your own tests. fls lists the contents of directories stored in a volume, and includes information on deleted files. fls can take an NTFS filesystem as input and return output as follows:

fls simple.ntfs 
r/r 4-128-1:    $AttrDef
r/r 8-128-2:    $BadClus
r/r 8-128-1:    $BadClus:$Bad
r/r 6-128-1:    $Bitmap
r/r 7-128-1:    $Boot
d/d 11-144-2:   $Extend
r/r 2-128-1:    $LogFile
r/r 0-128-1:    $MFT
r/r 1-128-1:    $MFTMirr
r/r 9-128-2:    $Secure:$SDS
r/r 9-144-3:    $Secure:$SDH
r/r 9-144-4:    $Secure:$SII
r/r 10-128-1:   $UpCase
r/r 10-128-2:   $UpCase:$Info
r/r 3-128-3:    $Volume
r/r 64-128-2:   testing.txt
d/d 65: $OrphanFiles

The last entry is for orphan files.

Note that for NTFS, the metadata addresses are a little more complicated than in FAT. See the TSK wiki's NTFS Implementation Notes for details; for this assignment, we will only expect you to handle the primary $DATA attribute, so you can (and should) expect just the first number in the metadata address as an argument to your function.
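If you script your own test generation from fls output, you will want just that first number. A minimal sketch, using a hypothetical helper name and assuming the address looks like the second column of the fls listing above (with or without the trailing colon):

```python
def mft_entry_number(metadata_address):
    """Return the MFT entry number from an fls-style address like '64-128-2:'."""
    # Drop the trailing colon, then keep only the part before the first dash.
    return int(metadata_address.rstrip(':').split('-')[0])
```

So '64-128-2:' yields 64, which is the value you would pass as the address argument of istat_ntfs.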