13: Introduction to NTFS
DRAFT
Announcements
Midterm almost graded.
A08 due Friday, lots of good questions (and less good ones) on Piazza and in office hours. A09 due next week on Wednesday.
NTFS: An overview
NTFS is the default filesystem on all modern versions of Windows; thus, like FAT, it’s in wide use. Unlike FAT, it’s the primary filesystem on most home and business computers.
In some ways NTFS is simpler than FAT: it was designed from the ground up to be extensible, and so its design is more principled (and free of the legacy cruft that encumbers, say, the FAT boot sector and its FAT12/16/32 nonsense).
That’s the good news. The bad news is that the extensibility is not theoretical. Generic on-disk data structures wrap more specific data structures, so that the internals can be updated over time. To understand NTFS, we’ll need to cover both the generic and specific data structures (though not all of them), and that means there are a lot of details to keep straight in your head. I’ll do my best in lecture, but you are almost certainly going to need to read and re-read Carrier as well to understand this material.
One core concept in NTFS is simple: Everything is (or is stored in) a file. Regular files, directories, the structures that control the filesystem’s layout on disk (like the FAT from FAT16) – all are either files or stored in files. There’s no separate plane of existence for filesystem metadata (like inodes in a UNIX-y filesystem or the FAT + directory entries in FAT16). Certain special files have special names (for example, what we think of as the “boot sector” in FAT is called $Boot in NTFS), but they’re all just considered files by the file system.
The next core concept in NTFS is that the above is a bit of a cheat. Most (but not all) of what we think of as filesystem metadata is stored in one particular data structure: the Master File Table (MFT), which is stored as a file (named $MFT of course) but contents-wise is analogous to the FATs and dirents, as we’ll see. We’re going to spend a lot of time today and next week talking about the MFT and how it relates to the files stored on disk.
The final high-level thing you need to know about NTFS is that it breaks a disk up into allocatable units called clusters, just like FAT. Just like FAT, clusters are sized as a power-of-two-multiple of the underlying disk sector size. Unlike FAT, though, cluster 0 starts at the beginning of the partition, so there’s none of the “first cluster is cluster number 2” nonsense to contend with.
Finding the MFT
The Master File Table (MFT) contains information about all files and directories in its NTFS volume. Each file/directory has an MFT entry; the table is just a linear array of MFT entries, each identified by a file number, starting from 0.
How do we find the MFT? Just like in FAT, the first sector of the volume contains a boot sector (which again, in NTFS is just a file, named $Boot). That boot sector encodes the minimal information necessary to understand and parse the volume, including the bytes per sector, sectors per cluster, cluster address of the MFT, and the MFT entry size. Once you can find the MFT, you need to go there to learn the rest of what you need to know about the volume.
Let’s look at an example. Download simple.ntfs and follow along with Table 13.18 on page 380 in Carrier.
(Note we almost certainly won’t get through all of this today.)
00000000 eb 52 90 4e 54 46 53 20 20 20 20 00 02 08 00 00 |.R.NTFS    .....|
00000010 00 00 00 00 00 f8 00 00 00 00 00 00 00 00 00 00 |................|
00000020 00 00 00 00 80 00 80 00 ff 4f 00 00 00 00 00 00 |.........O......|
00000030 04 00 00 00 00 00 00 00 ff 04 00 00 00 00 00 00 |................|
00000040 f6 00 00 00 01 00 00 00 a4 a4 a1 72 46 d9 dc 42 |...........rF..B|
00000050 00 00 00 00 fa 33 c0 8e d0 bc 00 7c fb 68 c0 07 |.....3.....|.h..|
The bytes per sector are stored in bytes 11–12. Here it’s 512.
The sectors per cluster are stored in byte 13. Here it’s 8, so clusters are 8 * 512 B = 4 KB clusters.
The cluster address of the MFT is stored in bytes 48–55. Here, it’s 4.
The size of the file record is at byte 64. It (and the size of the index record, at byte 68) is stored in a special format. If, when interpreted as a signed byte, it’s positive, then it’s the number of clusters used for that record. If it’s negative, then 2^(abs(value)) bytes are used. Here it’s -10 (0xf6), which means that file records are 2^10 B = 1 KB each (this is the default value).
Compare this with index records, the size of which is stored in byte 68. Here, it’s 1, which means index records are 4 KB (one cluster) long.
So now we can go and find the start of the MFT in the volume. It’s at cluster 4. Cluster 4 is 4 * 4,096 bytes into the file, at offset 0x4000.
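The boot-sector steps above can be sketched in Python. This is a minimal sketch, not a full parser: the bytes are copied from the dump above, and the helper name `record_size` is made up for illustration.

```python
import struct

# First 80 bytes of simple.ntfs, copied from the hex dump above.
boot = bytes.fromhex(
    "eb 52 90 4e 54 46 53 20 20 20 20 00 02 08 00 00 "
    "00 00 00 00 00 f8 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 80 00 80 00 ff 4f 00 00 00 00 00 00 "
    "04 00 00 00 00 00 00 00 ff 04 00 00 00 00 00 00 "
    "f6 00 00 00 01 00 00 00 a4 a4 a1 72 46 d9 dc 42 "
)


def record_size(raw_byte, cluster_size):
    """Decode the special size format (hypothetical helper name)."""
    value = struct.unpack("<b", bytes([raw_byte]))[0]  # reinterpret as signed
    if value > 0:
        return value * cluster_size  # positive: a count of clusters
    return 2 ** abs(value)           # negative: 2^(abs(value)) bytes


bytes_per_sector = struct.unpack_from("<H", boot, 11)[0]  # bytes 11-12
sectors_per_cluster = boot[13]                            # byte 13
cluster_size = bytes_per_sector * sectors_per_cluster
mft_cluster = struct.unpack_from("<Q", boot, 48)[0]       # bytes 48-55
file_record_size = record_size(boot[64], cluster_size)
index_record_size = record_size(boot[68], cluster_size)
mft_offset = mft_cluster * cluster_size                   # cluster 0 is byte 0

print(bytes_per_sector, sectors_per_cluster, mft_cluster)
print(file_record_size, index_record_size, hex(mft_offset))
```

All multi-byte values are unpacked with a leading `<` because NTFS stores them little endian.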
Let’s double check against the output of fsstat to see that we’re doing this correctly.
fsstat simple.ntfs
# ... output follows ...
What’s in the MFT?
The MFT is just a sequence of entries. The first 16 are reserved by MS for filesystem metadata information, but in practice it’s the first 24 that are reserved. Table 11.1 shows the contents of the reserved entries.
Entry 0 is an entry for the MFT itself. We need this, because although the boot sector tells us where the MFT starts, it (the MFT) might run across multiple clusters. This entry tells us where to find the rest of the MFT!
Entry 3 is the $Volume information; entry 6 is for the $Bitmap (similar to the FAT, but it only tracks allocation, not runs); entry 7 is for the $Boot sector, and so on.
We’re going to look at one shortly. But before we do, let’s talk a little about the general structure of an MFT entry.
It starts with an MFT entry header, described in detail in Table 13.1. Then there’s a sequence of attribute (header, content) pairs, with (usually) some unused space at the end of the entry.
The attribute header identifies the attribute type, size, and name, among other things.
The attribute contents can have any format and any size: one perhaps obvious use is to store the contents of the file corresponding to the entry. Small attribute contents can fit in the MFT entry itself. One systems consequence is that small enough files (roughly, under 700 B) don’t automatically waste tons of space, as they do in FAT, since they don’t need a cluster of their own. These are called resident attributes generally, whether they store files or just other attribute content.
Larger attributes (again, might be files, might be other things) might not fit in the entry; these are called non-resident. Non-resident attributes are stored in clusters. The clusters are identified by runlists. Runlists are just lists of runs of contiguous clusters that hold the file. See Figure 11.6.
We know that the MFT is at 16K into this volume; let’s use some UNIXy tools to pull out the first entry so that we can see offsets from zero in this entry:
dd if=simple.ntfs of=zeroth-mft-entry bs=1024 count=1 skip=16
…and take a look at it.
00000000 46 49 4c 45 30 00 03 00 00 00 00 00 00 00 00 00 |FILE0...........|
00000010 01 00 01 00 38 00 01 00 98 01 00 00 00 04 00 00 |....8...........|
00000020 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 |................|
00000030 03 00 00 00 00 00 00 00 10 00 00 00 60 00 00 00 |............`...|
00000040 00 00 18 00 00 00 00 00 48 00 00 00 18 00 00 00 |........H.......|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000070 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 00 00 00 00 30 00 00 00 68 00 00 00 |........0...h...|
000000a0 00 00 18 00 00 00 02 00 4a 00 00 00 18 00 01 00 |........J.......|
000000b0 05 00 00 00 00 00 05 00 80 64 e1 e7 8b a1 d2 01 |.........d......|
000000c0 80 64 e1 e7 8b a1 d2 01 80 64 e1 e7 8b a1 d2 01 |.d.......d......|
000000d0 80 64 e1 e7 8b a1 d2 01 00 70 00 00 00 00 00 00 |.d.......p......|
000000e0 00 6c 00 00 00 00 00 00 06 00 00 00 00 00 00 00 |.l..............|
000000f0 04 03 24 00 4d 00 46 00 54 00 00 00 00 00 00 00 |..$.M.F.T.......|
00000100 80 00 00 00 48 00 00 00 01 00 40 00 00 00 01 00 |....H.....@.....|
00000110 00 00 00 00 00 00 00 00 12 00 00 00 00 00 00 00 |................|
00000120 40 00 00 00 00 00 00 00 00 30 01 00 00 00 00 00 |@........0......|
00000130 00 04 01 00 00 00 00 00 00 04 01 00 00 00 00 00 |................|
00000140 11 13 04 00 00 00 00 00 b0 00 00 00 48 00 00 00 |............H...|
00000150 01 00 40 00 00 00 03 00 00 00 00 00 00 00 00 00 |..@.............|
00000160 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 |........@.......|
00000170 00 10 00 00 00 00 00 00 10 00 00 00 00 00 00 00 |................|
00000180 10 00 00 00 00 00 00 00 11 01 02 00 00 00 00 00 |................|
00000190 ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 00 |................|
00000200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000003f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 00 |................|
00000400
See Table 13.1 on page 353.
It starts with the four-byte sequence corresponding to ASCII "FILE" (or "BAAD" if there’s an error on the disk in this entry). Let’s find the first attribute – remember, all entries in the MFT consist of an MFT header, followed by attributes (which themselves consist of headers and contents). This first attribute’s offset (from the start of the entry) is stored in bytes 20–21. Here, its value is 56 (0x38).
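Here is a small sketch of that check in Python. It uses only the first 32 bytes of the entry, copied from the dump above; the variable names are just for illustration.

```python
import struct

# First 32 bytes of the zeroth MFT entry, copied from the hex dump above.
entry_header = bytes.fromhex(
    "46 49 4c 45 30 00 03 00 00 00 00 00 00 00 00 00 "
    "01 00 01 00 38 00 01 00 98 01 00 00 00 04 00 00 "
)

signature = entry_header[0:4]
if signature not in (b"FILE", b"BAAD"):
    raise ValueError("this does not look like an MFT entry")

# Bytes 20-21: offset of the first attribute from the start of the entry.
first_attribute_offset = struct.unpack_from("<H", entry_header, 20)[0]

print(signature, first_attribute_offset, hex(first_attribute_offset))
```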
Attributes: headers and contents
Let’s skip to the attribute, which starts with a header in a standard format. See Tables 13.2–13.4. The first 16 bytes are the same in resident and non-resident attribute headers; after that they diverge.
The header starts with a four-byte type tag. Here, it’s 16, which marks a $STANDARD_INFORMATION attribute. This and $FILE_NAME (48) are two attributes that nearly every entry will have.
The next four bytes (4–7) tell us the length. Here it’s 96; this means the next attribute starts 96 bytes past the start of the current attribute, at offset 56 + 96 = 152.
Byte 8 (offset from 56, remember, so byte 64 in the dump) tells us if the attribute’s content is non-resident. Here it’s zero, so this attribute’s content is resident – that is, it’s embedded in the MFT entry.
(Discussion names for standard attributes? E.g., as ADS?)
Let’s jump ahead to bytes 16–19 (again: offset from 56, so go to 72, or 0x48) to get the size (72) and bytes 20–21 (offset 76, or 0x4c) to get the offset (24) of this attribute’s content. (Sanity check: 24 + 72 = 96, which is the size of the attribute in total and also the distance from the start of this attribute to the next one.)
So you get the idea; Tables 13.5 and 13.6 tell you how to parse the $STANDARD_INFORMATION attribute’s contents.
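The header walk we just did by hand can be sketched in Python as follows. The 24 bytes are the resident attribute header copied from offset 56 (0x38) of the entry dump; the variable names are illustrative only.

```python
import struct

# The 24-byte resident attribute header found at offset 56 of the entry.
attr_header = bytes.fromhex(
    "10 00 00 00 60 00 00 00 "
    "00 00 18 00 00 00 00 00 "
    "48 00 00 00 18 00 00 00 "
)
ATTRIBUTE_START = 56  # where this attribute begins within the MFT entry

# Bytes 0-3: type tag; bytes 4-7: total attribute length.
attr_type, attr_length = struct.unpack_from("<II", attr_header, 0)
# Byte 8: zero means resident, non-zero means non-resident.
non_resident = attr_header[8] != 0
# Resident headers only: bytes 16-19 content size, bytes 20-21 content offset.
content_size = struct.unpack_from("<I", attr_header, 16)[0]
content_offset = struct.unpack_from("<H", attr_header, 20)[0]

next_attribute = ATTRIBUTE_START + attr_length

print(attr_type, attr_length, non_resident)
print(content_size, content_offset, next_attribute)
```

The sanity check from above holds here too: `content_offset + content_size` equals `attr_length`.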
Let’s look at the next attribute for this entry (remember, we’re still looking at the zeroth entry in the MFT). We find it by skipping past the previous one. The previous one started at offset 56 and had a length of 96, so we need to skip ahead to byte 56 + 96 = 152 (0x98) of the entry.
This one has type 48 ($FILE_NAME). Let’s dig into this attribute. We see it’s 104 (0x68) bytes long, so let’s pull it out using dd so we can see offsets from 0 (or just do it in Hex Fiend):
dd if=zeroth-mft-entry of=file_name_attribute bs=1 skip=152 count=104
00000000 30 00 00 00 68 00 00 00 00 00 18 00 00 00 02 00 |0...h...........|
00000010 4a 00 00 00 18 00 01 00 05 00 00 00 00 00 05 00 |J...............|
00000020 80 64 e1 e7 8b a1 d2 01 80 64 e1 e7 8b a1 d2 01 |.d.......d......|
*
00000040 00 70 00 00 00 00 00 00 00 6c 00 00 00 00 00 00 |.p.......l......|
00000050 06 00 00 00 00 00 00 00 04 03 24 00 4d 00 46 00 |..........$.M.F.|
00000060 54 00 00 00 00 00 00 00 |T.......|
The first four bytes tell us the type (48); the next four are the length (104 bytes). This attribute is also resident. Let’s skip ahead to bytes 20–21, which are the offset (from the start of the attribute) to the content. Here, it’s 24, so let’s go there.
The $FILE_NAME attribute is described in Table 13.7 on page 362. I’m going to use Hex Fiend to do the extraction here, but it’s the same as using dd previously, or as slicing a sequence of bytes or seek()ing in Python.
dd if=file_name_attribute of=file_name_attribute_content bs=1 skip=24
00000000 05 00 00 00 00 00 05 00 80 64 e1 e7 8b a1 d2 01 |.........d......|
00000010 80 64 e1 e7 8b a1 d2 01 80 64 e1 e7 8b a1 d2 01 |.d.......d......|
00000020 80 64 e1 e7 8b a1 d2 01 00 70 00 00 00 00 00 00 |.d.......p......|
00000030 00 6c 00 00 00 00 00 00 06 00 00 00 00 00 00 00 |.l..............|
00000040 04 03 24 00 4d 00 46 00 54 00 00 00 00 00 00 00 |..$.M.F.T.......|
The first eight bytes are the file reference of the parent directory. File references are composed of two parts: the file number and the sequence number. The file number we’ve already seen: it’s the index into the MFT of the entry being referenced, starting from zero. The sequence number is incremented each time an MFT entry is allocated for use. The two numbers are concatenated to form a 64-bit file reference, with the 16-bit sequence number in the higher-order bytes and the 48-bit file number in the lower-order bytes. Note that, like all values, it’s stored little endian, so the on-disk byte order is:
FN FN FN FN FN FN SS SS
where FN are file number bytes and SS are sequence number bytes.
So in this $FILE_NAME, the parent’s file number is 5 (and its sequence number is also 5). Entry 5 is one of the reserved slots in the MFT; looking it up in Table 11.1, we see it’s the root directory.
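Splitting a file reference is a couple of bit operations once the eight bytes are read as one little-endian integer. A minimal sketch, using the eight bytes from the start of the $FILE_NAME content (variable names are illustrative):

```python
import struct

# The first eight bytes of the $FILE_NAME content: the parent's file reference.
raw_reference = bytes.fromhex("05 00 00 00 00 00 05 00")

reference = struct.unpack("<Q", raw_reference)[0]

file_number = reference & 0xFFFFFFFFFFFF  # low-order 48 bits
sequence_number = reference >> 48         # high-order 16 bits

print(file_number, sequence_number)
```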
The next sequence of 8 bytes is the file creation time. NTFS stores file-related times as the number of blocks of 100 ns since January 1st, 1601 UTC.
Here, the value is 8064E1E7 8BA1D201 so we can get the number in Python using:
import struct
timestamp = struct.unpack('<Q', bytes.fromhex('8064E1E7 8BA1D201'))[0]
To convert that to a time, we convert to a UNIX-style epoch (the number of seconds since January 1, 1970). The UNIX epoch itself falls 116444736000000000 blocks of 100 ns after January 1st, 1601 UTC, so we subtract that constant and then divide by 10,000,000 (the number of 100 ns blocks in a second):
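import datetime

def as_datetime(windows_timestamp):
    # Shift from the 1601 epoch to the 1970 epoch, then scale 100 ns blocks to seconds.
    # With no tz argument, fromtimestamp() reports the result in local time.
    return datetime.datetime.fromtimestamp((windows_timestamp - 116444736000000000) / 10000000)

print(str(as_datetime(timestamp)))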
Bytes 16–23 are the modification time; 24–31 are the MFT modification time; 32–39 are the last access time.
Bytes 40–47 are the allocated file size and bytes 48–55 are the actual size, but these are not required to be accurate unless this attribute is used in a directory index.
Bytes 56–59 are flags, just like in FAT (see Table 13.6).
Byte 64 is the length of the filename (4 characters). Byte 65 is the namespace (see Table 13.8; here it’s 3, the Windows/DOS namespace). Bytes 66 onward are the name, in this case in UTF-16 (LE): $MFT. This is the $MFT entry, just like we expected for entry 0.
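Pulling the name out in Python is a matter of slicing and decoding. This sketch uses the 80 bytes of $FILE_NAME content from the dump above and assumes the length byte counts UTF-16 characters (two bytes each), which matches what we see here.

```python
# The $FILE_NAME attribute content, copied from the hex dump above.
content = bytes.fromhex(
    "05 00 00 00 00 00 05 00 80 64 e1 e7 8b a1 d2 01 "
    "80 64 e1 e7 8b a1 d2 01 80 64 e1 e7 8b a1 d2 01 "
    "80 64 e1 e7 8b a1 d2 01 00 70 00 00 00 00 00 00 "
    "00 6c 00 00 00 00 00 00 06 00 00 00 00 00 00 00 "
    "04 03 24 00 4d 00 46 00 54 00 00 00 00 00 00 00 "
)

name_length = content[64]  # in characters, not bytes
namespace = content[65]
# Each UTF-16 (LE) character here takes two bytes, starting at byte 66.
name = content[66:66 + 2 * name_length].decode("utf-16-le")

print(name_length, namespace, name)
```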
Note we can check all this using istat on the relevant entry:
istat simple.ntfs 0-128-1
One last attribute to look at here, the $DATA attribute, which is next. Going back to the zeroth-mft-entry, it starts at offset 256 (from the start of the entry, not the start of the entire MFT). How did we get this? It’s not a magic value: we computed it. Remember, the previous attribute ($FILE_NAME) started at offset 152 and was 104 bytes long. 152 + 104 = 256.
dd if=zeroth-mft-entry of=data_attribute bs=1 skip=256 count=72
00000000 80 00 00 00 48 00 00 00 01 00 40 00 00 00 01 00 |....H.....@.....|
00000010 00 00 00 00 00 00 00 00 12 00 00 00 00 00 00 00 |................|
00000020 40 00 00 00 00 00 00 00 00 30 01 00 00 00 00 00 |@........0......|
00000030 00 04 01 00 00 00 00 00 00 04 01 00 00 00 00 00 |................|
00000040 11 13 04 00 00 00 00 00 |........|
The first four bytes (value: 128) tell us it’s the $DATA attribute, and the next four tell us its length (72). The next byte tells us that this is a non-resident attribute, so its contents are stored somewhere in cluster(s) on the disk.
Let’s jump ahead to figure out where (that is, which clusters) the data is stored in.
Looking at Table 13.4, bytes 16–23 and 24–31 are used to tell us the starting and ending VCN of the runlist. The VCN is just a sequence of numbers 0..n-1 referring to the n clusters in a file in order. This is in contrast to the LCN, which is a list of the actual cluster numbers (on disk) that correspond to the VCN clusters. (on board)
Why does the non-resident header have this marker? Because for very long, fragmented files, you might not be able to fit the runlist into a single MFT entry. NTFS then needs to split the attribute across several MFT entries; the VCN range is how you figure out which part of the attribute a given entry describes.
Then there’s an offset to the runlist (from the start of this attribute) at bytes 32-33. The runlist is in the following format:
First there’s a single byte that describes the length and offset of the next run; then there’s a variable number of bytes describing the length of the run, and a variable number of bytes describing the offset to the run.
In more detail: The first byte is split into two nibbles (4-bit values). The low-order bits tell you the number of bytes in the run length; the high-order bits tell you the number of bytes in the offset to the run.
These values stored in the length and offset are in units of clusters, not bytes or sectors, and the offset bytes are signed.
And, while the length is what you’d expect, the offset is relative to the previous offset in the runlist (the first offset is relative to the start of the filesystem, that is, cluster 0). Let’s look at some data:
11 13 04
So, the first byte contains 11 (hex), which in binary is
0001 0001
This run is described by a single-byte length and a single-byte offset. The length comes first: 0x13, so it’s 19 clusters long. The offset comes next: 0x04, so it starts 4 clusters past the start of the file system. This is what we expect, again as shown by istat.
istat simple.ntfs 0-128-1
...
Type: $DATA (128-1) Name: N/A Non-Resident size: 66560 init_size: 66560
4 5 6 7 8 9 10 11
12 13 14 15 16 17 18 19
20 0 0
...
Notice there are 19 values there: 17 cluster numbers, starting at 4 and running through 20, followed by two zeroes. Why the zeroes? The allocated size of this attribute’s content (bytes 40–47; as an int, 77824) is exactly 19 4 KB clusters. But the actual size (bytes 48–55: 66560) is only 16.25 clusters’ worth, so it fits in 17. So the last two clusters are allocated but not used. istat represents this by showing their numbers as zeroes.
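To tie the non-resident pieces together, here is a hedged Python sketch that reads the header fields we used, decodes the runlist, and redoes the cluster arithmetic. The function name `parse_runlist` is made up for illustration. The sketch assumes a header byte of 00 ends the runlist (consistent with the zero that follows 11 13 04 in the dump) and does not handle sparse runs. The second runlist at the bottom is invented purely to show a negative relative offset.

```python
import struct

CLUSTER_SIZE = 4096  # 8 sectors * 512 bytes, from the boot sector

# The 72-byte $DATA attribute, copied from the hex dump above.
data_attr = bytes.fromhex(
    "80 00 00 00 48 00 00 00 01 00 40 00 00 00 01 00 "
    "00 00 00 00 00 00 00 00 12 00 00 00 00 00 00 00 "
    "40 00 00 00 00 00 00 00 00 30 01 00 00 00 00 00 "
    "00 04 01 00 00 00 00 00 00 04 01 00 00 00 00 00 "
    "11 13 04 00 00 00 00 00 "
)


def parse_runlist(data):
    """Return (starting cluster, length in clusters) for each run."""
    runs = []
    position = 0
    current_cluster = 0  # the first offset is relative to cluster 0
    # Assumption: a header byte of zero marks the end of the runlist.
    while position < len(data) and data[position] != 0:
        header = data[position]
        length_size = header & 0x0F  # low nibble: bytes in the run length
        offset_size = header >> 4    # high nibble: bytes in the run offset
        position += 1
        length = int.from_bytes(data[position:position + length_size], "little")
        position += length_size
        # Offsets are signed and relative to the previous run's start.
        offset = int.from_bytes(
            data[position:position + offset_size], "little", signed=True
        )
        position += offset_size
        current_cluster += offset
        runs.append((current_cluster, length))
    return runs


start_vcn, end_vcn = struct.unpack_from("<QQ", data_attr, 16)  # bytes 16-31
runlist_offset = struct.unpack_from("<H", data_attr, 32)[0]    # bytes 32-33
allocated_size, actual_size = struct.unpack_from("<QQ", data_attr, 40)

runs = parse_runlist(data_attr[runlist_offset:])
clusters_allocated = allocated_size // CLUSTER_SIZE
clusters_used = -(-actual_size // CLUSTER_SIZE)  # ceiling division

print(start_vcn, end_vcn, runs)
print(clusters_allocated, clusters_used)

# Invented example: a 2-cluster run at cluster 10, then 3 clusters at 10 - 3 = 7.
example_runs = parse_runlist(bytes.fromhex("11 02 0a 11 03 fd 00"))
print(example_runs)
```

Note that the run’s 19 clusters agree with the VCN range in the header: VCNs 0 through 18 are 19 clusters.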
If you continue parsing this entry, you’ll see that there’s one more attribute of type 0xb0, which is a $BITMAP attribute. In $MFT, this is used to track which MFT entries are allocated (elsewhere it tracks index records), but we’ll not concern ourselves with it.
How do you know you are done parsing attributes? The MFT entry is of fixed size, so the attributes rarely fill it exactly. The entry header does record how many bytes are in use (bytes 24–27 in Table 13.1; here 0x198), but the simplest check is to look for the hex value ffff ffff where you’d expect an attribute to begin; that indicates you have finished. In this MFT entry it’s at offset 0x190 from the start of the entry.
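The whole walk (start at the first attribute, hop by each attribute’s length, stop at the end marker) can be sketched as a short loop. To keep the example self-contained, it runs on a synthetic stand-in entry built in the code, not the real dump: only the type and length fields are filled in, using the same four types and lengths we found above. The names `walk_attributes` and `fake_attribute` are hypothetical.

```python
import struct

END_MARKER = 0xFFFFFFFF


def walk_attributes(entry):
    """Return ([(offset, type, length), ...], offset of the end marker)."""
    offset = struct.unpack_from("<H", entry, 20)[0]  # first attribute offset
    found = []
    while offset + 8 <= len(entry):
        attr_type = struct.unpack_from("<I", entry, offset)[0]
        if attr_type == END_MARKER:
            break
        length = struct.unpack_from("<I", entry, offset + 4)[0]
        if length == 0:
            raise ValueError("zero-length attribute; entry looks corrupt")
        found.append((offset, attr_type, length))
        offset += length
    return found, offset


def fake_attribute(attr_type, length):
    """Build a stand-in attribute: real type and length, zeroed body."""
    return struct.pack("<II", attr_type, length) + bytes(length - 8)


# Synthetic entry: 56-byte header, then the four attributes seen above.
entry = bytearray(56)
entry[0:4] = b"FILE"
struct.pack_into("<H", entry, 20, 56)
for attr_type, length in [(0x10, 96), (0x30, 104), (0x80, 72), (0xB0, 72)]:
    entry += fake_attribute(attr_type, length)
entry += struct.pack("<I", END_MARKER)
entry = bytes(entry).ljust(1024, b"\x00")  # pad out to a 1 KB file record

attributes, end_offset = walk_attributes(entry)
for offset, attr_type, length in attributes:
    print(hex(offset), hex(attr_type), length)
print(hex(end_offset))
```

If you still have the zeroth-mft-entry file from the dd command earlier, reading it in binary mode and passing the bytes to `walk_attributes` should report the same offsets.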