11: FATs and Directory Entries

Announcements

Due dates are due dates, mis amigos. Occasionally I'll grant extensions, but generally I give you enough time to get assignments done. Unless, you know, you start the night before something is due and run into trouble. You had about a week and a half for A05 (plus bonus days), and while most people were done in advance, about 25 or so weren't done on time. I'm not going to turn granting last-minute extensions into a habit, so please start earlier in the future. Do you want me to get down on my knees and beg? I'll do it!

I know the assignments can be challenging, and I know that you might think "you get it" until you sit down to do it. That's why I ask you to do it. I want you to demonstrate mastery. To quote Professor B. Lee, "knowing is not enough, we must apply. Willing is not enough, we must do." This is a systems course -- to do well, you'll need to demonstrate you understand systems well enough to build them.

The Withdraw deadline for undergraduates is tomorrow; the CS main office (CS 100) can sign the form for you on my behalf, but you still have to get it to the registrar before 5:00pm.

More on FAT

Today we're going to finish up talking about the FAT structure then move on to the last piece of FAT, the directory entries.

Let's look at the adams.dd disk image way back from lecture 2. First, note that it's not an image of an entire disk:

> mmls adams.dd
Cannot determine partition type

Though my local version of file gives a hint:

> file adams.dd
adams.dd: DOS/MBR boot sector, code offset 0x3c+2, OEM-ID "BSD  4.4", sectors/cluster 2, root entries 512, sectors 10239 (volumes <=32 MB) , sectors/FAT 20, sectors/track 32, heads 16, serial number 0x36c013ef, label: "ADAMS      ", FAT (16 bit)

Looks like someone extracted only the FAT16 partition. So let's use fsstat to take a look:

FILE SYSTEM INFORMATION
--------------------------------------------
File System Type: FAT16

OEM Name: BSD  4.4
Volume ID: 0x36c013ef
Volume Label (Boot Sector): ADAMS      
Volume Label (Root Directory):
File System Type Label: FAT16   

Sectors before file system: 0

File System Layout (in sectors)
Total Range: 0 - 10238
* Reserved: 0 - 0
** Boot Sector: 0
* FAT 0: 1 - 20
* FAT 1: 21 - 40
* Data Area: 41 - 10238
** Root Directory: 41 - 72
** Cluster Area: 73 - 10238

METADATA INFORMATION
--------------------------------------------
Range: 2 - 163174
Root Directory: 2

CONTENT INFORMATION
--------------------------------------------
Sector Size: 512
Cluster Size: 1024
Total Cluster Range: 2 - 5084

FAT CONTENTS (in sectors)
--------------------------------------------
75-76 (2) -> EOF
3743-8792 (5050) -> EOF

Most of this information comes from the boot sector as we described last lecture. There are a few things that don't, though, namely the Volume Label (Root Directory), the METADATA INFORMATION (well, not exactly), and the FAT CONTENTS.

Let's take a look at FAT1 (the first FAT in this FAT16 filesystem, which fsstat labels FAT 0), which starts at sector 1 and runs through sector 20. I'm going to use dd and pipe into hexdump so that the offsets are from the start of the FAT. Also note I'm going to omit the -v argument, so that identical lines in the hexdump are condensed:

dd if=adams.dd bs=512 skip=1 count=20| hexdump -C
20+0 records in
20+0 records out
10240 bytes transferred in 0.000112 secs (91382283 bytes/sec)
00000000  f0 ff ff ff 00 00 ff ff  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000e50  00 00 00 00 00 00 00 00  00 00 2e 07 2f 07 30 07  |............/.0.|
00000e60  31 07 32 07 33 07 34 07  35 07 36 07 37 07 38 07  |1.2.3.4.5.6.7.8.|

The * means that every line between the one before it and the next one shown repeats the previous line (in this case, all 00s).

Let's pick apart these first few lines, then examine fsstat's output to see how it represents them.

First, remember that entries are sequential in the FAT, 16 bits per cluster. So the first two entries are f0 ff and ff ff. But they don't refer to a cluster, since the clusters start at 2! (Thanks, Microsoft!).

The entry for cluster 2 starts at offset 0x4 (and this is generally true in FAT16: you can find the entry in the FAT for cluster N at offset N x 2 from the start of the FAT). Its value here is 00 00, which means the cluster is unallocated: there is no file data stored there.

The next cluster, 3, is represented by the entry at 0x6, and its value is ff ff, indicating it's the last cluster in whatever it represents. Nothing in the FAT tells you if a cluster is the first (or only) cluster of a file: that information is in the directory entries, which we'll get to in a bit.
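Here's a minimal sketch of that lookup in Python. The helper name fat_entry is my own invention (it isn't part of Sleuth Kit or any library), and the bytes are just the first eight bytes of the hexdump above.

```python
import struct

# First 8 bytes of the FAT, copied from the hexdump above.
FAT_START = bytes.fromhex("f0 ff ff ff 00 00 ff ff")

def fat_entry(fat: bytes, cluster: int) -> int:
    """Return the 16-bit little-endian FAT16 entry for `cluster`."""
    offset = cluster * 2  # two bytes per entry in FAT16
    (value,) = struct.unpack_from("<H", fat, offset)
    return value

for n in range(4):
    print(n, hex(fat_entry(FAT_START, n)))
# Entries 0 and 1 are not real clusters; entry 2 is 0x0 (unallocated)
# and entry 3 is 0xffff (last cluster of whatever it belongs to).
```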

Let's pause and see how fsstat has represented this allocation in its FAT CONTENTS:

75-76 (2) -> EOF

What's going on here? Sleuth Kit mostly uses sectors, not clusters, when referring to locations on disk. So it's telling us that there's something that runs from sector 75 through sector 76 (2 sectors), and then ends. Does that match the FAT?

Well, according to fsstat, the Cluster Area starts at sector 73. And clusters in this FAT are 1024 bytes (two sectors). So cluster 2 is sectors 73 and 74, and cluster 3 is on sectors 75 and 76, just as fsstat shows.
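If you'd rather let the computer do the arithmetic, here's a small sketch. The constants come straight from the fsstat output for adams.dd; the function name cluster_to_sectors is hypothetical.

```python
CLUSTER_AREA_START = 73   # first sector of the cluster area, per fsstat
SECTORS_PER_CLUSTER = 2   # 1024-byte clusters / 512-byte sectors

def cluster_to_sectors(cluster: int) -> range:
    """Sectors occupied by `cluster`; cluster numbering starts at 2."""
    first = CLUSTER_AREA_START + (cluster - 2) * SECTORS_PER_CLUSTER
    return range(first, first + SECTORS_PER_CLUSTER)

print(list(cluster_to_sectors(2)))  # [73, 74]
print(list(cluster_to_sectors(3)))  # [75, 76]
```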

Let's go the other way. fsstat lists the next allocated sectors as:

3743-8792 (5050) -> EOF

Which cluster(s) does this correspond to? Well, the first sector is 3743. That's 3743 - 73 = 3670 sectors past the start of the cluster area. Since on this filesystem the clusters are two sectors long, that's 1835 clusters in. But clusters start numbering at 2, so it's actually cluster 1837. Cluster 1837's entry will be located at offset 1837 x 2 = 3674 from the start of the FAT, which is 0xe5a. What's in the hexdump at that offset?

00000e50  00 00 00 00 00 00 00 00  00 00 2e 07 2f 07 30 07  |............/.0.|
00000e60  31 07 32 07 33 07 34 07  35 07 36 07 37 07 38 07  |1.2.3.4.5.6.7.8.|

2e 07. If we interpret that as a little-endian value, what is it? 1838 -- the next cluster! What's there? 2f 07 (1839). And so on. This run of clusters is actually 2525 clusters long, so we're not going to do it all here, but you get the idea, I hope.
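Here's the same walk as a Python sketch. It only rebuilds the slice of the FAT that's visible in the hexdump (offsets 0xe50 through 0xe6f), so it can follow the chain for a few hops, not all 2525. The function name is made up for illustration.

```python
import struct

CLUSTER_AREA_START = 73
SECTORS_PER_CLUSTER = 2

def sector_to_cluster(sector: int) -> int:
    """Map a sector in the cluster area back to its cluster number."""
    return (sector - CLUSTER_AREA_START) // SECTORS_PER_CLUSTER + 2

cluster = sector_to_cluster(3743)
print(cluster, hex(cluster * 2))  # 1837 0xe5a

# A zero-filled stand-in for the FAT, with the two hexdump lines pasted in.
fat = bytearray(0xE70)
fat[0xE50:0xE60] = bytes(10) + bytes.fromhex("2e 07 2f 07 30 07")
fat[0xE60:0xE70] = bytes.fromhex("31 07 32 07 33 07 34 07 35 07 36 07 37 07 38 07")

# Each entry holds the number of the next cluster in the run.
chain = [cluster]
for _ in range(5):
    (next_cluster,) = struct.unpack_from("<H", fat, chain[-1] * 2)
    chain.append(next_cluster)
print(chain)  # [1837, 1838, 1839, 1840, 1841, 1842]
```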

Directory entries

Time for the last piece of the puzzle. How do you store file and directory metadata in a FAT filesystem, and how do you map files to the clusters where their contents are stored? FAT uses a 32-byte structure called the directory entry to handle these tasks. Directory entries are stored in the cluster area just like files.

The contents (files and directories) of a directory (like C:\MARC\) are represented by a set of directory entries stored in a cluster (or several clusters, just like a multi-cluster file). Which cluster? The enclosing directory's directory entry will tell you. Where's the root directory? In FAT16 its location is hardcoded as right after the last FAT, at the start of the data area (but before the first cluster). Its size can be calculated by checking the max number of entries (listed in the FAT boot sector, usually 512) and multiplying by 32 (bytes per directory entry). Cluster 2 follows immediately on disk.
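As a sanity check, here's that calculation for adams.dd as a quick Python sketch. All the numbers are from the file and fsstat output earlier; the variable names are mine.

```python
SECTOR_SIZE = 512
ENTRY_SIZE = 32        # bytes per directory entry
ROOT_ENTRIES = 512     # "root entries 512" in the boot sector
RESERVED_SECTORS = 1   # just the boot sector on this image
NUM_FATS = 2
SECTORS_PER_FAT = 20

# The root directory starts right after the last FAT.
root_first = RESERVED_SECTORS + NUM_FATS * SECTORS_PER_FAT
root_sectors = ROOT_ENTRIES * ENTRY_SIZE // SECTOR_SIZE
root_last = root_first + root_sectors - 1
cluster_area_start = root_last + 1   # cluster 2 begins here

print(root_first, root_last, cluster_area_start)  # 41 72 73
```

Those match the "Root Directory: 41 - 72" and "Cluster Area: 73 - 10238" lines that fsstat reported.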

Directory entries are laid out as follows:

0x0 (11 B): 8.3 filename in ASCII; first byte will be 0x00 or 0xe5 if unallocated; 0xe5 means deleted and 0x00 (usually) means that no later entry in this sequence of directory entries is allocated

0xB (1 B): attributes, ORed together

  • 0x01: read only
  • 0x02: hidden
  • 0x04: system
  • 0x08: volume label
  • 0x10: directory
  • 0x20: archive

(This is a special "long file name" entry, not a directory entry, if its attributes are equal to 0x0f. More on that later.)

0xC (1 B): reserved

0xD (5 B): creation date/time in wacky FAT format

0x12 (2B): accessed date

0x14 (2B): high-order bytes of first cluster address (always 0 in FAT16, only used in FAT32)

0x16 (4B): written date/time

0x1A (2B): low-order bytes of the first cluster address

0x1C (4B): file size (0 for directories)

I'm going to leave the fine details of parsing the FAT date/time format to Carrier: read the text or relevant Wikipedia entry if and when you need them.
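To make the layout concrete, here's a rough Python sketch that unpacks one 32-byte entry with the struct module. The function name and the dictionary keys are my own choices, the date/time fields are deliberately left unparsed, and the sample bytes are the DESIGNS.DOC entry from the adams.dd root directory that we dump in the example below.

```python
import struct

# name(11) attrs(1) reserved(1) created(5) accessed(2)
# cluster_hi(2) written(4) cluster_lo(2) size(4) = 32 bytes
ENTRY_FORMAT = "<11sBB5sHHIHI"

def parse_dir_entry(raw: bytes) -> dict:
    """Unpack a short (8.3) directory entry; dates are ignored here."""
    (name, attrs, _reserved, _created, _accessed,
     cluster_hi, _written, cluster_lo, size) = struct.unpack(ENTRY_FORMAT, raw)
    base = name[:8].decode("ascii", errors="replace").rstrip()
    ext = name[8:].decode("ascii", errors="replace").rstrip()
    return {
        "name": base + ("." + ext if ext else ""),
        "deleted": name[0] == 0xE5,
        "attributes": attrs,
        "first_cluster": (cluster_hi << 16) | cluster_lo,
        "size": size,
    }

entry = bytes.fromhex(
    "44 45 53 49 47 4e 53 20 44 4f 43 20 00 00 4e 81"
    "1c 39 1c 39 00 00 4e 81 1c 39 2d 07 00 72 27 00"
)
print(parse_dir_entry(entry))
```

Using a fixed struct format keeps the offsets in one place, so it's easy to check against the table above.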

Let's look at an example. The root directory entries for adams.dd start at sector 41 of the image, so let's take a look:

dd if=adams.dd bs=512 skip=41 count=32| hexdump -C
32+0 records in
32+0 records out
16384 bytes transferred in 0.000110 secs (149066110 bytes/sec)
00000000  41 44 41 4d 53 20 20 20  20 20 20 28 00 00 00 00  |ADAMS      (....|
00000010  00 00 00 00 00 00 e1 62  1e 39 00 00 00 00 00 00  |.......b.9......|
00000020  41 69 00 6d 00 61 00 67  00 65 00 0f 00 71 73 00  |Ai.m.a.g.e...qs.|
00000030  00 00 ff ff ff ff ff ff  ff ff 00 00 ff ff ff ff  |................|
00000040  49 4d 41 47 45 53 20 20  20 20 20 10 00 00 c4 79  |IMAGES     ....y|
00000050  e1 38 1c 39 00 00 4f 84  1c 39 03 00 00 00 00 00  |.8.9..O..9......|
00000060  41 44 00 65 00 73 00 69  00 67 00 0f 00 d4 6e 00  |AD.e.s.i.g....n.|
00000070  73 00 2e 00 64 00 6f 00  63 00 00 00 00 00 ff ff  |s...d.o.c.......|
00000080  44 45 53 49 47 4e 53 20  44 4f 43 20 00 00 4e 81  |DESIGNS DOC ..N.|
00000090  1c 39 1c 39 00 00 4e 81  1c 39 2d 07 00 72 27 00  |.9.9..N..9-..r'.|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00004000

Each entry is 32 bytes (two lines of this hexdump). What type is the first entry? Looking at offset 0x0b, we see the value 28, that is, 0x08 (volume label) ORed with 0x20 (archive), which means this directory entry is a "Volume label" -- "ADAMS" in particular.
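Since the attribute byte is a set of ORed flags, decoding it is just a matter of testing each bit. A small sketch (the names are hypothetical, and I'm treating "exactly 0x0f" as the long file name marker, as described above):

```python
ATTRIBUTES = {
    0x01: "read only",
    0x02: "hidden",
    0x04: "system",
    0x08: "volume label",
    0x10: "directory",
    0x20: "archive",
}
LONG_FILE_NAME = 0x0F

def describe_attributes(value: int) -> list:
    """List the flag names set in a directory entry's attribute byte."""
    if value == LONG_FILE_NAME:
        return ["long file name"]
    return [name for bit, name in ATTRIBUTES.items() if value & bit]

for value in (0x28, 0x0F, 0x10, 0x20):
    print(hex(value), describe_attributes(value))
```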

Let's skip to the next one. 0f: a "long file name" (LFN). We'll come back to this (I promise!) soon.

Next, 10: a directory! The two bytes at offset 0x1a (from the start of this entry) tell us which cluster the directory is stored in. 03 00 means that the IMAGES directory's entries are stored in cluster 3 of this filesystem. Ring a bell from earlier?

What's at that location, anyway? Cluster 3 is ((3 - 2) * 2) + 73 sectors past the start of the filesystem, at sector 75:

dd if=adams.dd bs=512 skip=75 count=2| hexdump -C
2+0 records in
2+0 records out
1024 bytes transferred in 0.000023 secs (44278013 bytes/sec)
00000000  2e 20 20 20 20 20 20 20  20 20 20 10 00 00 4e 5c  |.          ...N\|
00000010  a1 38 a1 38 00 00 4e 5c  a1 38 03 00 00 00 00 00  |.8.8..N\.8......|
00000020  2e 2e 20 20 20 20 20 20  20 20 20 10 00 00 4e 5c  |..         ...N\|
00000030  a1 38 a1 38 00 00 4e 5c  a1 38 00 00 00 00 00 00  |.8.8..N\.8......|
00000040  e5 4d 47 5f 33 30 32 37  4a 50 47 20 00 00 c4 79  |.MG_3027JPG ...y|
00000050  e1 38 e1 38 00 00 c4 79  e1 38 04 00 8c a0 1c 00  |.8.8...y.8......|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000400

The first item is a directory (again, at offset 0xb into the entry, value 10), named ".", which is shorthand for "the current directory." It's at cluster 03 00 -- right here.

Next comes an entry for a directory called "..", which command line nerds know means "the enclosing directory," in other words, a back reference. Since the enclosing directory is the root directory, which isn't at a particular cluster in FAT16, this gets the marker value 00 00.

Finally, there's an entry for a deleted file, which we can identify by the leading e5 value in its name. It used to be located at cluster 04 00 -- cluster 4, but it's gone now. Or is it? Ponder.

Anyway, back to the root directory entries:

00000060  41 44 00 65 00 73 00 69  00 67 00 0f 00 d4 6e 00  |AD.e.s.i.g....n.|
00000070  73 00 2e 00 64 00 6f 00  63 00 00 00 00 00 ff ff  |s...d.o.c.......|
00000080  44 45 53 49 47 4e 53 20  44 4f 43 20 00 00 4e 81  |DESIGNS DOC ..N.|
00000090  1c 39 1c 39 00 00 4e 81  1c 39 2d 07 00 72 27 00  |.9.9..N..9-..r'.|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Next, another LFN.

Finally(!!), a file named DESIGNS.DOC, stored at cluster 2d 07, that is, cluster 1837. Ring a bell from earlier? We know (from the directory entry) that it starts at cluster 1837, and we know from the FAT that it runs contiguously for 2525 clusters. Is it exactly 2525 * 1024 (the cluster size) = 2,585,600 bytes? No, its actual size is stored in the last four bytes of the entry: 00 72 27 00 -> 2,585,088, which fits into those 2525 clusters with 512 bytes to spare at the end of the last one (this is "slack space").
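The size and slack arithmetic, as a quick Python check. The only inputs are the four size bytes from the entry and the 1024-byte cluster size fsstat reported.

```python
CLUSTER_SIZE = 1024

# Last four bytes of the DESIGNS.DOC entry, little-endian.
size = int.from_bytes(bytes.fromhex("00 72 27 00"), "little")

clusters_needed = -(-size // CLUSTER_SIZE)      # ceiling division
slack = clusters_needed * CLUSTER_SIZE - size   # unused bytes in the last cluster

print(size, clusters_needed, slack)  # 2585088 2525 512
```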

So now we've mostly parsed all the directory entries on this disk. Let's take a brief detour into LFNs, then try to find the deleted file.

Long File Names

LFN entries, marked by the 0f value in the attributes, come before whichever file they refer to; they're a backward compatible way to give longer names than the old DOS 8.3 system. Older OSes ignore directory entries with this otherwise incompatible set of flags set, which is why Microsoft designed it this way.

LFN entries repurpose most of the fields in the directory entry to store the characters of the filename.

0x0 (1B): sequence number, starting at 1, not 0; last one is ORed with 0x40

0x1 (10 B): 5 UCS-2 characters (UCS-2 is a subset of UTF-16 that can only handle codepoints in the basic multilingual plane)

0xB (1B): attributes

0xC (1B): unused

0xD (1B): checksum

0xE (12B): 6 UCS-2 characters

0x1A (2B): reserved

0x1C (4B): 2 UCS-2 characters

Each LFN can hold 13 characters. If a filename needs more than 13 characters, then more than one LFN will precede the directory entry. They come in reverse order, last first. The last's sequence number is ORed with the value 0x40. For example, if there was a file with the name "File with very long filename.ext", which needs 3 LFN entries, the sequence numbers and LFN directory entries would be:

0x43 "me.ext"
0x02 "y long filena"
0x01 "File with ver"

then a regular directory entry for the file.
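Here's a sketch of pulling the name out of a single LFN entry in Python. It takes 5 characters from the 10 bytes at offset 0x1, 6 from the 12 bytes at offset 0xE, and 2 from the 4 bytes at offset 0x1C, skipping the attribute, checksum, and reserved bytes in between. The function name is made up, and the sample bytes are the LFN entry that precedes DESIGNS.DOC in the adams.dd root directory.

```python
import struct

def parse_lfn(raw: bytes):
    """Return (sequence index, is_last, name fragment) for one LFN entry."""
    seq = raw[0]
    # 5 + 6 + 2 = 13 little-endian 16-bit code units.
    units = struct.unpack("<13H", raw[0x01:0x0B] + raw[0x0E:0x1A] + raw[0x1C:0x20])
    chars = []
    for unit in units:
        if unit in (0x0000, 0xFFFF):  # terminator, then ff ff padding
            break
        chars.append(chr(unit))
    return seq & 0x3F, bool(seq & 0x40), "".join(chars)

lfn = bytes.fromhex(
    "41 44 00 65 00 73 00 69 00 67 00 0f 00 d4 6e 00"
    "73 00 2e 00 64 00 6f 00 63 00 00 00 00 00 ff ff"
)
print(parse_lfn(lfn))  # (1, True, 'Designs.doc')
```

For a name spread over several LFN entries, you'd sort the fragments by sequence index and concatenate them.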

I'm going to skip the checksum calculation; again, see Carrier or other resources if you need the details.

Recovering a deleted file

Earlier, we found an entry corresponding to a deleted file:

dd if=adams.dd bs=512 skip=75 count=2| hexdump -C
2+0 records in
2+0 records out
1024 bytes transferred in 0.000023 secs (44278013 bytes/sec)
00000000  2e 20 20 20 20 20 20 20  20 20 20 10 00 00 4e 5c  |.          ...N\|
00000010  a1 38 a1 38 00 00 4e 5c  a1 38 03 00 00 00 00 00  |.8.8..N\.8......|
00000020  2e 2e 20 20 20 20 20 20  20 20 20 10 00 00 4e 5c  |..         ...N\|
00000030  a1 38 a1 38 00 00 4e 5c  a1 38 00 00 00 00 00 00  |.8.8..N\.8......|
00000040  e5 4d 47 5f 33 30 32 37  4a 50 47 20 00 00 c4 79  |.MG_3027JPG ...y|
00000050  e1 38 e1 38 00 00 c4 79  e1 38 04 00 8c a0 1c 00  |.8.8...y.8......|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000400

The file was named "?MG_3027.JPG" -- notice that we lose the first character of a filename if it's deleted. It used to be stored at cluster 4.

Sleuth Kit also sees this file (-r shows everything, recursively):

fls -r adams.dd
r/r 3:  ADAMS       (Volume Label Entry)
d/d 5:  images
+ r/r * 549:    _MG_3027.JPG
r/r 7:  Designs.doc
v/v 163171: $MBR
v/v 163172: $FAT1
v/v 163173: $FAT2
d/d 163174: $OrphanFiles

or if we want to go entry-by-entry:

fls adams.dd
r/r 3:  ADAMS       (Volume Label Entry)
d/d 5:  images
r/r 7:  Designs.doc
v/v 163171: $MBR
v/v 163172: $FAT1
v/v 163173: $FAT2
d/d 163174: $OrphanFiles

fls adams.dd 5
r/r * 549:  _MG_3027.JPG

Where does this metadata address of 549 come from? It's clearly not a cluster number or whatnot, right?

Many filesystems have a concept of "inodes", which give every file and directory a unique metadata address. Not FAT. So instead TSK generates unique metadata addresses for FAT. The root directory is given the value 2. Each sector of the disk, starting at the beginning of the data area, could hypothetically contain 16 entries (512 / 32), so we number them starting from 3. This means that, say, the 512 entries in the root directory are numbered 3--514. Then there's gonna be gaps, since most sectors don't actually hold directory entries.

Recall that our cluster area started at sector 73, and the directory entries we extracted for this "IMAGES" directory were at sector 75 (cluster 3). If sectors 73 and 74 were full of directory entries, there'd be 16 in each of them, 32 in total. And our deleted file is the 3rd entry in the next sector.

514 + 32 + 3 = 549, the metadata address. Boom.
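The same numbering scheme as a formula, sketched in Python. This is my restatement of the scheme described above, not code lifted from TSK, and the function name is hypothetical. It does reproduce every address in the fls output for this image, though.

```python
FIRST_DATA_SECTOR = 41    # start of the data area (root directory), per fsstat
ENTRIES_PER_SECTOR = 16   # 512-byte sector / 32-byte entry
FIRST_ADDRESS = 3         # 2 is taken by the root directory itself

def metadata_address(sector: int, index_in_sector: int) -> int:
    """Address of the entry at 0-based `index_in_sector` within `sector`."""
    return ((sector - FIRST_DATA_SECTOR) * ENTRIES_PER_SECTOR
            + index_in_sector + FIRST_ADDRESS)

print(metadata_address(41, 0))  # 3: the ADAMS volume label
print(metadata_address(41, 2))  # 5: the IMAGES directory
print(metadata_address(41, 4))  # 7: DESIGNS.DOC
print(metadata_address(75, 2))  # 549: the deleted JPG
```

Notice the addresses 4 and 6 are skipped in fls: those slots hold the LFN entries.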

OK, how long was this file before it was deleted? The last four bytes of the directory entry, 8c a0 1c 00, show it was 1,876,108 bytes long, which would have required 1833 1KB clusters to store.

Interestingly, that's exactly how many clusters are currently marked as unallocated between its old starting cluster (4) and the next cluster allocated on the disk (1837). I wonder if those bytes look like a JPEG? Remember, cluster 4 starts two clusters past the start of the cluster area, which is sector 73 + 4 = 77.

dd if=adams.dd of=IMG_3027.JPG bs=512 skip=77 count=3666
hexdump -Cv IMG_3027.JPG|less

Those headers look familiar to you at all?

This is (almost) what icat (remember that?) from the second lecture does. icat is a little smarter. For example, it will truncate the file to the file size listed in the directory entry.
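Here's a sketch of where the dd numbers come from, plus the truncation step. Big assumption, stated loudly: the deleted file's clusters were contiguous and haven't been reused, which is exactly the bet we made with dd. The function names are hypothetical, and this is not how icat is actually implemented.

```python
SECTOR_SIZE = 512
CLUSTER_SIZE = 1024
CLUSTER_AREA_START = 73   # per fsstat
SECTORS_PER_CLUSTER = CLUSTER_SIZE // SECTOR_SIZE

def carve_params(first_cluster: int, file_size: int):
    """Return the (skip, count) sector values to hand to dd with bs=512."""
    clusters = -(-file_size // CLUSTER_SIZE)  # ceiling division
    skip = CLUSTER_AREA_START + (first_cluster - 2) * SECTORS_PER_CLUSTER
    return skip, clusters * SECTORS_PER_CLUSTER

def carve(image: bytes, first_cluster: int, file_size: int) -> bytes:
    """Slice out exactly file_size bytes, ASSUMING a contiguous file."""
    skip, _count = carve_params(first_cluster, file_size)
    start = skip * SECTOR_SIZE
    return image[start:start + file_size]

print(carve_params(4, 1876108))  # (77, 3666)
```

With the image read into memory (say, via open("adams.dd", "rb").read()), carve(image, 4, 1876108) would give you the candidate JPEG without the trailing slack.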

We'll pick up here next class, and talk about what you can meaningfully extract, and how, from a FAT16 filesystem using this and other techniques. Then we'll start NTFS, but we won't get to the bulk of NTFS's many details until after break.