21: Storage Technology, Expert Witnessing (A Dramatic Presentation)
Announcements
Today is the last new content day of the course. We’ll be doing wrap-up and review on Tuesday.
Please fill out the online SRTIs! I really do read, appreciate, and try to use all the feedback that students give me.
Magnetic storage
A refresher: recall that electricity and magnetism are closely related. Current is a flow of electrons, and that flow generates a magnetic field according to the right-hand rule.
You know what else generates a magnetic field? Magnets (aka magnetic materials). You can align the field in a magnet in various ways, one of which is applying a strong enough external magnetic field near it (say, one produced by an electrical current).
The movement of a conductor across a magnetic field (such as generated by magnets) generates a current, which you can measure.
In other words, the current generated by moving a conductor lets you “read” the magnetic field. And applying a stronger current lets you “overwrite” an existing magnet’s orientation.
These two facts are the basis of magnetic storage media.
Magnetic data read/write
The term flux describes a magnetic field with a specific direction (or “polarity”). The drive head creates “flux reversals” on the medium to record data. The given pattern it creates for a particular sequence of bits is described by the “encoding method.”
To create a reversal, the write head reverses the voltage (the direction of the current flow), aka “reverses the polarity” of the electricity to reverse the polarity of the flux. When reading over the disk, the head sees no voltage until it crosses a reversal, then the flux transition induces a small amount of current flow (and thus voltage) in the head.
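The write/read cycle described above can be sketched as a toy model. This is only an illustration under my own assumptions: the medium is a row of cells, each cell is labeled "T" if the head reverses polarity there and "N" if it does not, and polarity is just +1 or -1. The function names are made up for this sketch.

```python
def write_flux(cells, start_polarity=1):
    """Return the polarity (+1 or -1) left in each cell by a T/N pattern.

    A "T" flips the write current, so the recorded flux flips too.
    An "N" leaves the polarity as it was.
    """
    polarity = start_polarity
    recorded = []
    for cell in cells:
        if cell == "T":
            polarity = -polarity
        recorded.append(polarity)
    return recorded


def read_flux(recorded, start_polarity=1):
    """Recover the T/N pattern: the head only 'sees' changes in polarity."""
    previous = start_polarity
    cells = []
    for polarity in recorded:
        cells.append("T" if polarity != previous else "N")
        previous = polarity
    return "".join(cells)


pattern = "TNNTT"
on_disk = write_flux(pattern)
print(on_disk)
print(read_flux(on_disk))
```

Note that the read side never looks at the absolute polarity, only at whether it changed, which is why the data has to be expressed as a pattern of reversals.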
Lots of electronics and precision machinery come together to make this process work at high speeds, sending one or more heads to the right part of a spinning platter of magnetic media and timing the careful application of voltage (to write) or sensors, amplifiers, filters, etc. (to read).
Data encoding
Hard disks use run length limited (RLL) encoding (or similar techniques) to store bits in a sequence of flux reversals. RLL encodes bits a group at a time. The term RLL is derived from the two primary variables: the minimum number (the run length) and the maximum number (the run limit) of transition cells allowed between two actual flux transitions.
A simple code is the “Modified Non-Return-to-Zero-Inverted” code, where 1 is a transition and 0 is a non-transition. We write N for no-flux-transition, and T for flux-transition:
0 -> N
1 -> T
The problem you run into is that if there’s any instability in the speed of the drive, it’s possible to desynchronize during long runs of zeros.
RLL encodings help by making sure there are occasional transitions regardless of underlying bit pattern; this helps make sure the mechanical spin of the drive is kept synchronized with what the electronics are expecting (this is called “self-clocking”).
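To see why the simple code above is fragile, here is a minimal sketch (function names are my own) that encodes a bit string with the 0 -> N, 1 -> T rule and measures the longest stretch with no transition at all. Nothing bounds that stretch, so the reader gets no timing information for as long as the zeros last.

```python
def encode_simple(bits):
    """Encode a string of '0'/'1' characters: 1 is a transition, 0 is not."""
    return "".join("T" if bit == "1" else "N" for bit in bits)


def longest_gap(cells):
    """Length of the longest run of consecutive N cells."""
    longest = run = 0
    for cell in cells:
        run = run + 1 if cell == "N" else 0
        longest = max(longest, run)
    return longest


print(encode_simple("1000001"))
print(longest_gap(encode_simple("1" + "0" * 100 + "1")))
```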
A very simple RLL code, (0, 1), is essentially FM (frequency modulation) encoding.
0 -> T N
1 -> T T
This was used in very early floppy drives.
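A minimal sketch of the FM mapping above, with a matching decoder (the function names are my own). Every bit starts with a clock transition, so there is never more than one N in a row, at the cost of two slots per bit.

```python
def fm_encode(bits):
    """Encode '0'/'1' characters as FM: 0 -> TN, 1 -> TT."""
    return "".join("TT" if bit == "1" else "TN" for bit in bits)


def fm_decode(cells):
    """Decode an FM stream by looking at the second slot of each pair."""
    if len(cells) % 2:
        raise ValueError("FM stream must contain an even number of slots")
    bits = []
    for i in range(0, len(cells), 2):
        clock, data = cells[i], cells[i + 1]
        if clock != "T":
            raise ValueError("missing clock transition at slot %d" % i)
        bits.append("1" if data == "T" else "0")
    return "".join(bits)


encoded = fm_encode("1000001")
print(encoded)
print(fm_decode(encoded))
```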
But it’s got a lot of overhead (two slots per bit). Others that are popular are (1, 7) and (2, 7) encoding. For example, (1, 7) RLL codes encode two bits as three N/T slots, and four bits as six N/T slots:
00 -> TNT
01 -> TNN
10 -> NNT
11 -> NTN
0000 -> TNT NNN
0001 -> TNN NNN
1000 -> NNT NNN
1001 -> NTN NNN
In this encoding, there’s always at least one non-transition slot between two transitions, and there are at most 7 N slots between Ts. This bound helps us keep things synchronized without too much overhead.
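Here is a hedged sketch of an encoder built from the table above. One assumption of mine: the four-bit rules take priority whenever the next four bits match one of them, and the two-bit rules are used otherwise. The names are made up for this sketch. The helper at the end measures the N runs between transitions so the (1, 7) bounds can be checked.

```python
TWO_BIT = {"00": "TNT", "01": "TNN", "10": "NNT", "11": "NTN"}
FOUR_BIT = {
    "0000": "TNTNNN",
    "0001": "TNNNNN",
    "1000": "NNTNNN",
    "1001": "NTNNNN",
}


def rll17_encode(bits):
    """Encode an even-length string of '0'/'1' characters with the table above."""
    if len(bits) % 2:
        raise ValueError("expects an even number of bits")
    out = []
    i = 0
    while i < len(bits):
        quad = bits[i:i + 4]
        if quad in FOUR_BIT:          # assumed: four-bit rules win when they match
            out.append(FOUR_BIT[quad])
            i += 4
        else:
            out.append(TWO_BIT[bits[i:i + 2]])
            i += 2
    return "".join(out)


def gaps_between_transitions(cells):
    """Lengths of the N runs that sit between two T cells."""
    positions = [i for i, cell in enumerate(cells) if cell == "T"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


encoded = rll17_encode("000110")
print(encoded)
print(gaps_between_transitions(encoded))
```

The four-bit rules cover exactly the pairs whose two-bit codes would put two Ts next to each other (for example 00 then 00 would give TNT TNT), which is why they are tried first here.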
Flash memory
Flash memory of the type we see in SSDs is built atop floating gate metal–oxide–semiconductor field-effect transistors. Say that five times fast. These are programmable transistors that act like NAND gates (readable). They start in one state and can be “set” (written) to another, or “reset” (erased).
For those of you who aren’t EE/ECE (including me), what you need to know is that flash memory can be addressed bit-by-bit if need be. It’s organized into (conceptually vertical) strings (linear arrays of transistors that behave like NAND gates) of 32 or 64 transistors, each representing a byte. Individual gates are addressed by horizontal “word lines”.
We can read. We can write, once. But to re-write, a whole “block” of NAND gates must be re-set to their base state first, for reasons we’re not going to go into since this isn’t an ECE course. Notably, erasing is the slowest operation by far. (Typically by at least one order of magnitude.)
SSD controllers then have a choice. They can allow overwrites of a given “sector” by erasing then writing. Or they can dynamically map sectors to arbitrary, ready-to-write (already erased) locations in their flash arrays. The “empty” but not erased sectors can then be erased by the drive controller when the drive is otherwise idle.
The former strategy leads to lackluster performance, since writes are then only as fast as erasures.
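The second strategy (remapping) can be sketched as a toy model. This is not how any particular controller works; it is a simplification of my own in which erasure happens per page rather than per block, and the class and method names are invented. The point to notice is that after an overwrite, the old data is still physically present until the controller gets around to erasing it.

```python
class ToyFTL:
    """Toy mapping layer: logical sectors point at physical flash pages."""

    def __init__(self, num_pages):
        self.pages = [None] * num_pages        # physical page contents
        self.erased = list(range(num_pages))   # pages ready to be written
        self.stale = []                        # "empty" but not yet erased
        self.mapping = {}                      # logical sector -> physical page

    def write(self, sector, data):
        """Write to a fresh erased page instead of erasing in place."""
        if not self.erased:
            raise RuntimeError("no erased pages available; an erase is needed")
        old_page = self.mapping.get(sector)
        if old_page is not None:
            self.stale.append(old_page)        # old copy keeps its data for now
        page = self.erased.pop(0)
        self.pages[page] = data
        self.mapping[sector] = page

    def read(self, sector):
        page = self.mapping.get(sector)
        return None if page is None else self.pages[page]

    def idle_erase(self):
        """Erase stale pages while the drive is otherwise idle."""
        for page in self.stale:
            self.pages[page] = None
            self.erased.append(page)
        self.stale.clear()


ftl = ToyFTL(num_pages=2)
ftl.write(0, "first version")
ftl.write(0, "second version")
print(ftl.read(0))     # the sector now reads back the new data
print(ftl.pages)       # but the old data is still sitting in flash
ftl.idle_erase()
print(ftl.pages)       # gone once the controller erases the stale page
```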
If the drive has “extra” space, it can add “empty” sectors to this space and erase them as time permits, then return them to the pool of “available” space. But the best plan is to get extra information from the OS about when a sector can be erased (like, when it’s deleted in the filesystem) and then add that sector to the list of “empty” sectors to be scheduled to be erased.
Enter the TRIM command.
TRIM
TRIM is, in short, a command that the disk driver in the OS can send to SSD-based disk controllers. TRIMming a sector tells the SSD that the sector in question can be erased: it is no longer being used to store data. Most modern OSes now support TRIM. The better integrated the OS/hardware/etc, the more likely it is to work, as the OS, driver, interface, and controller must all correctly implement TRIM for the command to be issued. This is becoming more common as SSDs become more standard.
Blocks scheduled for TRIMming will be erased by the controller even if the device is behind a write blocker. Whether or not the old data will be visible if the block is read depends upon the type of TRIM implemented:
- non-deterministic: Who knows? Could be the original data, or zeros, or something else. No guarantee of consistency.
- deterministic zero after trim (DZAT): all read commands after a TRIM return zeroes until new data is written to the page.
- deterministic read after trim (DRAT): drives supporting DRAT (as opposed to DZAT) may return all zeroes or other data (such as the original pre-trim data stored in that logical page). The essential point is that the values read from a trimmed logical page do not change between when the TRIM command is issued and when new data is written into that logical page.
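The three behaviors above can be contrasted with a small simulation. This models only what a reader observes, not any real drive or ATA interface; the class name, the mode strings, and the set of values a drive might return are all assumptions for illustration.

```python
import random


class TrimmedPage:
    """What reads of one logical page might return after it has been TRIMmed."""

    MODES = ("non-deterministic", "DZAT", "DRAT")

    def __init__(self, original, mode, seed=0):
        if mode not in self.MODES:
            raise ValueError("unknown TRIM mode: %r" % mode)
        self.mode = mode
        self.zeros = b"\x00" * len(original)
        # Assumed set of things a drive might hand back after a TRIM.
        self.candidates = [original, self.zeros, b"\xff" * len(original)]
        self.rng = random.Random(seed)
        # DRAT: whatever the drive returns, it returns the same thing every time.
        self.fixed = self.rng.choice(self.candidates)

    def read(self):
        if self.mode == "DZAT":
            return self.zeros
        if self.mode == "DRAT":
            return self.fixed
        # Non-deterministic: no promise that two reads agree.
        return self.rng.choice(self.candidates)


for mode in TrimmedPage.MODES:
    page = TrimmedPage(b"DATA", mode)
    distinct = {page.read() for _ in range(20)}
    print(mode, "->", len(distinct), "distinct value(s) seen")
```

For an examiner, the practical difference is repeatability: under DZAT and DRAT two acquisitions of a trimmed page should match each other, while under non-deterministic TRIM they may not.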
Expert Witnessing: A Review
Remember earlier in the semester we talked about being an expert witness. The judge is the ultimate arbiter, and there are various rules that must be followed to establish an expert’s bona fides.
A dramatic example
Delayed until next class due to A/V failure!