Welcome Hello and welcome! I’m Marc Liberatore firstname.lastname@example.org and I’m your instructor for this course, COMPSCI 590K. The most important thing to know today: the course web site is at http://people.cs.umass.edu/~liberato/courses/2020-spring-compsci590K/. It includes the syllabus for this class and you are expected to read it in its entirety. If you are watching this online, great! Please say hello online! Who am I? Not technically a “professor” – I am a teaching faculty member, and our title is (oddly European-ish) “Lecturer,” which nobody calls anybody.
Welcome Hello and welcome! Still the most important thing to know today: the course web site is at https://people.cs.umass.edu/~liberato/courses/2020-spring-compsci590k/. It includes the syllabus for this class and you are expected to read it in its entirety. Visit Moodle for access to other details. If you are watching this online, great! Hopefully we have audio today! Next 10 Years Let’s start by going over the Garfinkel paper. Please interrupt with questions if you have them!
Today we’re covering “Carving Contiguous and Fragmented Files with Object Validation”, another paper by Simson Garfinkel. It’s a pretty easy read and thus good for the early part of the semester. Later papers might not be so clear for various reasons. Anyway: the paper is about file carving. Carving, as you’ll recall, is finding arbitrary files embedded in other files, without the use of filesystem metadata. This comes up when filesystem metadata is deleted, damaged, or otherwise unavailable, or when you’re looking for files that might be embedded in other files.
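To make the idea of carving concrete, here is a minimal sketch of the simplest variant, header/footer matching for JPEGs. It is not the object-validation approach from the paper; it assumes the data fits in memory and that the file is contiguous, and the function name `carve_jpeg_candidates` is made up for illustration.

```python
# JPEG files start with the SOI marker (FF D8) followed by another marker
# (FF ..), and end with the EOI marker (FF D9).
JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"

def carve_jpeg_candidates(data: bytes):
    """Return (start, end) byte offsets of spans that look like JPEGs.

    Hypothetical helper: pairs each header with the next footer after it.
    No validation is done, so the results are only candidates.
    """
    candidates = []
    start = data.find(JPEG_HEADER)
    while start != -1:
        end = data.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break  # no footer left, so no further candidates
        candidates.append((start, end + len(JPEG_FOOTER)))
        start = data.find(JPEG_HEADER, start + 1)
    return candidates

# A fake "disk image": padding, something JPEG-shaped, more padding.
image = b"\xff\xd8\xff\xe0" + b"fake image data" + b"\xff\xd9"
blob = b"\x00" * 16 + image + b"\x00" * 16

spans = carve_jpeg_candidates(blob)
print(spans)
```

Note that no filesystem metadata is consulted anywhere; the search relies only on byte patterns in the data. A real carver would then validate each candidate (for example, by trying to decode it), since footer bytes can also occur by chance inside unrelated data.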
Identification and Recovery of JPEG Files with Missing Fragments So you want to recover JPEGs. No problem, if they’re contiguous. But what if they’re not? Today’s paper formulates two possible problems and proposes ways to solve them. First, what if there’s a gap (or a small number of gaps), and the method of Garfinkel we saw last class isn’t good enough for some reason? Could you be smarter about choosing which blocks to test as part of the file, rather than the exhaustive search Garfinkel proposes?
Today’s paper is about recovering files compressed with the DEFLATE algorithm. As in last class’s paper, this scenario can arise when recovering deleted files, or when carving files from disk, or when files are damaged – all possibilities in the forensics context. The same basic problem presents itself: important information is in the header of the file. This information controls a transformation (in this case, compression) of the file. Without it, it’s not immediately clear how to recover the contents of the file.
Motivation There is a growing need for automated techniques and tools that operate on bulk data, and specifically on bulk data at the block level: File systems and files may not be recoverable due to damage, media failure, partial overwriting, or the use of an unknown file system. There may be insufficient time to read the entire file system, or a need to process data in parallel. File contents may be encrypted.
Cryptographic hashes work great to identify segments of identical data. But there’s a problem: what if we don’t know where the boundary of a file is? Like, the “JPEG-embedded-in-PDF” problem, or just some other circumstance where we can’t just do sector-based (or the equivalent) hashing? Or, what if the target of an investigation deliberately changes a file? Cryptographic hashes are designed so that even a single bit change in the input will change about half of the output hash bits.
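A quick way to see this behavior for yourself is the sketch below. It uses SHA-256 from Python's standard `hashlib`; the sample input and the helper name `bit_difference` are arbitrary choices for illustration.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = bytearray(b"The quick brown fox jumps over the lazy dog")
modified = bytearray(original)
modified[0] ^= 0x01  # flip a single bit of the input

h1 = hashlib.sha256(bytes(original)).digest()
h2 = hashlib.sha256(bytes(modified)).digest()

differing = bit_difference(h1, h2)
print(f"{differing} of {len(h1) * 8} output bits differ")
```

The exact count depends on the input, but it should land near half of the 256 output bits. That is exactly why a cryptographic hash cannot tell you that two files are "almost" the same.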
Overview We use hashes to find content of interest and discard contents not of interest. As pointed out last class, full-file and even block-level hashes have problems when items of interest are not stored as files but instead embedded in other files (and thus, not block aligned). Or, a similar problem occurs when minor modifications happen to a file – if just a few bytes of a file change, it would be great if we could still recognize it, but regular hash functions fail in this scenario.
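The alignment problem can be illustrated with a small sketch. It assumes 512-byte blocks and SHA-256, and it uses deterministic pseudo-random bytes as a stand-in for a file of interest; the names `block_hashes`, `target`, `aligned`, and `misaligned` are made up for this example.

```python
import hashlib

BLOCK = 512  # assumed block (sector) size

def block_hashes(data: bytes, block_size: int = BLOCK) -> set:
    """Hash each complete, block-aligned chunk of data."""
    return {
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data) - block_size + 1, block_size)
    }

# 2048 bytes of deterministic, non-repeating data standing in for a known file.
target = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(64))
known = block_hashes(target)

# The same content stored on a block boundary...
aligned = b"\x00" * 1024 + target
# ...and embedded at an offset that is not a multiple of the block size.
misaligned = b"\x00" * 1000 + target + b"\x00" * 24

print(len(known & block_hashes(aligned)), "blocks match when aligned")
print(len(known & block_hashes(misaligned)), "blocks match when misaligned")
```

The content is byte-for-byte present in both containers, but once it is shifted off the block boundary, every block contains a different slice of it, so none of the block hashes line up.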
Overview Perceptual hashing is the name for a class of algorithms that attempts to produce a fingerprint of image data (or other multimedia). Unlike a checksum or cryptographic hash, though, a perceptual hash is a fuzzy hash – the perceptual hashes of two similar images should be similar. There are many cases where you might want to do this – digital forensics, copyright enforcement, space reduction in databases, and so on.
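As a rough sketch of the idea, here is "average hash," one of the simplest perceptual hashes. It is chosen only for illustration and is not necessarily one of the algorithms covered in the lecture. The sketch assumes the image has already been converted to an 8x8 grayscale grid (a real implementation would downscale first), and the synthetic images below are invented test data.

```python
def average_hash(pixels):
    """Return a list of bits: 1 where a pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(h1, h2):
    """Number of positions at which two equal-length hashes differ."""
    return sum(a != b for a, b in zip(h1, h2))

# An 8x8 grayscale gradient standing in for a downscaled image.
image = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
# The same image, slightly brightened (values clipped at 255).
brighter = [[min(255, p + 10) for p in row] for row in image]
# An unrelated image: a checkerboard.
checker = [[255 if (r + c) % 2 else 0 for c in range(8)] for r in range(8)]

h_image = average_hash(image)
print("vs. brightened copy:", hamming(h_image, average_hash(brighter)))
print("vs. checkerboard:   ", hamming(h_image, average_hash(checker)))
```

Similar images produce hashes with a small Hamming distance, while unrelated images produce hashes that differ in many bits. Compare this with a cryptographic hash, where brightening the image even slightly would change about half of the output bits.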