19: Intro to Memory Forensics

Suppose you know that a server or VM you control has been compromised. For example, you have reason to believe that a rogue user is running a python process that’s up to no good.

So, you take a dump of its memory in hopes of “catching them in the act,” and suspend the server. Now it’s time to dig into the memory image to try to figure out what’s going on. Can you find the python process in question? Can you analyze it for details?

Yes we can. The Volatility framework (written in Python) is an open-source toolkit for doing memory forensics. It has many built-in functions for analyzing memory dumps, and is very extensible – there are lots of plugins for it available online.

Today I’m going to walk through its use. I’m also going to stop and review / explain some of the underlying OS concepts that it uses to do its work.

This walkthrough is going to be specific to Linux, and then to doing analysis of a Python process, but you would take the same basic steps to analyze other systems and processes. Lots to love if you are into C or other low-level programming and OS details.

Finding the kernel

As you undoubtedly recall from an undergraduate systems class, the kernel is the piece of software ultimately responsible for managing memory, processes, and other system resources. It’s (one of the) first thing(s) loaded into memory, and it keeps track of everything else.

The (compiled) kernel itself is loaded into memory, as are the data structures it uses, all into pre-defined addresses.

The data structures it uses are literally C structures. The exact sizes of these structures, and the offsets of the values in them, are typically defined in “kernel headers”. For Linux, these are the “linux headers”; usually your system package manager lets you get them via a linux-headers or kernel-headers package. For OS X, these are part of the Apple Debug Kit; and for Windows, they typically come built-in to Volatility for most versions of Windows.

Where do these structures live in memory? A symbol table tells you. It’s the “address book” telling you where in memory each structure will be placed (it tells you other stuff too). In Linux, you can just look at /boot/System.map. OS X can use dsymutil to pull it out of the kernel. And again, Volatility knows about the Windows symbol tables.
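To make the “address book” idea concrete, here is a minimal sketch of reading System.map-style lines (hex address, one-letter symbol type, symbol name) into a lookup table. The sample text and every address in it are invented for illustration. init_task is a real kernel symbol name, but the address shown for it is not taken from the image used in this lecture.

```python
# Sketch: parse System.map-style text into {symbol name: address}.
# SAMPLE_MAP is invented for illustration; it is not from the lecture's image.
SAMPLE_MAP = """\
ffffffff81000000 T _text
ffffffff81e00100 D init_task
ffffffff82000000 B __bss_start
"""


def parse_system_map(text):
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        address, _symbol_type, name = parts
        symbols[name] = int(address, 16)
    return symbols


symbols = parse_system_map(SAMPLE_MAP)
print(hex(symbols["init_task"]))
```

A real System.map has tens of thousands of entries, but the lookup works the same way: given a symbol name, you get the virtual address where the kernel placed it.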

So, when you want to invoke volatility, you need to tell it where this stuff is. In particular, you need to tell it where the file you are analyzing is, where the profile lives (see the Volatility Wiki for information about how to build a profile – it’s kinda a pain) and the location of a plugin directory, if you have one, which might be where the profile lives. I ended up git cloning the volatility source and using this checkout for this lecture. Volatility 2.6 doesn’t work with the newest Linux kernels (post 4.9 or so); you have to use a newer checkout.

To build the profile, follow the directions here: https://github.com/volatilityfoundation/volatility/wiki/Linux

though note you have to be inside a checkout of the volatility source, in the tools/linux subdirectory.

How do you get a memory dump? If you’re running a virtual machine, you might be able to get and use whatever it creates when it suspends a machine, or maybe directly acquire it, or some other thing (“volatility can analyze raw dumps, crash dumps, hibernation files, VMware .vmem, VMware saved state and suspended files (.vmss/.vmsn), VirtualBox core dumps, LiME (Linux Memory Extractor), expert witness (EWF), and direct physical memory over Firewire.”)

vol --filename=$MEMDUMP --profile=$PROFILE <command>

or more concretely on my system, I’m going to invoke linux_volshell, a command that loads a memory image, parses it according to a profile, and then puts you into an interactive Python shell with volatility and the image loaded:

python2.7  vol.py --plugins=./volatility/plugins/ --profile=LinuxUbuntu1804serverx64 --filename=Ubuntu\ 64-bit\ server-d83c7883.vmem linux_volshell
Volatility Foundation Volatility Framework 2.6.1
Current context: process systemd, pid=1 DTB=0x3b29e000
Python 2.7.16 (default, Apr 17 2019, 17:19:44) 
Type "copyright", "credits" or "license" for more information.

IPython 5.4.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]:

Volatility notes that it believes that it’s found init, which is the first (non-kernel) process that is invoked in a Linux system (process id, or pid 1). init is ultimately the parent of all other processes in Linux. (In Windows, it’s the System process; in OS X it’s the kernel task.) A process is either spawned by init, or by a child of init, etc.

(What’s DTB? We’ll get there later.)

If Volatility is right about init then it means it’s properly parsed the image, so we should be able to find all other processes too. Let’s see what we have:

In [1]: self
Out[1]: <volatility.plugins.linux.linux_volshell.linux_volshell at 0x110ad0390>

In [2]: self?
Type:        linux_volshell
String form: <volatility.plugins.linux.linux_volshell.linux_volshell object at 0x110ad0390>
File:        /Volumes/Cobalt/365-and-590F/volatility-lab/volatility/volatility/plugins/linux/linux_volshell.py
Docstring:   Shell in the memory image

So self is the object we can use to inspect the memory image. We can inspect the current process as the _proc attribute (which is going to be init):

In [4]: self._proc
Out[4]: [task_struct task_struct] @ 0xFFFF91193BD716C0

In [5]: self._proc?
Type:           task_struct
String form:    18446622136297002688
File:           /Volumes/Cobalt/365-and-590F/volatility-lab/volatility/volatility/plugins/overlays/linux/linux.py
Docstring:      <no docstring>
Init docstring:
This must be instantiated with a dict of members. The keys
are the offsets, the values are Curried Object classes that
will be instantiated when accessed.

It’s a task_struct. What’s that? To understand, we need to talk a bit about how a kernel manages processes – if we understand that in general, we can understand this one in particular. You have to store process information, and allocate memory for the process. The kernel has to track this per-process.

This is stored in a C structure. On Linux, it’s called a task_struct; on most versions of Windows, an EPROCESS, on OS X it’s a proc structure. structs are just a way to group data together and organize it.

We’ll do this in the context of Linux, but the basic idea is the same on all modern OSes.

What does the struct contain? Things like the pid, the “short name” (comm, how the process identifies itself to others), and when it was started. Usually also a way to access the process tree (parents, siblings, children), who owns the process, open files/sockets, and memory management info. In Linux, the memory management info is in the mm_struct.

Let’s ask volatility more about this process:

In [7]: self._proc.pid
Out[7]:  [int]: 1

In [8]: self._proc.comm
Out[8]: 'systemd'

systemd? What? That’s not init! Actually, it is. Some new versions of Linux now use a new daemon for process management called systemd, though various tools will still refer to it as init.

OK, so it does seem like we have the right thing. We can continue to manually poke around on this object (demo auto complete) but volatility has many built-in commands to do things for us automatically. Let’s use them to ask what processes are running, in hopes of finding a python process.

python2.7  vol.py --plugins=./volatility/plugins/ --profile=LinuxUbuntu1804serverx64 --filename=Ubuntu\ 64-bit\ server-d83c7883.vmem linux_pslist
Volatility Foundation Volatility Framework 2.6.1
Offset             Name                 Pid             PPid            Uid             Gid    DTB                Start Time
------------------ -------------------- --------------- --------------- --------------- ------ ------------------ ----------
0xffff91193bd716c0 systemd              1               0               0               0      0x000000003b29e000 2019-04-17 20:37:49 UTC+0000
0xffff91193bd72d80 kthreadd             2               0               0               0      ------------------ 2019-04-17 20:37:49 UTC+0000
0xffff91193bd75b00 kworker/0:0          3               2               0               0      ------------------ 2019-04-17 20:37:49 UTC+0000
0xffff91193bd74440 kworker/0:0H         4               2               0               0      ------------------ 2019-04-17 20:37:49 UTC+0000
0xffff91193bdb0000 mm_percpu_wq         6               2               0               0      ------------------ 2019-04-17 20:37:49 UTC+0000
0xffff91193bdb16c0 ksoftirqd/0          7               2               0               0      ------------------ 2019-04-17 20:37:49 UTC+0000
0xffff91193bdb2d80 rcu_sched            8               2               0               0      ------------------ 2019-04-17 20:37:49 UTC+0000
0xffff91193bdb5b00 rcu_bh               9               2               0               0      ------------------ 2019-04-17 20:37:49 UTC+0000
...

We only see short names here.

We can also ask in a form that might be more familiar to you if you do command-line system administration:

python2.7  vol.py --plugins=./volatility/plugins/ --profile=LinuxUbuntu1804serverx64 --filename=Ubuntu\ 64-bit\ server-d83c7883.vmem linux_psaux
Volatility Foundation Volatility Framework 2.6.1
Pid    Uid    Gid    Arguments                                                       
1      0      0      /sbin/init maybe-ubiquity                                       
2      0      0      [kthreadd]                                                      
3      0      0      [kworker/0:0]                                                   
4      0      0      [kworker/0:0H]                                                  
6      0      0      [mm_percpu_wq]                                                  
7      0      0      [ksoftirqd/0]                                                   
8      0      0      [rcu_sched]                                                     
9      0      0      [rcu_bh]                                                        
10     0      0      [migration/0]                          

This is roughly equivalent to ps aux which shows us the full command-line arguments of each process. Any python?

python2.7  vol.py --plugins=./volatility/plugins/ --profile=LinuxUbuntu1804serverx64 --filename=Ubuntu\ 64-bit\ server-d83c7883.vmem linux_psaux | grep python

Volatility Foundation Volatility Framework 2.6.1
900    0      0      /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
951    0      0      /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal

OK, there are a couple. But let’s say we know those are “honest”. Maybe there’s another one hiding. It turns out a process can tell the kernel to change its name (this is like baby’s first rootkit stuff).

Maybe we can use another part of the task_struct to find a python process. The kernel knows about all open files/sockets – we can check that, too.

python2.7  vol.py --plugins=./volatility/plugins/ --profile=LinuxUbuntu1804serverx64 --filename=Ubuntu\ 64-bit\ server-d83c7883.vmem linux_lsof
Volatility Foundation Volatility Framework 2.6.1
Offset             Name                           Pid      FD       Path
------------------ ------------------------------ -------- -------- ----
0xffff91193bd716c0 systemd                               1        0 /dev/null
0xffff91193bd716c0 systemd                               1        1 /dev/null
0xffff91193bd716c0 systemd                               1        2 /dev/null
0xffff91193bd716c0 systemd                               1        3 /dev/kmsg
0xffff91193bd716c0 systemd                               1        4 anon_inode:[10710]
0xffff91193bd716c0 systemd                               1        5 anon_inode:[10710]
0xffff91193bd716c0 systemd                               1        6 anon_inode:[10710]

Remember this only shows access at the moment of the memory dump! Unfortunately processes don’t typically open their own executable (the kernel loads it, but that’s different) so no help here.

(Similarly, we can check which processes are listening to which sockets, but that’s probably not useful to us right now.)

python2.7  vol.py --plugins=./volatility/plugins/ --profile=LinuxUbuntu1804serverx64 --filename=Ubuntu\ 64-bit\ server-d83c7883.vmem linux_netstat
Volatility Foundation Volatility Framework 2.6.1
UNIX 28516              systemd/1     
UNIX 20562              systemd/1     /run/systemd/private
UNIX 28817              systemd/1     /run/systemd/journal/stdout
UNIX 20559              systemd/1     /run/systemd/notify
UNIX 20560              systemd/1     
UNIX 40611              systemd/1     /run/systemd/journal/stdout
UNIX 29511              systemd/1     /run/systemd/journal/stdout
UNIX 20561              systemd/1     
UNIX 26382              systemd/1     
UNIX 22850              systemd/1     

And same deal: only things active at the time of dump show up here – it’s not historical.

So let’s try something else, and see what the mm_struct can tell us.

mm_struct describes the layout of the virtual memory for a process. Recall from your undergrad systems class that generally virtual memory goes from address 0 to the maximum possible address (in a 64 bit system, usually 0xFFFFFFFFFFFFFFFF). For historic reasons we draw it top down, starting at the high address. The mm_struct keeps track of various regions, including in roughly this order:

0xFFFFFFFFFFFFFFFF

environment variables
command line args
stack (stack allocated variables, like C fn params)
  |
  v

anonymous / file mappings (where python C extensions get loaded)
  |
  v

  ^
  |
heap (called brk) (dynamic variables, like everything in python)

data (static strings, etc., like builtin docstrings)
executable code (actual executable code)

0x0000000000000000

Parts of the address space are grouped into “vm areas” – each is granted the same permissions, and each is mapped to the same file. For example, env variables, command line args, and the stack are often all part of the same vm area (maybe makes sense, since env variables and args are kinda like fn arguments…) Each mapped file is its own vm area; the heap gets its own area; data is often split into two parts, and the executable is a single area.

So the kernel keeps track of which vm area belongs to which process. Volatility can get us this information via linux_proc_maps:

Volatility Foundation Volatility Framework 2.6.1
Offset             Pid      Name                 Start              End                Flags               Pgoff Major  Minor  Inode      File Path
------------------ -------- -------------------- ------------------ ------------------ ------ ------------------ ------ ------ ---------- ---------
0xffff91193bd716c0        1 systemd              0x000055c8c0098000 0x000055c8c01e6000 r-x                   0x0      8      2     393586 /lib/systemd/systemd
0xffff91193bd716c0        1 systemd              0x000055c8c03e5000 0x000055c8c041d000 r--              0x14d000      8      2     393586 /lib/systemd/systemd
0xffff91193bd716c0        1 systemd              0x000055c8c041d000 0x000055c8c041e000 rw-              0x185000      8      2     393586 /lib/systemd/systemd 
0xffff91193bd716c0        1 systemd              0x000055c8c06e8000 0x000055c8c087b000 rw-                   0x0      0      0          0 [heap]
0xffff91193bd716c0        1 systemd              0x00007fc2c1abe000 0x00007fc2c1c5b000 r-x                   0x0      8      2     394048 /lib/x86_64-linux-gnu/libm-2.27.so

Note that on a running Linux system you can see this same information directly (as you can many things that Volatility finds) in the /proc filesystem. This lives in /proc/<PID>/maps.

Note each area has a start/end address, permission flags (what can be done here), and a name, which might be the name of the file that’s mapped, or heap or stack or whatnot. Just looking above, the first area is the code segment (executable), the next two are data areas (read, read/write). If you look further down in full output, you’d see the heap, mapped files, loaded shared libraries, the stack, and so on.
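Since /proc/<PID>/maps carries the same fields, here is a minimal sketch of parsing one line in that format. The sample line is reconstructed by hand from the first row of the Volatility output above and was not captured from a live system. In particular, the trailing “p” (private) in the permissions field is an assumption, since Volatility only prints three flag characters.

```python
# Sketch: parse one line in /proc/<PID>/maps format.
# SAMPLE_LINE is reconstructed by hand from the Volatility row for systemd;
# the "p" (private mapping) flag is an assumption.
SAMPLE_LINE = (
    "55c8c0098000-55c8c01e6000 r-xp 00000000 08:02 393586 "
    "/lib/systemd/systemd"
)


def parse_maps_line(line):
    fields = line.split(None, 5)  # the path is optional and may contain spaces
    addr_range, perms, pgoff, device, inode = fields[:5]
    start, end = (int(part, 16) for part in addr_range.split("-"))
    return {
        "start": start,
        "end": end,
        "perms": perms,
        "pgoff": int(pgoff, 16),
        "device": device,
        "inode": int(inode),
        "path": fields[5].strip() if len(fields) > 5 else "",
    }


area = parse_maps_line(SAMPLE_LINE)
print(area["path"], area["perms"], hex(area["end"] - area["start"]))
```

Anonymous areas have no path at all, which is why the parser treats the sixth field as optional.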

So we think there’s a running process that’s not using the python shortname but maybe is actually using the python executable. Let’s go ask the kernel at the volatility shell:

for task in self._proc.tasks: # the list of running processes
    code_area = [vma for vma in task.get_proc_maps()
                 if is_code_area(task, vma)]
    if code_area:
        exec_name = code_area[0].vm_name(task)
        print task.pid, task.comm, exec_name

For each process, we find the VM area that corresponds with the executable part of the process binary – the code segment. We pick the one whose start/end addresses contain the code segment (define this helper in the shell before running the loop):

def is_code_area(task, vma):
    return (task.mm.start_code >= vma.vm_start and task.mm.end_code <= vma.vm_end)

Now let’s run this on the image, and check the claimed name against the mapped executable:

...
963 login /bin/login
1271 systemd /lib/systemd/systemd
1272 (sd-pam) /lib/systemd/systemd
1291 bash /bin/bash
1368 snapd /usr/lib/snapd/snapd
8196 vmhgfs-fuse /usr/bin/vmhgfs-fuse
8950 agetty /sbin/agetty
8951 agetty /sbin/agetty
9260 bash /usr/bin/python2.7

Hmm, two bash processes, one of which is actually python2.7! Very good, I think we’ve found it. (As an aside, why might a process do this? Some programs do it to display useful information to the sysadmin, for example postgres shows the running query, sshd shows the name of the logged-in user, etc.)
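The eyeball comparison can be automated. Here is a minimal sketch that takes the (pid, comm, mapped executable) rows from the listing above and flags every row where comm differs from the executable’s file name. The helper name name_mismatches is made up for this example. The 15-character cutoff reflects how much of comm the kernel stores.

```python
import posixpath

# Rows copied from the listing above: (pid, comm, mapped executable).
ROWS = [
    (963, "login", "/bin/login"),
    (1271, "systemd", "/lib/systemd/systemd"),
    (1272, "(sd-pam)", "/lib/systemd/systemd"),
    (1291, "bash", "/bin/bash"),
    (1368, "snapd", "/usr/lib/snapd/snapd"),
    (8196, "vmhgfs-fuse", "/usr/bin/vmhgfs-fuse"),
    (8950, "agetty", "/sbin/agetty"),
    (8951, "agetty", "/sbin/agetty"),
    (9260, "bash", "/usr/bin/python2.7"),
]

COMM_MAX = 15  # the kernel keeps at most 15 characters of comm plus a NUL


def name_mismatches(rows):
    """Return rows whose comm is not the (truncated) executable file name."""
    flagged = []
    for pid, comm, exe_path in rows:
        expected = posixpath.basename(exe_path)[:COMM_MAX]
        if comm != expected:
            flagged.append((pid, comm, exe_path))
    return flagged


for pid, comm, exe_path in name_mismatches(ROWS):
    print(pid, comm, exe_path)
```

This is only a heuristic. It flags pid 9260, but it also flags (sd-pam), a systemd helper that renames itself for legitimate reasons, so every hit still needs a human look.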

Let’s look into this process further. We can interrogate just this process, for example:

python2.7  vol.py --plugins=./volatility/plugins/ --profile=LinuxUbuntu1804serverx64 --filename=Ubuntu\ 64-bit\ server-d83c7883.vmem linux_lsof -p 9260
Volatility Foundation Volatility Framework 2.6.1
Offset             Name                           Pid      FD       Path
------------------ ------------------------------ -------- -------- ----
0xffff911935415b00 bash                               9260        0 /dev/tty1
0xffff911935415b00 bash                               9260        1 /dev/tty1
0xffff911935415b00 bash                               9260        2 /dev/tty1

Looks like an open interactive process (FD 0, 1, 2 are standard in out and err).

You could also look for open sockets, etc., in a process-specific way. Let’s look at the procmap again:

python2.7  vol.py --plugins=./volatility/plugins/ --profile=LinuxUbuntu1804serverx64 --filename=Ubuntu\ 64-bit\ server-d83c7883.vmem linux_proc_maps -p 9260
Volatility Foundation Volatility Framework 2.6.1
Offset             Pid      Name                 Start              End                Flags               Pgoff Major  Minor  Inode      File Path
------------------ -------- -------------------- ------------------ ------------------ ------ ------------------ ------ ------ ---------- ---------
0xffff911935415b00     9260 bash                 0x00005574ee4af000 0x00005574ee7b7000 r-x                   0x0      8      2     146415 /usr/bin/python2.7
0xffff911935415b00     9260 bash                 0x00005574ee9b7000 0x00005574ee9b9000 r--              0x308000      8      2     146415 /usr/bin/python2.7
0xffff911935415b00     9260 bash                 0x00005574ee9b9000 0x00005574eea2f000 rw-              0x30a000      8      2     146415 /usr/bin/python2.7
0xffff911935415b00     9260 bash                 0x00005574eea2f000 0x00005574eea53000 rw-                   0x0      0      0          0 
0xffff911935415b00     9260 bash                 0x00005574f0196000 0x00005574f02a2000 rw-                   0x0      0      0          0 [heap]
0xffff911935415b00     9260 bash                 0x00007f3ec17ee000 0x00007f3ec17f1000 r-x                   0x0      8      2     146645 /usr/lib/python2.7/dist-packages/setproctitle.x86_64-linux-gnu.so
0xffff911935415b00     9260 bash                 0x00007f3ec17f1000 0x00007f3ec19f0000 ---                0x3000      8      2     146645 /usr/lib/python2.7/dist-packages/setproctitle.x86_64-linux-gnu.so
0xffff911935415b00     9260 bash                 0x00007f3ec19f0000 0x00007f3ec19f1000 r--                0x2000      8      2     146645 /usr/lib/python2.7/dist-packages/setproctitle.x86_64-linux-gnu.so
0xffff911935415b00     9260 bash                 0x00007f3ec19f1000 0x00007f3ec19f2000 rw-                0x3000      8      2     146645 /usr/lib/python2.7/dist-packages/setproctitle.x86_64-linux-gnu.so

Among other things, setproctitle has been mapped. This is a library that lets Python set its process name and arguments. Mystery solved? https://github.com/dvarrazzo/py-setproctitle

We still maybe want to look into the process itself. Can we examine its stack and heap? We have their addresses, so we can just find their start/end addresses, seek to them in the memory image, and look at them, right?

0xffff911935415b00     9260 bash                 0x00005574f0196000 0x00005574f02a2000 rw-                   0x0      0      0          0 [heap]
0xffff911935415b00     9260 bash                 0x00007ffc9446d000 0x00007ffc9448e000 rw-                   0x0      0      0          0 [stack]

Look at that stack address! That’s like 131,000 GB into address space! But our memory image is 1GB. What’s going on? Virtual memory, that’s what. Process memory is reported at virtual addresses, not physical addresses (offsets into RAM).
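A quick sanity check of that arithmetic, using the start and end addresses from the [stack] row above:

```python
# Values copied from the [stack] row of the linux_proc_maps output above.
STACK_START = 0x00007ffc9446d000
STACK_END = 0x00007ffc9448e000
GIB = 2 ** 30

stack_start_gib = STACK_START // GIB  # how far "into" the address space
stack_size_kib = (STACK_END - STACK_START) // 1024
print(stack_start_gib, stack_size_kib)
```

So the stack area itself is small (about 132 KiB), but it sits roughly 131,000 GiB up in the virtual address space, far beyond anything in a 1GB image.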

A process sees its own address space as one large block shared with no one. But in practice, process memory is sliced up by the OS and spread around in physical memory in units called “pages” – the MMU in your CPU helps the OS do this efficiently. Only actually-used memory is mapped – lots of the virtual address space is not used.

(draw diagram on board, see also like any OS textbook ever’s chapter on virtual memory)

The MMU translates the virtual address to the physical address. There’s various bits of magic to make it better: the TLB caches lookups to speed things up; swapped out pages get swapped back in, etc., all transparent to the process itself. Volatility can do this for us – how does it work?

Some parts are easy. The kernel is identity mapped. That is, the code / data regions are not scattered – they’re laid out in order in the physical space. Their location in virtual space is just their location in physical space, plus a constant offset. On 64 bit systems:

virtual = physical + 0xffffffff80000000 + ASLR if enabled

ASLR is a way to randomize the location of memory; this helps mitigate buffer overflows.
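The formula above inverts directly. Here is a minimal sketch. The function names and the example address are made up for illustration, and aslr_shift stands in for whatever randomization offset the kernel picked at boot (zero if ASLR is off).

```python
KERNEL_TEXT_BASE = 0xffffffff80000000  # constant from the formula above


def kernel_virt_to_phys(virtual, aslr_shift=0):
    """Invert virtual = physical + base + shift for the kernel image."""
    return virtual - KERNEL_TEXT_BASE - aslr_shift


def kernel_phys_to_virt(physical, aslr_shift=0):
    return physical + KERNEL_TEXT_BASE + aslr_shift


# Hypothetical example value, not taken from the lecture's memory image.
example_virtual = 0xffffffff81000000
example_physical = kernel_virt_to_phys(example_virtual)
print(hex(example_physical))
```

This only applies to the identity-mapped kernel code and data. Everything else needs a page table walk, which is where we go next.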

But what about non-identity-mapped pages? How do we know where they live?

The virtual address is “translated” to a physical address, but how?

We split the 64 bit virtual address into pieces. (32 bit systems work similarly though details differ).

The first (high-order) 16 bits, then four groups of nine, then one group of 12.

The first 16 bits are irrelevant. The rest are a hierarchical set of addresses.

The low-order 12 bits are the most specific – they identify a byte in a page (4K pages need 12 bits, since 2^12 = 4096). This is like a dorm room number.

Which page? The next 9 bits are an entry into the “page table”, that keeps track of which page we’re referring to. Kinda like “which dorm”.

But in fact, a process has many page tables. They’re organized by a page directory, an entry in which the next 9 bits store. Like a housing area, I guess?

Anyway, this happens two more times. There’s a page directory pointer table, and above that a page map. So the highest level is a table of tables of tables of tables. Volatility calls the root of this tree the “directory table”, and its address the “directory table base” (DTB). Linux calls it the Page Global Directory (PGD). Its address is stored in the process’s mm_struct. The non-code, non-data parts of the kernel, which are not identity mapped, also have their own DTB, which you may recall Volatility showed us earlier.
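The bit-slicing step can be sketched in a few lines. This assumes the 16 / 9 / 9 / 9 / 9 / 12 split described above, and it only computes the table indices. It does not do the actual walk, which would need to read each table out of physical memory starting at the DTB. The function name is made up for this example.

```python
OFFSET_BITS = 12  # byte within a 4 KiB page
INDEX_BITS = 9    # 512 entries per table
INDEX_MASK = (1 << INDEX_BITS) - 1


def split_virtual_address(vaddr):
    """Return (top-level index, ..., page-table index, page offset)."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    indices = []
    shift = OFFSET_BITS
    for _ in range(4):
        indices.append((vaddr >> shift) & INDEX_MASK)
        shift += INDEX_BITS
    indices.reverse()  # highest-level table first
    return tuple(indices) + (offset,)


# Start of the [stack] area for pid 9260, from the earlier proc map output.
parts = split_virtual_address(0x00007ffc9446d000)
print(parts)
```

Each index picks one of 512 entries in a table, and the entry holds the physical address of the next table down. The final entry gives the physical page, and the 12-bit offset picks the byte within it.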

Anyway, Volatility can parse the DTB on down, and pull the pages corresponding to a particular virtual memory area out for us. We just need to tell it which process and the start address:

python2.7 vol.py --plugins=./volatility/plugins/ --profile=LinuxUbuntu1804serverx64 --filename=Ubuntu\ 64-bit\ server-d83c7883.vmem linux_dump_map -p 9260 -s 0x00007ffc9446d000 -D .
Volatility Foundation Volatility Framework 2.6.1
Task       VM Start           VM End                         Length Path
---------- ------------------ ------------------ ------------------ ----
      9260 0x00007ffc9446d000 0x00007ffc9448e000            0x21000 ./task.9260.0x7ffc9446d000.vma

It drops them into a single contiguous binary file, named with .vma. We can now examine this in the usual way, like strings, for example.

Lots of false positives, but notice that the command line, shortname, and proctitle are here:

...
bash
/usr/bin/python
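If you’d rather stay in Python than shell out, here is a rough sketch of what strings does: find runs of printable ASCII at least four bytes long. The sample bytes are invented and simply echo the two strings shown above. The real tool has more options (encodings, offsets) that this ignores.

```python
import re

MIN_LENGTH = 4  # the strings utility also defaults to runs of 4 or more


def ascii_strings(data, min_length=MIN_LENGTH):
    """Return runs of printable ASCII bytes at least min_length long."""
    pattern = re.compile(b"[ -~]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Invented sample bytes; a real dump would be read with open(path, "rb").
SAMPLE = b"\x00\x01bash\x00/usr/bin/python\x00\xff\xfeab\x00"
found = ascii_strings(SAMPLE)
print(found)
```

Note that the two-byte run “ab” is dropped as too short, which is the same trade-off strings makes between noise and missed data.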

We can do the same thing with the heap. strings and so on. But wouldn’t it be nice if we could actually understand the data more than just via strings? Sure would. To do so, we need to understand how Python objects get allocated to the heap. Long story short, in CPython anyway, they are C data structures – complex data structures with typed members.

The thing we really care about, though is that C structs are a packing format for bytes. Which fields are how many bytes long, and their order. If you can accurately compute this for a given struct (which, spoiler, you can – that’s exactly what the Python struct module lets you do, which you know if you’ve made it through 365) you can parse memory structures.

Suppose you had a point struct:

struct point {
    int x;
    int y;
};

How do you interpret it for an arbitrary region of memory? Like, if I give you 12 bytes, where’s the point? (on board)

You have to know something else otherwise you could just arbitrarily map a struct onto any part of memory and parse it – bits are bits. But often you do know something else. For example, if you knew the point had to be inside a circle of radius 4, that would rule out many possible values.
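Here is a minimal sketch of that idea using Python’s struct module. It assumes a 4-byte little-endian int (as on x86-64 Linux) and slides the point layout across 12 invented bytes, keeping only candidates that satisfy the radius-4 constraint.

```python
import struct

POINT = struct.Struct("<ii")  # two little-endian 32-bit ints: x, y
RADIUS = 4

# 12 invented bytes: 3 junk bytes, a packed point (3, -2), 1 junk byte.
BUF = b"\xde\xad\xbe" + POINT.pack(3, -2) + b"\xef"


def find_points(buf, radius=RADIUS):
    """Try every offset; keep points that fall inside the circle."""
    hits = []
    for offset in range(len(buf) - POINT.size + 1):
        x, y = POINT.unpack_from(buf, offset)
        if x * x + y * y <= radius * radius:
            hits.append((offset, x, y))
    return hits


hits = find_points(BUF)
print(hits)
```

Every offset parses as some point, since bits are bits, but only one survives the constraint. Weaker constraints would let more false positives through.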

Similarly, the fields of a particular Python object are constrained. The details depend upon your version of Python. For example, in Python 2 you have PyStringObjects with fields that look kinda-sorta like:

struct PyStringObject {
    long ob_refcnt; // ref count
    void *ob_type; // address of type object
    long ob_size; // string length
    long ob_shash; // hash
    int ob_sstate; // intern status 0 / 1 / 2
    char ob_sval[1]; // first char
};
// rest of string + null follows immediately!

(This is not exactly what it looks like, see https://github.com/python/cpython/blob/2.7/Include/stringobject.h for details.)

ob_sstate is an enumeration with only three possible values. ob_sval holds the first char. Volatility needs a translation into a “vtype”, which looks something like this (again, some details elided):

'PyStringObject': [
    37,  # total size in bytes
    {
        'ob_refcnt': [0, ['long long']],  # Py_ssize_t = ssize_t
        'ob_type': [8, ['pointer', ['void']]],  # struct _typeobject *
        'ob_size': [16, ['long long']],  # Py_ssize_t = ssize_t
        'ob_shash': [24, ['long long']],
        'ob_sstate': [32, ['unsigned int']],
        'ob_sval': [36, ['array', 1, ['char']]]
    }]

37 bytes in total, and offsets to each thing in there.
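You can double-check those numbers with Python’s struct module. This sketch assumes the field sizes implied by the vtype (8-byte long long and pointer, 4-byte unsigned int, 1-byte char) and lays the fields out back to back with no padding, which is what the listed offsets describe. The compute_layout helper is made up for this example.

```python
import struct

# (field name, struct format code) in declaration order.
FIELDS = [
    ("ob_refcnt", "q"),  # 8-byte signed integer
    ("ob_type", "Q"),    # 8-byte pointer, treated as an unsigned integer
    ("ob_size", "q"),
    ("ob_shash", "q"),
    ("ob_sstate", "I"),  # 4-byte unsigned integer
    ("ob_sval", "c"),    # first character of the string
]


def compute_layout(fields):
    """Return ({name: offset}, total size) for back-to-back fields."""
    offsets = {}
    position = 0
    for name, code in fields:
        offsets[name] = position
        position += struct.calcsize("<" + code)
    return offsets, position


offsets, total_size = compute_layout(FIELDS)
print(offsets, total_size)
```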

Then we tell volatility how to validate the object, which is gonna look something like (again, simplified) this:

def is_valid(self):
    sval_offset, _ = self.members['ob_sval']
    # read ob_size bytes of string data plus the trailing NUL
    string = self.obj_vm.zread(self.obj_offset + sval_offset, self.ob_size + 1)
    return all([
        self.ob_type.is_valid(), # built-in pointer validity check
        self.ob_size > 0 and self.ob_size <= 1e6,
        self.ob_sstate.v() in (0, 1, 2),
        string[-1] == '\x00',
        self.ob_shash in (-1, hash(string[:-1])) # hash excludes the NUL
    ])

Is the pointer valid? Is the size valid (positive, and not too big)? Is the state correct? Is the string null-terminated? And, is the hash -1 (not yet computed) or the python hash of the string?

Then you can tell Volatility to search the whole heap for valid string objects. A brute-force search would look something like:

for offset in xrange(heap_vma.vm_start, heap_vma.vm_end):
    found_object = obj.Object("PyStringObject", offset=offset, vm=addr_space)
    if found_object.is_valid():
        print get_string(found_object)
        print '\n'

This tests every possible offset. When you execute the above code (well, the real code, not the simplified version above), you extract all strings from the Python heap. And not just live strings, but strings that haven’t been garbage collected yet, too!
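To see the whole idea run end to end without Volatility, here is a self-contained sketch that builds a tiny fake “heap” out of bytes and brute-force scans it. Everything here is a stand-in: the heap contents and the type pointer value are invented, the pointer check is reduced to “non-zero”, and the hash check is skipped because the hash Python 3 computes would not match the Python 2 hash stored in a real target.

```python
import struct

HEADER = struct.Struct("<qQqqI")  # refcnt, type pointer, size, hash, state
MAX_LENGTH = 10 ** 6


def make_string_object(value, type_pointer=0x5574EE9C0000):
    """Pack a fake string object; the type pointer is an invented value."""
    header = HEADER.pack(1, type_pointer, len(value), -1, 0)
    return header + value + b"\x00"


def parse_at(heap, offset):
    """Return the string at offset if the bytes there pass every check."""
    if offset + HEADER.size > len(heap):
        return None
    _refcnt, type_pointer, size, _shash, state = HEADER.unpack_from(heap, offset)
    start = offset + HEADER.size
    end = start + size
    if type_pointer == 0 or not 0 < size <= MAX_LENGTH:
        return None
    if state not in (0, 1, 2) or end >= len(heap):
        return None
    if heap[end:end + 1] != b"\x00":  # must be NUL-terminated
        return None
    return heap[start:end]


def scan(heap):
    """Try every offset in the heap, like the brute-force loop above."""
    found = []
    for offset in range(len(heap)):
        value = parse_at(heap, offset)
        if value is not None:
            found.append((offset, value))
    return found


# Invented heap: padding, one object, junk, a second object, padding.
HEAP = (
    b"\x00" * 5 + make_string_object(b"import os")
    + b"\xff" * 7 + make_string_object(b"secret") + b"\x00" * 3
)
results = scan(HEAP)
print(results)
```

With checks this loose, a real heap would produce more false positives than this toy one does. The stricter pointer and hash checks in the Volatility version exist to cut those down.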

Could we just run strings? Sure. But lots of reasons to parse. String boundaries are lost with strings. Python strings might also hold bytes. And Python3 might use Unicode, so strings might not do what we want. Volatility can also be smarter about parsing, and need not do brute force if you can tell it more about the memory layout. Similarly, you could write parsers for other Python datatypes (lists, dicts, etc.) if you wanted to extract more meaningful information about the python process.