Secure Memory Allocators to Defend Memory Safety In Depth

Author: Zheng Luo
Last update: 2024-08-31

Background

When programming in native languages with manual memory management, developers in my company regularly run into memory issues, including use-after-free, double-free, reads from uninitialized data, and memory leaks, despite being very experienced.

Fortunately, with the entire industry's decades of experience combating these memory issues, we already have a full toolkit for tackling memory contract violations. These tools are used extensively everywhere I've worked and have played an important role in code quality:

  • Static checks: clang-tidy for single-file checks and CodeChecker for cross-file checks. They catch only a subset of memory issues and have a moderate rate of false positives. Teams typically need to set up enforced checks to keep their codebase compliant.
  • Sanitizers (ASan, MSan, UBSan) and Valgrind: From my personal experience, they typically slow programs down by 2x-10x depending on memory access patterns, and they can catch use-after-free, double-free, and memory leaks reliably. My current company runs all unit tests under sanitizers.
  • Compilation flag hardening: OpenSSF has a pretty good guide on this topic. With these flags, std::vector subscript accesses are bounds-checked, stack overflows are partially detected, and bad calls to fortified libc functions (the *_chk variants) can be caught. My current company plans to enable it on all production binaries that aren't too performance-critical. These flags typically cause a 5%-20% slowdown in production.
  • Runtime glibc malloc tunables: This is a lesser-known feature, and I recommend reading the glibc tunables manual. In short, when we set GLIBC_TUNABLES=glibc.malloc.perturb=125 when running the program, glibc will initialize all malloc-ed memory blocks with a non-zero pattern before returning them to users. This often helps users catch memory issues earlier, as a non-zero out-of-range value is easier to notice than random (largely zero) bytes from the heap. Depending on the malloc frequency, this can cause a 2%-20% slowdown.

Secure Allocators

Recently, I found another layer of defense that can supplement the above guards against memory issues - secure memory allocators. These are memory allocator implementations that replace the system's (usually glibc's) malloc, free, and other primitives, providing better detection of memory issues and protecting the heap layout against malicious attackers. Moreover, they often come with minimal (2%-15%) slowdown, making them useful for production workloads.

Secure memory allocators are not a new concept. You may already be using them on your devices. Since Android 11, Scudo, a memory allocator specially designed for safety, has been enabled by default for all native code. Chromium also has its own allocator with safety in mind - PartitionAlloc provides more security than the system-provided allocators.

What kind of protection do they provide? Let's dive into one of my favorite choices in this domain, GrapheneOS's hardened_malloc, to find out. The usage is simple:

./preload.sh krita --new-image RGBA,U8,500,500

Internally, this small script just preloads libhardened_malloc.so into the given program, replacing malloc, free, and other allocation primitives (e.g., posix_memalign, C++'s operator new, and even memcpy for extra checks):

LD_PRELOAD=dir/libhardened_malloc.so "$@"

Simple Example

To test it, let's create a heap buffer overflow:

#include <stdlib.h>
#include <string.h>

int main(int argc, char* argv[]) {
    if (argc < 2)
        return 1;
    void* buf = malloc(16);
    memcpy(buf, argv[1], strlen(argv[1]));
    free(buf);
    return 0;
}

The program overflows if the first argument exceeds 16 bytes. valgrind reveals the issue:

$ valgrind -- ./test aaaaaaaaaaaaaaaaaaaaaaaaaaaa
==49309== Invalid write of size 2
==49309==    at 0x4852403: memmove (vg_replace_strmem.c:1414)
==49309==    by 0x1091A5: main (in /mnt/data/code/git/hardened_malloc/out/test)
==49309==  Address 0x4a7c050 is 0 bytes after a block of size 16 alloc'd
==49309==    at 0x48447A8: malloc (vg_replace_malloc.c:446)
==49309==    by 0x109171: main (in /mnt/data/code/git/hardened_malloc/out/test)
==49309== 

However, if you just run it directly, it triggers no alert and exits cleanly. This is a serious security concern, as attackers could exploit the overflow to overwrite internal program state and then hijack the program's control flow. There's plenty of research on exploiting heap layouts (Heap Feng Shui, if you haven't heard its cool name).

Hardened malloc detects this memory issue pretty cleanly, preventing future escalation of this issue:

LD_PRELOAD=./libhardened_malloc.so ./test aaaaaaaaaaaaaaaaaaaaaaaaaaaa 
fatal allocator error: canary corrupted
[1]    51707 IOT instruction (core dumped)  LD_PRELOAD=./libhardened_malloc.so ./test aaaaaaaaaaaaaaaaaaaaaaaaaaaa

Judging from the error message, we can roughly guess the underlying methodology: the allocator places a canary value at the end of the allocation and checks it at deallocation. When the value mismatches, it ruthlessly abort()s the program, assuming it has entered an unrecoverable state.

UAF Example

Let's try another use-after-free example:

#include <stdlib.h>
#include <string.h>

int main(void) {
    void* buf;
    buf = malloc(sizeof(char[20]));
    strcpy(buf, "Hello, world!");
    free(buf);
    strcpy(buf, "Goodbye, world!"); // write-after-free
    return 0;
}

Unfortunately, this time hardened_malloc doesn't detect it, and the program exits normally. That's not ideal; valgrind illustrates the problem:

==69301== Invalid write of size 8
==69301==    at 0x1091A2: main (test2.c:9)
==69301==  Address 0x4a7c040 is 0 bytes inside a block of size 20 free'd
==69301==    at 0x48478EF: free (vg_replace_malloc.c:989)
==69301==    by 0x109189: main (test2.c:8)
==69301==  Block was alloc'd at
==69301==    at 0x48447A8: malloc (vg_replace_malloc.c:446)
==69301==    by 0x10915A: main (test2.c:6)
==69301== 
==69301== Invalid write of size 8
==69301==    at 0x1091A5: main (test2.c:9)
==69301==  Address 0x4a7c048 is 8 bytes inside a block of size 20 free'd
==69301==    at 0x48478EF: free (vg_replace_malloc.c:989)
==69301==    by 0x109189: main (test2.c:8)
==69301==  Block was alloc'd at
==69301==    at 0x48447A8: malloc (vg_replace_malloc.c:446)
==69301==    by 0x10915A: main (test2.c:6)

That's not to say hardened_malloc does nothing in this case. First, it randomizes slot reuse for small memory allocations, so the overwritten chunk won't easily let attackers control the data flow deterministically. Second, hardened_malloc's UAF check only happens when a chunk is re-allocated, so it complains correctly if we change the program as below:

int main(void) {
    void* buf;
    buf = malloc(sizeof(char[20]));
    strcpy(buf, "Hello, world!");
    free(buf);
    strcpy(buf, "Goodbye, world!");
    for (int i=0; i<10000; ++i) {
        buf = malloc(sizeof(char[20]));
        strcpy(buf, "Hello, world!");
        free(buf);
    }
    return 0;
}

LD_PRELOAD=./libhardened_malloc.so ./test2
fatal allocator error: detected write after free

This detection isn't always triggered, as there's a chance that hardened_malloc never inspects the UAF-ed chunk during later allocation requests. This example demonstrates the limits of secure allocators in guarding against memory misuse. Therefore, in a test environment, you should still resort to sanitizers if performance isn't too big of an issue.

The above failure doesn't mean that no secure allocator can detect this memory corruption. hardened_malloc deliberately chose a design that minimizes performance impact and prioritizes heap-layout security over detecting every memory issue. Let's take a look at another secure allocator, Microsoft's snmalloc in its hardened (checks) build. It reliably catches the UAF in the original example:

Heap corruption - free list corrupted!
[backtrace redacted]

[1]    43348 IOT instruction (core dumped)  LD_PRELOAD=../../snmalloc/build/libsnmallocshim-checks.so ./test2 123

Internally, our write-after-free corrupted snmalloc's free list. On program exit, snmalloc iterates the free list and verifies that each backward edge matches its internally encoded signature. Since our example corrupted the free list, the edges mismatched and triggered this alert:

        /**
         * Check the signature of this free Object
         */
        void check_prev(address_t signed_prev)
        {
          snmalloc_check_client(
            mitigations(freelist_backward_edge),
            signed_prev == prev_encoded,
            "Heap corruption - free list corrupted!");
          UNUSED(signed_prev);
---
(gdb) p signed_prev
$4 = 16650107089926554004
(gdb) p prev_encoded
$5 = 9399091170604832
(gdb) p  (char[])prev_encoded
$11 = " world!"

Comparisons and Adoption

The above examples illustrate the differences in secure allocators' capabilities. At a high level, I found ISOAlloc's feature matrix a good resource for comparing different secure allocators' feature sets. It's worth noting that even when two secure allocators support the same feature (e.g., a canary after each heap allocation), their ability to detect memory corruption still depends on the implementation (e.g., the canary's size and value, and when it is checked). Overall, I found hardened_malloc and hardened snmalloc to be the top two contenders in this domain.

Most secure allocators are designed for production use with minimal performance overhead, and I highly recommend enabling one by default for your own deployments. That said, there are some tricky pieces I found that prevent enabling them by default at the OS level and are worth noting:

  • They might reveal previously-concealed memory issues due to different free order and timing. Moreover, many secure allocators randomize the bucket order, making it harder to debug Heisenbugs.
  • They might have different behavior on zero-sized allocation. According to the C Standard, returning NULL and a valid pointer are both valid behaviors, yet many libraries might rely on one of the two.
  • In certain circumstances, a specific usage pattern can amplify the performance impact and create a >20% slowdown.
  • 'Release' builds of these memory allocators usually provide minimal context on crashes for better benchmark numbers.
  • Supported OS and ISA are usually quite limited. I saw good coverage on Linux + {x86, ARM64}, but other platforms largely remain untested.

Personally, I recommend the below setup, assuming that you can control both lab and production environments:

  • Use sanitizers for unit tests and quick lab tests for their better coverage and debug support.
  • Use a secure allocator build with full debug logs and -fno-omit-frame-pointer in the lab environment, so that issues are easier to reproduce with acceptable slowdown.
  • After it has been tested in the lab environment, run your program with a 'Release' build of secure allocators in production.

Future Improvements

Secure allocators are still in their early stages and have a lot of potential. In the upcoming years, we can expect better support from OS and hardware to help them catch memory issues more precisely:

  • userfaultfd's write-protect mode (added in Linux 5.7) notifies a process when something attempts to write to a registered page. This can be used to detect unauthorized writes from userspace.
  • ARMv8.5 provides the Memory Tagging Extension (MTE), which attaches a 4-bit tag to each 16-byte memory granule and to the pointers referencing it. The CPU verifies the tag on loads and stores at little extra cost, ensuring pointers only access their designated granules.
  • Both Intel and AMD support ignoring the top bits of pointers - Linear Address Masking and Upper Address Ignore, respectively. Unlike MTE, the hardware performs no automatic tag check, but users can still exploit the feature to store a tag in the pointer and verify the match themselves. ISOAlloc has a pretty interesting article on how to implement this.
  • Since Skylake, Intel server CPUs have supported the Memory Protection Keys (MPK) extension. It lets userspace attach a key to pages and then use that key to toggle the pages' protection without a syscall per change. An allocator can guard its metadata structures with a key so that only the allocator itself can access them, preventing attackers from manipulating heap metadata to hijack control flow.

There's even more progress on the horizon in this domain. Academia has been working on CHERI, an instruction-set extension that tags pointers with explicit capabilities constraining how memory may be accessed. Surely we will see more secure, reliable, and easy-to-use memory corruption detection in the foreseeable future.