The Hoard Memory Allocator

Emery Berger

University of Massachusetts Amherst

<emery@cs.umass.edu>

2004-12-08

Revision History
Revision 1.1	2004-12-08	EDB
Improved formatting
Revision 1.0	2004-12-06	EDB
First draft

Abstract

Documentation for the Hoard scalable memory allocator, including build and usage directions for several platforms.

Table of Contents

1. Introduction

1.1. Why Hoard?
1.2. How Do I Use Hoard?
1.3. Who's Using Hoard?

2. Building Hoard

2.1. Platform-specific directions

3. Using Hoard

3.1. UNIX
3.2. Windows

4. Frequently Asked Questions

5. More Information

6. License Information

hoard: To amass and put away (anything valuable) for preservation, security, or future use; to treasure up: esp. money or wealth.
Oxford English Dictionary

1. Introduction

The Hoard memory allocator is a fast, scalable, and memory-efficient memory allocator for shared-memory multiprocessors. It runs on a variety of platforms, including Linux, Solaris, and Windows.

1.1. Why Hoard?

1.1.1. Contention

Multithreaded programs often do not scale because the heap is a bottleneck. When multiple threads allocate or deallocate memory at the same time, a conventional allocator serializes them. Programs that make intensive use of the allocator can actually slow down as the number of processors increases. Your program may be allocation-intensive without your realizing it; for instance, if it makes many calls to the C++ Standard Template Library (STL).
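To make the pattern concrete, here is a minimal sketch of an allocation-intensive multithreaded workload. It is not part of the Hoard distribution; the function name run_alloc_workers and the 64-byte block size are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical workload: every worker thread repeatedly allocates and frees
// a small block, so nearly all of its time is spent inside the allocator.
// Returns the number of successful allocations made by all threads together.
std::size_t run_alloc_workers(unsigned num_threads, std::size_t iterations) {
    std::vector<std::size_t> counts(num_threads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counts, t, iterations] {
            std::size_t done = 0;
            for (std::size_t i = 0; i < iterations; ++i) {
                void* p = std::malloc(64);  // 64 bytes is an arbitrary choice
                if (p != nullptr) {
                    // Touch the block so the allocation is not optimized away.
                    static_cast<volatile char*>(p)[0] = 1;
                    ++done;
                }
                std::free(p);
            }
            counts[t] = done;
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    std::size_t total = 0;
    for (std::size_t c : counts) {
        total += c;
    }
    return total;
}
```

With an allocator that serializes requests, adding threads to a loop like this one adds contention rather than throughput.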

1.1.2. False Sharing

The allocator can cause other problems for multithreaded code. It can lead to false sharing in your application: threads on different CPUs can end up with objects in the same cache line (the fixed-size chunk of memory that a CPU cache transfers as a unit). Accessing these falsely shared cache lines can be hundreds of times slower than accessing unshared cache lines.
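As a rough illustration, the following sketch tests whether two addresses fall within the same cache line. The helper same_cache_line is hypothetical, and the 64-byte line size is an assumption; the real value depends on your hardware.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache-line size in bytes. 64 is common, but hardware-specific.
constexpr std::size_t kAssumedCacheLine = 64;

// Returns true if the two addresses lie in the same cache line, meaning
// that two threads writing to them could falsely share that line.
bool same_cache_line(const void* a, const void* b,
                     std::size_t line = kAssumedCacheLine) {
    const auto ia = reinterpret_cast<std::uintptr_t>(a);
    const auto ib = reinterpret_cast<std::uintptr_t>(b);
    return ia / line == ib / line;
}
```

If an allocator hands two small objects for which this check is true to threads on different CPUs, each write by one thread can invalidate the other CPU's cached copy of the line.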

1.1.3. Blowup

Multithreaded programs can also lead the allocator to blow up memory consumption. This effect can multiply the amount of memory needed to run your application by the number of CPUs on your machine: four CPUs could mean that you need four times as much memory. Hoard is a fast allocator that solves all of these problems.
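The arithmetic behind blowup can be shown with a toy model. This is a deliberately simplified simulation, not code from Hoard or from any real allocator: it assumes each thread has a private heap that keeps freed memory to itself, and that threads run one after another, each allocating and then freeing the same number of bytes. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Toy model (an assumption for illustration): one private heap per thread,
// and freed memory is never handed to another thread's heap.
// Returns the total number of bytes requested from the operating system.
std::size_t footprint_private_heaps(std::size_t threads, std::size_t bytes) {
    std::vector<std::size_t> free_in_heap(threads, 0);
    std::size_t from_os = 0;
    for (std::size_t t = 0; t < threads; ++t) {
        if (free_in_heap[t] < bytes) {      // nothing reusable in this heap
            from_os += bytes - free_in_heap[t];
            free_in_heap[t] = bytes;
        }
        free_in_heap[t] -= bytes;           // thread t allocates
        free_in_heap[t] += bytes;           // thread t frees to its own heap
    }
    return from_os;
}

// Same workload with a single shared heap: memory freed by one thread can
// be reused by the next, so only the first allocation reaches the OS.
std::size_t footprint_shared_heap(std::size_t threads, std::size_t bytes) {
    std::size_t free_bytes = 0;
    std::size_t from_os = 0;
    for (std::size_t t = 0; t < threads; ++t) {
        if (free_bytes < bytes) {
            from_os += bytes - free_bytes;
            free_bytes = bytes;
        }
        free_bytes -= bytes;                // allocate
        free_bytes += bytes;                // free
    }
    return from_os;
}
```

In this model, four threads that each need 1000 bytes in turn consume 4000 bytes with private heaps but only 1000 bytes with a shared heap, even though no more than 1000 bytes are ever live at once.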

1.2. How Do I Use Hoard?

Hoard is a drop-in replacement for malloc(), etc. In general, you just link it in or set one environment variable. You do not have to change your source code in any way. See the section "Using Hoard" below for directions for particular platforms.

1.3. Who's Using Hoard?

Companies using Hoard in their products and servers include AOL, British Telecom, Business Objects (formerly Crystal Decisions), Entrust, Novell, OpenWave Systems (for their Typhoon and Twister servers), and Reuters.

Open source projects using Hoard include the Bayonne GNU telephony server, the Cilk parallel programming language, the Dartmouth Scalable Simulation Framework, and the GNU Common C++ system.

2. Building Hoard

You can use the available pre-built binaries or build Hoard yourself. Hoard is written to work on Windows and any variant of UNIX that supports threads, and should compile out of the box. Rather than using Makefiles or configure scripts, Hoard includes custom scripts that all start with the prefix compile.

2.1. Platform-specific directions

2.1.1. Linux and Solaris Builds

You can compile Hoard out of the box for Linux and Solaris using the GNU compilers (g++) just by running the compile script:

./compile

2.1.2. Windows Builds

There are now three alternative ways of using Hoard with Windows.

The first approach builds a DLL, libhoard.dll, and its associated library, libhoard.lib.
```
.\compile-dll
```
The second approach relies on Microsoft Research's Detours. With Detours, you can take advantage of Hoard without having to relink your applications. Install Detours into C:\detours, and then build the Hoard detours library:
```
.\compile-detours
```
The third approach generates winhoard, which replaces malloc/new calls in your program and any DLLs it might use.
```
.\compile-winhoard
```

3. Using Hoard

3.1. UNIX

On UNIX, you can set the LD_PRELOAD environment variable to use Hoard instead of the system allocator for any dynamically linked program, that is, any program not linked with the -static option (which covers most programs). Below are settings for Linux and Solaris.

3.1.1. Linux

LD_PRELOAD="/path/libhoard.so:/usr/lib/libdl.so"

3.1.2. Solaris

Depending on whether you are using the GNU-compiled version (as produced by compile) or the Sun Workshop-compiled versions (produced by compile-sunw), your settings will be slightly different.

Version	Setting
GNU-compiled	LD_PRELOAD="/path/libhoard.so:/usr/lib/libdl.so"
Sun-compiled (32-bit)	LD_PRELOAD="/path/libhoard_32.so:/usr/lib/libCrun.so.1:/usr/lib/libdl.so"
Sun-compiled (64-bit)	LD_PRELOAD="/path/libhoard_64.so:/usr/lib/64/libCrun.so.1:/usr/lib/64/libdl.so"

Note

For some security-sensitive applications, Solaris requires that you place libraries used in LD_PRELOAD into the /usr/lib/secure directory. In that event, after copying these libraries into /usr/lib/secure, set LD_PRELOAD without the absolute paths of the libraries, as follows:

LD_PRELOAD="libhoard.so:libCrun.so.1:libdl.so"
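Because preloading happens outside your program, it is easy to launch a process without the setting in place. The following sketch shows one way a program could check whether its environment asks for Hoard to be preloaded. The function names are hypothetical, and the check only inspects the LD_PRELOAD text; it does not prove that the dynamic loader actually loaded the library.

```cpp
#include <cstdlib>
#include <cstring>

// Returns true if an LD_PRELOAD-style value names a Hoard library.
// Matching on the substring "libhoard" is an assumption based on the
// library file names used in the settings above.
bool mentions_hoard(const char* preload_value) {
    return preload_value != nullptr &&
           std::strstr(preload_value, "libhoard") != nullptr;
}

// Checks the LD_PRELOAD variable of the current process.
bool hoard_preload_requested() {
    return mentions_hoard(std::getenv("LD_PRELOAD"));
}
```

A program could call hoard_preload_requested() at startup and log a warning when it returns false.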

3.2. Windows

There are three ways to use Hoard on Windows.

Using Detours
By using Detours, you can take advantage of Hoard's benefits without relinking your Windows application (as long as it is dynamically linked to the C runtime libraries). You will need to use one of the two included Detours tools (setdll.exe or withdll.exe in the detours/ directory) in conjunction with this version of Hoard. To temporarily use Hoard as the allocator for a given application, use withdll:
```
withdll -d:hoarddetours.dll myprogram.exe
```
If you want your program to use Hoard without having to invoke withdll every time, you can use setdll to add it to your executable:
```
setdll -d:hoarddetours.dll myprogram.exe
```
You can later remove Hoard from your executable as follows:
```
setdll -r:hoarddetours.dll myprogram.exe
```
Using winhoard
Another method is to use winhoard. Like the Detours approach, winhoard replaces malloc/new calls from your program and any DLLs it might use (leaving HeapAlloc calls intact). One advantage is that it does not require Detours to do this. To use winhoard, link your executable with usewinhoard.obj and winhoard.lib, and make sure winhoard.dll is available when the program runs:
```
cl /Ox /MD myprogram.cpp usewinhoard.obj winhoard.lib
```
Using libhoard
The last method is to link directly with the libhoard DLL. This approach is simple, but only suitable for small applications, since it will not affect malloc calls in any other DLL you might load. To use this option, you should put the following into your source code as the very first lines:
```
#if defined(USE_HOARD)
#pragma comment(lib, "libhoard.lib")
#endif
```
This stanza should be in the first part of a header file included by all of your code. It ensures that Hoard loads before any other library (you will need libhoard.lib in your path). When you execute your program, as long as libhoard.dll is in your path, your program will run with Hoard instead of the system allocator. Note that you must compile your program with the /MD flag, as in:
```
cl /MD /G6 /Ox /DUSE_HOARD=1 myprogram.cpp
```
Hoard will not work if you use another switch (like /MT) to compile your program.

4. Frequently Asked Questions

4.1. What kind of applications will Hoard speed up?
4.2. I'm using the STL but not seeing any performance improvement. Why not?
4.3. Have you compared Hoard against mtmalloc or libumem?
4.4. What systems does Hoard work on?
4.5. Have you compared Hoard with SmartHeap SMP?

4.1.

What kind of applications will Hoard speed up?

Hoard will always improve the performance of multithreaded programs running on multiprocessors that make frequent use of the heap (calls to malloc/free or new/delete, as well as many STL functions). Because Hoard avoids false sharing, Hoard also speeds up programs that only occasionally call heap functions but access these objects frequently.

4.2.

I'm using the STL but not seeing any performance improvement. Why not?

To benefit from Hoard, you have to tell the STL to use malloc instead of its internal custom memory allocator, as in:

typedef list<unsigned int, malloc_alloc> mylist;
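The name malloc_alloc comes from particular STL implementations and may not exist in yours. If it is missing, a small allocator that forwards to malloc and free serves the same purpose. The sketch below assumes a C++11 or later compiler; MallocAllocator is a hypothetical name, not something Hoard provides.

```cpp
#include <cstddef>
#include <cstdlib>
#include <list>
#include <new>

// Hypothetical minimal allocator that sends every request straight to
// malloc/free, so a preloaded or linked-in malloc replacement serves it.
template <class T>
struct MallocAllocator {
    using value_type = T;

    MallocAllocator() = default;
    template <class U>
    MallocAllocator(const MallocAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        // Guard against overflow in n * sizeof(T).
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) {
            throw std::bad_alloc();
        }
        void* p = std::malloc(n * sizeof(T));
        if (p == nullptr) {
            throw std::bad_alloc();
        }
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// All instances are interchangeable, so they always compare equal.
template <class T, class U>
bool operator==(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept {
    return false;
}

typedef std::list<unsigned int, MallocAllocator<unsigned int> > mylist;
```

You can then use mylist exactly as you would use any other std::list.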

4.3.

Have you compared Hoard against mtmalloc or libumem?

Yes. Hoard is much faster than either. For example, here's an execution of threadtest on Solaris:

Default:	4.60 seconds
Libmtmalloc:	6.23 seconds
Libumem:	5.47 seconds
Hoard 3.2:	1.99 seconds
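From the figures above, Hoard finished roughly 2.3 times faster than the default allocator (4.60 / 1.99), 3.1 times faster than libmtmalloc, and 2.7 times faster than libumem. If you want to make a similar comparison on your own workload, a small timing helper such as the following sketch may be useful. The names seconds_taken and speedup are hypothetical, and this is not the threadtest benchmark itself.

```cpp
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <class F>
double seconds_taken(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Speedup of a candidate run relative to a baseline run (both in seconds).
// A value above 1.0 means the candidate was faster.
double speedup(double baseline_seconds, double candidate_seconds) {
    return baseline_seconds / candidate_seconds;
}
```

Run the same program once with the system allocator and once with Hoard, then pass the two timings to speedup().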

4.4.

What systems does Hoard work on?

Hoard has been successfully tested on numerous Windows, Linux and Solaris systems, including a 4-processor x86 box running Windows NT/2000, a 4-processor x86 box running RedHat Linux 6.0 and 6.1, and a 16-processor Sun Enterprise server running Solaris.

4.5.

Have you compared Hoard with SmartHeap SMP?

We tried SmartHeap SMP but it did not work on our Suns (due to an apparent race condition in the code).

5. More Information

The first place to look for Hoard-related information is at the Hoard web page, www.hoard.org.

There are two mailing lists you should consider being on if you are a user of Hoard. If you are just interested in being informed of new releases, join the Hoard-Announce list. For general Hoard discussion, join the Hoard mailing list. You can also search the archives of these lists.

6. License Information

The use and distribution of Hoard are governed by the GNU General Public License as published by the Free Software Foundation; see the included file COPYING for more details.

Because of the restrictions imposed by this license, most commercial users of Hoard have purchased commercial licenses through the University of Texas at Austin. You may contact Emery Berger for more information.