Copyright © 2004 Emery Berger
Abstract: Documentation for the Hoard scalable memory allocator, including build and usage directions for several platforms.
hoard: To amass and put away (anything valuable) for preservation, security, or future use; to treasure up: esp. money or wealth.
Oxford English Dictionary
The Hoard memory allocator is a fast, scalable, and memory-efficient memory allocator for shared-memory multiprocessors. It runs on a variety of platforms, including Linux, Solaris, and Windows.
Multithreaded programs often do not scale because the heap is a bottleneck. When multiple threads simultaneously allocate or deallocate memory from the allocator, the allocator will serialize them. Programs making intensive use of the allocator actually slow down as the number of processors increases. Your program may be allocation-intensive without you realizing it, for instance, if your program makes many calls to the C++ Standard Template Library (STL).
The allocator can cause other problems for multithreaded code. It can lead to false sharing in your application: threads on different CPUs can end up with memory in the same cache line, or chunk of memory. Accessing these falsely-shared cache lines is hundreds of times slower than accessing unshared cache lines.
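The false-sharing effect described above can be reproduced without any allocator at all. The following is a minimal sketch, not part of Hoard: the names PackedCounter, PaddedCounter, and hammer are hypothetical, and the 64-byte cache-line size is an assumption that holds on many x86 machines but is not guaranteed. It requires C++17 so that std::vector honors the over-aligned type.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed cache-line size (hypothetical constant; check your hardware).
constexpr std::size_t kCacheLine = 64;

// Counters stored back to back: several of them share one cache line,
// so threads updating neighbouring counters falsely share that line.
struct PackedCounter {
    std::atomic<std::uint64_t> value{0};
};

// Each counter is aligned to, and therefore padded out to, a full line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

static_assert(sizeof(PaddedCounter) == kCacheLine,
              "a padded counter should occupy exactly one assumed cache line");

// Starts one thread per counter; each thread increments only its own
// counter. Returns the sum of all counters after the threads finish.
template <typename Counter>
std::uint64_t hammer(std::size_t threads, std::uint64_t iterations) {
    std::vector<Counter> counters(threads);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threads; ++t) {
        workers.emplace_back([&counters, t, iterations] {
            for (std::uint64_t i = 0; i < iterations; ++i) {
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    std::uint64_t total = 0;
    for (auto& c : counters) {
        total += c.value.load();
    }
    assert(total == threads * iterations);
    return total;
}
```

Timing hammer<PackedCounter> against hammer<PaddedCounter> with the same arguments on a multiprocessor gives a rough feel for the cost of false sharing. Both variants compute the same result; only the memory layout differs.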
Multithreaded programs can also cause the allocator to blow up memory consumption. This effect can multiply the amount of memory needed to run your application by the number of CPUs on your machine: four CPUs could mean that you need four times as much memory. Hoard is a fast allocator that solves all of these problems.
Hoard is a drop-in replacement for malloc(), etc. In general, you either link it in or set a single environment variable; you do not have to change your source code in any way. See the section "Windows Builds" below for more information for particular platforms.
Companies using Hoard in their products and servers include AOL, British Telecom, Business Objects (formerly Crystal Decisions), Entrust, Novell, OpenWave Systems (for their Typhoon and Twister servers), and Reuters.
Open source projects using Hoard include the Bayonne GNU telephony server, the Cilk parallel programming language, the Dartmouth Scalable Simulation Framework, and the GNU Common C++ system.
You can use the available pre-built binaries or build Hoard yourself. Hoard is written to work on Windows and any variant of UNIX that supports threads, and should compile out of the box. Rather than using Makefiles or configure scripts, Hoard includes custom scripts that all start with the prefix compile.
You can compile Hoard out of the box for Linux and Solaris using the
GNU compilers (g++) just by running the
There are now three alternative ways of using Hoard with Windows.
The first approach builds a DLL,
its associated library
The second approach relies on Microsoft Research's Detours. With
Detours, you can take advantage of Hoard without having to relink your
applications. Install Detours into
C:\detours, and then build the Hoard
The third approach generates winhoard, which replaces malloc/new calls in your program and any DLLs it might use.
In UNIX, you can use the
LD_PRELOAD variable to use
Hoard instead of the system allocator for any program not linked with
the "static option" (that's most programs). Below are settings for
Linux and Solaris.
Depending on whether you are using the GNU-compiled version (as
produced by compile) or the Sun
Workshop-compiled version (produced by
compile-sunw), your settings will be slightly different.
For some security-sensitive applications, Solaris requires that you place
libraries used in
LD_PRELOAD into the
/usr/lib/secure directory. In that event,
after copying these libraries into /usr/lib/secure, set
LD_PRELOAD by omitting the absolute locations of the libraries, as follows:
There are three ways to use Hoard on Windows.
By using Detours, you can take advantage of Hoard's benefits without
relinking your Windows application (as long as it is dynamically
linked to the C runtime libraries). You will need to use one of the
two included Detours tools (
withdll.exe in the
detours/ directory) in conjunction with
this version of Hoard. To temporarily use Hoard
as the allocator for a given application, use
withdll -d:hoarddetours.dll myprogram.exe
If you want your program to use Hoard without having to invoke withdll every time, you can use setdll to add it to your executable:
setdll -d:hoarddetours.dll myprogram.exe myprogram.exe
You can later remove Hoard from your executable as follows:
setdll -r:hoarddetours.dll myprogram.exe
Another method is to use
like Detours, replaces malloc/new calls from your program and any DLLs
it might use (leaving
intact). One advantage is that it does not require Detours to do this.
To use the Winhoard version, link your executable with
winhoard.lib, and then use
cl /Ox /MD myprogram.cpp usewinhoard.obj winhoard.lib
The last method is to link directly with the
libhoard DLL. This approach is simple, but only
suitable for small applications, since it will not affect malloc calls
in any other DLL you might load. To use this option, you should put
the following into your source code as the very first lines:
#if defined(USE_HOARD)
#pragma comment(lib, "libhoard.lib")
#endif
This stanza should be in the first part of a header file included by all of your code. It ensures that Hoard loads before any other library (you will need libhoard.lib in your path). When you execute your program, as long as libhoard.dll is in your path, your program will run with Hoard instead of the system allocator. Note that you must compile your program with the /MD flag, as in:
cl /MD /G6 /Ox /DUSE_HOARD=1 myprogram.cpp
Hoard will not work if you use another switch (like /MT) to compile your program.
What kind of applications will Hoard speed up?
Hoard will always improve the performance of multithreaded programs running on multiprocessors that make frequent use of the heap (calls to malloc/free or new/delete, as well as many STL functions). Because Hoard avoids false sharing, Hoard also speeds up programs that only occasionally call heap functions but access these objects frequently.
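To judge whether a workload is allocation-intensive in the sense described above, a small stress harness can help. The following is a rough sketch under stated assumptions, not the threadtest benchmark itself: the function names churn and run_workers are hypothetical, and the loop simply allocates and frees batches of small objects from several threads at once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// One worker: repeatedly allocates a batch of small objects with malloc,
// touches each one, then frees the whole batch. `bytes` must be at least 1.
// Returns the number of allocations that succeeded.
std::size_t churn(std::size_t rounds, std::size_t batch, std::size_t bytes) {
    assert(bytes >= 1);
    std::vector<char*> live(batch, nullptr);
    std::size_t done = 0;
    for (std::size_t r = 0; r < rounds; ++r) {
        for (std::size_t i = 0; i < batch; ++i) {
            live[i] = static_cast<char*>(std::malloc(bytes));
            if (live[i] != nullptr) {
                live[i][0] = 1;  // touch the object so the memory is used
                ++done;
            }
        }
        for (std::size_t i = 0; i < batch; ++i) {
            std::free(live[i]);  // free(nullptr) is a harmless no-op
            live[i] = nullptr;
        }
    }
    return done;
}

// Runs `threads` workers concurrently and returns the total number of
// successful allocations across all of them.
std::size_t run_workers(std::size_t threads, std::size_t rounds,
                        std::size_t batch, std::size_t bytes) {
    std::vector<std::size_t> counts(threads, 0);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threads; ++t) {
        workers.emplace_back([&counts, t, rounds, batch, bytes] {
            counts[t] = churn(rounds, batch, bytes);
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    std::size_t total = 0;
    for (std::size_t c : counts) {
        total += c;
    }
    return total;
}
```

Timing run_workers with an increasing thread count, once with the system allocator and once with Hoard in place, shows whether the heap is the bottleneck for this access pattern. The actual numbers depend on the machine and the allocator, so none are given here.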
I'm using the STL but not seeing any performance improvement. Why not?
In order to benefit from Hoard, you have to tell STL to use malloc instead of its internal custom memory allocator, as in:
typedef list<unsigned int, malloc_alloc> mylist;
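The malloc_alloc name above comes from the SGI-style STL and is not available in every standard library. Where it is missing, a small standard-conforming allocator that forwards to malloc and free can play the same role. The following is a minimal sketch assuming a C++11 or later compiler; MallocAllocator and sum_first are hypothetical names and are not part of Hoard or of any STL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <list>
#include <new>

// Minimal allocator that sends every request straight to malloc/free,
// so that a replacement malloc sees all container allocations.
template <typename T>
struct MallocAllocator {
    using value_type = T;

    MallocAllocator() noexcept = default;

    // Converting constructor required so containers can rebind the
    // allocator to their internal node type.
    template <typename U>
    MallocAllocator(const MallocAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        // Guard against overflow in n * sizeof(T).
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) {
            throw std::bad_alloc();
        }
        void* p = std::malloc(n * sizeof(T));
        if (p == nullptr) {
            throw std::bad_alloc();
        }
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// The allocator is stateless, so all instances compare equal.
template <typename T, typename U>
bool operator==(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept {
    return true;
}

template <typename T, typename U>
bool operator!=(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept {
    return false;
}

typedef std::list<unsigned int, MallocAllocator<unsigned int> > mylist;

// Small usage example: fills a list with 1..n and returns the sum.
unsigned int sum_first(unsigned int n) {
    mylist values;
    for (unsigned int i = 1; i <= n; ++i) {
        values.push_back(i);
    }
    unsigned int total = 0;
    for (unsigned int v : values) {
        total += v;
    }
    return total;
}
```

Whether this is needed depends on the library: it matters when the default allocator pools memory internally, and makes little difference when the default allocator already reaches malloc through operator new.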
Have you compared Hoard against mtmalloc or libumem?
Yes. Hoard is much faster than either. For example, here's an execution of threadtest on Solaris:
What systems does Hoard work on?
Hoard has been successfully tested on numerous Windows, Linux and Solaris systems, including a 4-processor x86 box running Windows NT/2000, a 4-processor x86 box running RedHat Linux 6.0 and 6.1, and a 16-processor Sun Enterprise server running Solaris.
Have you compared Hoard with SmartHeap SMP?
We tried SmartHeap SMP but it did not work on our Suns (due to an apparent race condition in the code).
The first place to look for Hoard-related information is at the Hoard web page, www.hoard.org.
There are two mailing lists you should consider being on if you are a user of Hoard. If you are just interested in being informed of new releases, join the Hoard-Announce list. For general Hoard discussion, join the Hoard mailing list. You can also search the archives of these lists.
The use and distribution of Hoard is governed by the GNU General
Public License as published by the Free Software Foundation: see the
COPYING file for more details.
Because of the restrictions imposed by this license, most commercial users of Hoard have purchased commercial licenses through the University of Texas at Austin. You may contact Emery Berger for more information.