TCMalloc is Google's customized implementation of C's malloc() and C++'s operator new, used for memory allocation within our C and C++ code. This custom memory allocation framework is an alternative to the one provided by the C standard library (on Linux usually through glibc) and C++ standard library.
TCMalloc is designed to be more efficient at scale than other implementations.
Specifically, TCMalloc provides the following benefits:
- Performance scales with highly parallel applications.
- Optimizations brought about with recent C++14 and C++17 standard enhancements, and by diverging slightly from the standard where performance benefits warrant. (These are noted within the TCMalloc Reference.)
- Extensions to allow performance improvements under certain architectures, and additional behavior such as metric gathering.
TCMalloc may operate in one of two fashions:
- (default) per-CPU caching, where TCMalloc maintains memory caches local to individual logical cores. Per-CPU caching is enabled when running TCMalloc on any Linux kernel that utilizes restartable sequences (RSEQ). Support for RSEQ was merged in Linux 4.18.
- per-thread caching, where TCMalloc maintains memory caches local to each application thread. If RSEQ is unavailable, TCMalloc reverts to using this legacy behavior.
NOTE: the "TC" in TCMalloc refers to Thread Caching, which was originally a distinguishing feature of TCMalloc; the name remains as a legacy.
In both cases, these cache implementations allow TCMalloc to avoid requiring locks for most memory allocations and deallocations.
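The lock-avoidance idea can be sketched with a small front-end cache that only touches a locked, shared back end when it runs empty. This is a minimal illustration under stated assumptions, not TCMalloc's implementation: `CentralList`, `LocalCache`, `kObjectSize`, and `kBatchSize` are hypothetical names, the sketch handles a single object size, and it uses `std::malloc` as a stand-in for the real back end.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <vector>

// All names and constants here are hypothetical; this is not TCMalloc's code.
constexpr std::size_t kObjectSize = 64;  // one object size, for brevity
constexpr std::size_t kBatchSize = 8;    // objects moved per refill

// Shared back end: every operation takes the lock.
class CentralList {
 public:
  // Appends `n` objects to `out` and counts the lock acquisition.
  void Fetch(std::vector<void*>& out, std::size_t n) {
    std::lock_guard<std::mutex> lock(mu_);
    ++lock_count_;
    for (std::size_t i = 0; i < n; ++i) out.push_back(std::malloc(kObjectSize));
  }

  // Frees everything in `in` and leaves it empty.
  void Release(std::vector<void*>& in) {
    std::lock_guard<std::mutex> lock(mu_);
    ++lock_count_;
    for (void* p : in) std::free(p);
    in.clear();
  }

  // Number of Fetch/Release calls so far (each one took the lock once).
  std::size_t lock_count() {
    std::lock_guard<std::mutex> lock(mu_);
    return lock_count_;
  }

 private:
  std::mutex mu_;
  std::size_t lock_count_ = 0;
};

// Front end owned by exactly one thread, so its fast paths need no lock.
class LocalCache {
 public:
  explicit LocalCache(CentralList& central) : central_(central) {}
  ~LocalCache() { central_.Release(free_); }

  void* Allocate() {
    if (free_.empty()) central_.Fetch(free_, kBatchSize);  // slow path: locks
    assert(!free_.empty());
    void* p = free_.back();  // fast path: no lock
    free_.pop_back();
    return p;
  }

  void Deallocate(void* p) { free_.push_back(p); }  // fast path: no lock

 private:
  CentralList& central_;
  std::vector<void*> free_;
};
```

In a per-thread arrangement, each thread would own one such cache (for example, a `thread_local LocalCache`), so only refills and the final release contend on the mutex. The sketch ignores allocation failure and never returns surplus objects to the back end before destruction.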
TCMalloc provides APIs for dynamic memory allocation: malloc() using the C API, and ::operator new using the C++ API. TCMalloc, like most allocation frameworks, manages this memory better than raw memory requests (such as through mmap()) by providing several optimizations:
- Performing allocations from the operating system by managing specifically-sized chunks of memory (called "pages"). Having all of these chunks of memory the same size allows TCMalloc to simplify bookkeeping.
- Devoting separate pages (or runs of pages called "Spans" in TCMalloc) to specific object sizes. For example, all 16-byte objects are placed within a "Span" specifically allocated for objects of that size. Operations to get or release memory in such cases are much simpler.
- Holding memory in caches to speed up access of commonly-used objects. Holding such caches even after deallocation also helps avoid costly system calls if such memory is later re-allocated.
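The page and Span bookkeeping described above can be illustrated with a small sketch that rounds a request up to a size class and computes how many objects of that class fit in a span. The page size, the size-class table, and the function names `SizeClassFor` and `ObjectsPerSpan` are assumptions made for illustration; they are not TCMalloc's actual values or API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical values chosen for illustration; TCMalloc's real page size and
// size-class table are configuration-dependent and are not reproduced here.
constexpr std::size_t kPageSize = 8192;
constexpr std::array<std::size_t, 6> kSizeClasses = {8, 16, 32, 64, 128, 256};

// Returns the index of the smallest size class that can hold `size` bytes,
// or -1 when the request is too large for any class in this sketch.
inline int SizeClassFor(std::size_t size) {
  for (std::size_t i = 0; i < kSizeClasses.size(); ++i) {
    if (size <= kSizeClasses[i]) return static_cast<int>(i);
  }
  return -1;
}

// Number of objects of the given class that fit in a span of `pages` pages.
inline std::size_t ObjectsPerSpan(int size_class, std::size_t pages) {
  assert(size_class >= 0 &&
         static_cast<std::size_t>(size_class) < kSizeClasses.size());
  return pages * kPageSize / kSizeClasses[static_cast<std::size_t>(size_class)];
}
```

Under these assumed values, a 13-byte request maps to the 16-byte class, and a one-page span dedicated to that class holds 512 objects. Because every object in such a span has the same size, locating a free slot reduces to simple index arithmetic.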
The cache size can also affect performance. The larger the cache, the less often any given cache overflows or is exhausted and therefore requires a lock to get more memory. TCMalloc extensions allow you to modify this cache size, though the default behavior should be preferred in most cases. For more information, consult the TCMalloc Tuning Guide.
Additionally, TCMalloc exposes telemetry about the state of the application's heap via MallocExtension. This can be used for gathering profiles of the live heap, as well as a snapshot taken near the heap's highwater mark size (a peak heap profile).
TCMalloc implements the C and C++ dynamic memory API endpoints from the C11, C++11, C++14, and C++17 standards.
From C++, this includes
- The basic ::operator new, ::operator delete, and array variant functions.
- C++14's sized ::operator delete
- C++17's overaligned ::operator new and ::operator delete functions.
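As an illustration of the standard entry points listed above, the following sketch calls the sized and overaligned forms directly. It uses only standard C++17 library declarations and works with any conforming allocator, not only TCMalloc; the type `CacheLineBlock` and the helper names are hypothetical. The sized form is guarded by the `__cpp_sized_deallocation` feature-test macro because some toolchains do not enable sized deallocation by default.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical type used only to request 64-byte alignment.
struct alignas(64) CacheLineBlock {
  unsigned char bytes[64];
};

inline bool IsAligned(const void* p, std::size_t alignment) {
  assert(alignment != 0);
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// C++17 overaligned allocation and the matching overaligned deallocation.
inline bool OveralignedRoundTrip() {
  const std::align_val_t alignment{alignof(CacheLineBlock)};
  void* p = ::operator new(sizeof(CacheLineBlock), alignment);
  const bool aligned = IsAligned(p, alignof(CacheLineBlock));
  ::operator delete(p, alignment);
  return aligned;
}

// C++14 sized deallocation: the caller passes back the size it requested.
inline bool SizedRoundTrip(std::size_t size) {
  void* p = ::operator new(size);
  const bool ok = (p != nullptr);
#if defined(__cpp_sized_deallocation)
  ::operator delete(p, size);
#else
  ::operator delete(p);  // toolchain without sized deallocation enabled
#endif
  return ok;
}
```

Passing the size or alignment back on deallocation gives the allocator information it would otherwise have to look up. In ordinary code the compiler selects these overloads automatically for `new` and `delete` expressions; calling them by hand is shown here only to make the signatures visible.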
Unlike in the standard implementations, TCMalloc does not throw an exception when allocations fail, but instead crashes directly. Such behavior can be used as a performance optimization for move constructors not currently marked noexcept; such move operations can be allowed to fail directly due to allocation failures. In Abseil, these are enabled with -DABSL_ALLOCATOR_NOTHROW.
From C, this includes malloc, calloc, realloc, and free.
The TCMalloc API obeys the behavior of C90 DR075 and DR445, which state:
The alignment requirement still applies even if the size is too small for any object requiring the given alignment.
In other words, malloc(1) returns an alignof(std::max_align_t)-aligned pointer.
Based on the progress of N2293, we may relax this alignment in the future.
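The current alignment behavior can be checked with a short sketch like the one below. It uses only standard library calls; `MallocIsMaxAligned` is a hypothetical helper name, and the result reflects whichever allocator the program is actually linked against.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Returns true if a `size`-byte malloc() result is aligned to
// alignof(std::max_align_t). Call with size >= 1; malloc(0) may return null.
inline bool MallocIsMaxAligned(std::size_t size) {
  void* p = std::malloc(size);
  assert(p != nullptr);  // the sketch does not handle allocation failure
  const bool aligned =
      reinterpret_cast<std::uintptr_t>(p) % alignof(std::max_align_t) == 0;
  std::free(p);
  return aligned;
}
```

Code that relies on this guarantee for very small allocations would be affected if the alignment is relaxed, so a check of this kind may be useful as a regression test.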
For more complete information, consult the TCMalloc Reference.