Optimizing Go Performance with Stack Allocation

Introduction

Go developers are constantly seeking ways to make their programs faster. Over the past two releases, the Go team has focused on reducing a major bottleneck: heap allocations. Whenever a Go program allocates memory from the heap, a significant amount of runtime code executes to fulfill that request. Additionally, each heap allocation adds pressure on the garbage collector, which, despite recent improvements like the Green Tea garbage collector, still incurs substantial overhead.

Source: blog.golang.org

To combat this, the team has been working on shifting more allocations from the heap to the stack. Stack allocations are much cheaper—sometimes practically free—and they place no burden on the garbage collector. When a function returns, its entire stack frame (and any data allocated within it) is automatically reclaimed. This also promotes cache-friendly reuse of memory.

The True Cost of Heap Allocations

Heap allocations are expensive because they require the runtime to find a suitable block of memory, manage fragmentation, and later track the allocation for garbage collection. Even with a highly optimized collector, the overhead can be significant in hot code paths. The garbage collector must scan the heap, identify live objects, and free unreferenced ones—all of which consumes CPU cycles and can introduce latency.

Stack Allocations: A Cheaper Alternative

Stack allocations, by contrast, are incredibly efficient. They typically involve nothing more than adjusting the stack pointer, with no lookup, locking, or bookkeeping. When the function returns, the pointer moves back, instantly freeing the memory, and the garbage collector is never involved. This makes stack allocation ideal for temporary data whose lifetime is tied to a function call.
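You can ask the compiler which allocations it keeps off the heap: building with go build -gcflags=-m prints its escape-analysis decisions. As a minimal sketch, a small fixed-size array that never leaves its function typically stays on the stack:

```go
package main

import "fmt"

// sum fills a small fixed-size array and totals it. Because buf's size
// is known at compile time and buf never escapes sum, escape analysis
// normally places it on the stack. Verify with: go build -gcflags=-m
func sum() int {
	var buf [8]int
	for i := range buf {
		buf[i] = i
	}
	total := 0
	for _, v := range buf {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sum()) // 0+1+...+7 = 28
}
```

Whether a given value stays on the stack is a compiler decision, not a language guarantee, so it is worth re-checking with -gcflags=-m after refactoring.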

Practical Example: Building a Slice from a Channel

Consider the common scenario of accumulating tasks from a channel into a slice. Here’s a typical Go function:

func process(c chan task) {
    var tasks []task
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

Step-by-Step Allocation Pattern

Let’s trace what happens at runtime when the channel delivers tasks. On the first iteration, the slice has no backing array, so append allocates one—typically of size 1. On the second iteration, that array is full, so a new array of size 2 is allocated, and the old size-1 array becomes garbage. On the third iteration, the size-2 array is full again, triggering an allocation of size 4. On the fourth iteration, the array of size 4 has room (only 3 items used), so no allocation occurs.

This pattern continues: the slice capacity doubles each time it fills up. While this exponential growth eventually reduces the frequency of allocations for large slices, there is a costly startup phase when the slice is small. During this phase, multiple allocations happen in rapid succession, generating short-lived garbage. If your slice never grows very large—common in many real-world workloads—you may spend a disproportionate amount of time in the allocator.
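You can observe this growth pattern directly by recording the slice's capacity after each append. The exact capacity sequence is an implementation detail of the runtime, so treat the printed numbers as illustrative rather than guaranteed:

```go
package main

import "fmt"

// capGrowth appends n items to a nil slice and records each distinct
// capacity it sees; each new capacity marks a fresh backing array,
// and each old array becomes garbage.
func capGrowth(n int) []int {
	var s []int
	var caps []int
	for i := 0; i < n; i++ {
		s = append(s, i)
		if len(caps) == 0 || caps[len(caps)-1] != cap(s) {
			caps = append(caps, cap(s)) // a reallocation happened here
		}
	}
	return caps
}

func main() {
	// On current Go versions small slices roughly double,
	// e.g. capacities 1, 2, 4, 8 while appending 8 items.
	fmt.Println(capGrowth(8))
}
```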

Optimizing with Constant-Sized Slices

If you know the maximum number of tasks in advance, you can preallocate the slice with a fixed capacity using make([]task, 0, maxTasks). When the capacity is a compile-time constant, the backing array is small enough, and the compiler's escape analysis proves the slice does not outlive the function, the backing array is placed on the stack, bypassing the heap allocator entirely. Even when those conditions don't hold, preallocating collapses the repeated growth allocations into a single one.

For example, if maxTasks is known at compile time and is modest, the backing array can be placed on the stack. This eliminates the startup overhead and avoids any garbage collector load. Even if the maximum is larger, preallocating reduces allocation calls from O(log n) to just one.
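A preallocated version of the earlier process function might look like the sketch below. The maxTasks constant, the task type, and the processAll function are placeholders assumed for illustration:

```go
package main

import "fmt"

type task struct{ id int }

// maxTasks is a hypothetical compile-time upper bound on the number
// of tasks a single call is expected to receive.
const maxTasks = 64

// process collects tasks with one up-front allocation; append never
// reallocates as long as len(tasks) stays within maxTasks.
func process(c chan task) {
	tasks := make([]task, 0, maxTasks)
	for t := range c {
		tasks = append(tasks, t)
	}
	processAll(tasks)
}

func processAll(tasks []task) {
	fmt.Println("processed", len(tasks), "tasks")
}

func main() {
	c := make(chan task, 3)
	for i := 0; i < 3; i++ {
		c <- task{id: i}
	}
	close(c)
	process(c) // prints: processed 3 tasks
}
```

If the true maximum is occasionally exceeded, append still handles the overflow correctly; you only lose the single-allocation guarantee for those oversized calls.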

Conclusion

Stack allocation is a powerful technique for improving Go program performance. By understanding where heap allocations occur—especially in hot loops with growing slices—you can optimize your code to use the stack whenever possible. Preallocating slices with known bounds is a simple change that can yield substantial speedups and reduce GC pressure. As the Go runtime continues to evolve, more patterns may automatically benefit from stack allocation, but being aware of these mechanisms helps you write efficient Go code today.
