Memory and Performance: Garbage Collection Pt. 2
For the sake of article length, I won’t walk through running every command here. Instead, I’ve documented everything so you can reproduce these results yourself. I encourage you to run the profiling tools and discover the patterns firsthand. All code, scripts, and detailed instructions are available here.

Recap

In the first blog post, we explored the effects of thoughtful memory layout. The array-based BST was 53% faster than the pointer-based version despite using more memory: sequential access leverages CPU cache lines, and predictable access patterns let the hardware prefetcher do its job.

In the previous blog post, we learned what garbage collection is and discovered that Go’s current GC suffers from the exact same problem: it spends 85% of its time chasing scattered pointers through memory, with over 35% of CPU cycles stalled on memory accesses, confirming our findings from the first post about spatial locality.

Now we’re going to explore how to identify when GC is bottlenecking your algorithms using real profiling tools like pprof and Instruments, and demonstrate how Green Tea’s span-based scanning approach could solve these memory access problems.

Reading GC Traces

We have a graph implementation that creates 2M random nodes and performs BFS, visiting every node. Running `make run EXEC=graph` sets `GODEBUG=gctrace=1`, which tells the Go runtime to print a summary line for each garbage collection cycle. ...