Profiling the stack
27 Oct 2015In the last post, I described a flat allocation profiler for OCaml 4.02 bytecode interpreter. In this post, I’ll describe further developments which add support for call stack information and better location information. Lets dive straight to the usage:
Enabling stack profiling
Stack profiling is enabled by setting the environment variable
CAML_PROFILE_STACK
to the intended depth of stack. Suppose we would like to
attribute any allocation to the current function, we would set
CAML_PROFILE_STACK=1
. To do the same to the current function and its caller,
we would set CAML_PROFILE_STACK=2
. CAML_PROFILE_STACK=<INFINITY>
should
give you stack profile all the way down to the first function.
Why should I care about the stack depth?
Because it affects the program performance. Enabling stack profiling walks the stack for every allocation. This has the potential to severely affect the program performance. Most often, with a flat profile, you’ve tracked the offending allocation to some function in the standard library such as1:
And all you want is to find out the caller of that standard library function in your code. A stack depth of a small number should provide you this information. You might have to play around with the stack depth to identify what you are looking for.
Profiling N-queens
You can obtain and install the profiling enabled OCaml 4.02 here. Let us obtain the flat profile first.
Observe that we now have the precise location information directly in the
profile, whereas
earlier
one had to manually identify the source location using the instruction
information. In this profile, we see that most allocations were in
list.ml:55
, which is the List.map
function. However, we cannot pin down the
source of these allocations in queens.ml
from this profile since the profile
is flat. Let us now obtain the stack allocation profile, which will reveal the
source of these allocations in queens.ml
.
I’ve chosen a stack depth of 10000 to obtain the complete stack profile of the
program. The option --sort-stack
to allocprof
sorts the results based on
the stack allocation profile. We can now clearly see the stack of functions
that perform most allocations. The line
says that 98.78% of all allocations were performed by the function at
queens.ml:100
, characters 33-42, and its callees. This isn’t surprising since
this function is the top-level main
function!
More interesting is the 98.21% of allocations on queens.ml:45
. This is the
recursive call to the concmap
function,
which in turn invokes the List.map
function on queens.ml:61
. We’ve now
pinned down the source of the allocation in list.ml:55
to queens.ml:61
.
Caveats and conclusions
Unlike stack profiles of C programs, OCaml’s stack profile does not include all the functions in the call stack since many calls are in tail positions. Calls to functions at tail position will not have a frame on the stack, and hence will not be included in the profile.
Please do submit issues and bug-fixes. Pull-requests are welcome! Also, here is my trimmed down (yay \o/!), non-exhaustive wish list of features:
- Dump the profile every few milliseconds to study the allocation behavior of programs over time.
- Save the location information in the object header and dump the heap at every GC to catch space leaks.
-
Thanks to trevorsummerssmith for the example. ↩