Profiling the stack27 Oct 2015
In the last post, I described a flat allocation profiler for OCaml 4.02 bytecode interpreter. In this post, I’ll describe further developments which add support for call stack information and better location information. Lets dive straight to the usage:
Enabling stack profiling
Stack profiling is enabled by setting the environment variable
CAML_PROFILE_STACK to the intended depth of stack. Suppose we would like to
attribute any allocation to the current function, we would set
CAML_PROFILE_STACK=1. To do the same to the current function and its caller,
we would set
give you stack profile all the way down to the first function.
Why should I care about the stack depth?
Because it affects the program performance. Enabling stack profiling walks the stack for every allocation. This has the potential to severely affect the program performance. Most often, with a flat profile, you’ve tracked the offending allocation to some function in the standard library such as1:
And all you want is to find out the caller of that standard library function in your code. A stack depth of a small number should provide you this information. You might have to play around with the stack depth to identify what you are looking for.
You can obtain and install the profiling enabled OCaml 4.02 here. Let us obtain the flat profile first.
Observe that we now have the precise location information directly in the
one had to manually identify the source location using the instruction
information. In this profile, we see that most allocations were in
list.ml:55, which is the
List.map function. However, we cannot pin down the
source of these allocations in
queens.ml from this profile since the profile
is flat. Let us now obtain the stack allocation profile, which will reveal the
source of these allocations in
I’ve chosen a stack depth of 10000 to obtain the complete stack profile of the
program. The option
allocprof sorts the results based on
the stack allocation profile. We can now clearly see the stack of functions
that perform most allocations. The line
says that 98.78% of all allocations were performed by the function at
queens.ml:100, characters 33-42, and its callees. This isn’t surprising since
this function is the top-level
More interesting is the 98.21% of allocations on
queens.ml:45. This is the
recursive call to the
which in turn invokes the
List.map function on
queens.ml:61. We’ve now
pinned down the source of the allocation in
Caveats and conclusions
Unlike stack profiles of C programs, OCaml’s stack profile does not include all the functions in the call stack since many calls are in tail positions. Calls to functions at tail position will not have a frame on the stack, and hence will not be included in the profile.
Please do submit issues and bug-fixes. Pull-requests are welcome! Also, here is my trimmed down (yay \o/!), non-exhaustive wish list of features:
- Dump the profile every few milliseconds to study the allocation behavior of programs over time.
- Save the location information in the object header and dump the heap at every GC to catch space leaks.