Profiling the stack
27 Oct 2015In the last post, I described a flat allocation profiler for OCaml 4.02 bytecode interpreter. In this post, I’ll describe further developments which add support for call stack information and better location information. Lets dive straight to the usage:
Enabling stack profiling
Stack profiling is enabled by setting the environment variable
CAML_PROFILE_STACK
to the intended depth of stack. Suppose we would like to
attribute any allocation to the current function, we would set
CAML_PROFILE_STACK=1
. To do the same to the current function and its caller,
we would set CAML_PROFILE_STACK=2
. CAML_PROFILE_STACK=<INFINITY>
should
give you stack profile all the way down to the first function.
Why should I care about the stack depth?
Because it affects the program performance. Enabling stack profiling walks the stack for every allocation. This has the potential to severely affect the program performance. Most often, with a flat profile, you’ve tracked the offending allocation to some function in the standard library such as1:
File "bytes.ml", line 59, characters 7-81:
C_CALL1 caml_create_string
File "src/bigstring.ml", line 98, characters 20-37:
C_CALL1 caml_create_string
And all you want is to find out the caller of that standard library function in your code. A stack depth of a small number should provide you this information. You might have to play around with the stack depth to identify what you are looking for.
Profiling N-queens
You can obtain and install the profiling enabled OCaml 4.02 here. Let us obtain the flat profile first.
$ wget http://caml.inria.fr/pub/old_caml_site/Examples/oc/basics/queens.ml
$ ocamlc -o queens -g queens.ml
$ CAML_PROFILE_ALLOC=queens.preprof ./queens
Chess boards's size ? 8
The 8 queens problem has 92 solutions.
Do you want to see the solutions <n/y> ? n
$ ./tools/allocprof queens.preprof queens.prof
$ head -n10 queens.prof
Total: 77,863 words
Instr Words % of total Location
----- ----- ---------- --------
2488 31440 40.38% file "list.ml", line 55, characters 32-39
27681 31440 40.38% file "queens.ml", line 61, characters 46-52
27775 5895 7.57% file "queens.ml", line 38, characters 2-113
27759 4112 5.28% file "queens.ml", line 40, characters 33-41
27687 3930 5.05% file "queens.ml", line 61, characters 14-59
2403 86 0.11% file "pervasives.ml", line 490, characters 8-63
5391 44 0.06% file "list.ml", line 20, characters 15-29
Observe that we now have the precise location information directly in the
profile, whereas
earlier
one had to manually identify the source location using the instruction
information. In this profile, we see that most allocations were in
list.ml:55
, which is the List.map
function. However, we cannot pin down the
source of these allocations in queens.ml
from this profile since the profile
is flat. Let us now obtain the stack allocation profile, which will reveal the
source of these allocations in queens.ml
.
$ CAML_PROFILE_ALLOC=queens.preprof CAML_PROFILE_STACK=10000 ./queens
Chess boards's size ? 8
The 8 queens problem has 92 solutions.
Do you want to see the solutions <n/y> ? n
$ ./tools/allocprof queens.preprof queens.prof --sort-stack
$ head -n10 queens.prof
Total: 77,863 words
Instr Current Cur % Stack Stack % Location
----- ------- ----- ----- ------- --------
27836 0 0.00% 76911 98.78% file "queens.ml", line 100, characters 33-42
27549 0 0.00% 76870 98.72% file "queens.ml", line 85, characters 17-36
27466 0 0.00% 76473 98.21% file "queens.ml", line 45, characters 18-31
27715 0 0.00% 65117 83.63% file "queens.ml", line 62, characters 4-22
27694 0 0.00% 62880 80.76% file "queens.ml", line 61, characters 31-59
2487 0 0.00% 55020 70.66% file "list.ml", line 55, characters 32-39
2483 0 0.00% 31440 40.38% file "list.ml", line 55, characters 20-23
I’ve chosen a stack depth of 10000 to obtain the complete stack profile of the
program. The option --sort-stack
to allocprof
sorts the results based on
the stack allocation profile. We can now clearly see the stack of functions
that perform most allocations. The line
27836 0 0.00% 76911 98.78% file "queens.ml", line 100, characters 33-42
says that 98.78% of all allocations were performed by the function at
queens.ml:100
, characters 33-42, and its callees. This isn’t surprising since
this function is the top-level main
function!
More interesting is the 98.21% of allocations on queens.ml:45
. This is the
recursive call to the concmap
function,
which in turn invokes the List.map
function on queens.ml:61
. We’ve now
pinned down the source of the allocation in list.ml:55
to queens.ml:61
.
Caveats and conclusions
Unlike stack profiles of C programs, OCaml’s stack profile does not include all the functions in the call stack since many calls are in tail positions. Calls to functions at tail position will not have a frame on the stack, and hence will not be included in the profile.
Please do submit issues and bug-fixes. Pull-requests are welcome! Also, here is my trimmed down (yay \o/!), non-exhaustive wish list of features:
- Dump the profile every few milliseconds to study the allocation behavior of programs over time.
- Save the location information in the object header and dump the heap at every GC to catch space leaks.
-
Thanks to trevorsummerssmith for the example. ↩