Off-CPU-time analysis
24 Jul 2024Off-CPU analysis is where the program behavior when it is not running is
recorded and analysed. See Brendan Gregg’s eBPF based off-CPU
analysis. While on-CPU
performance monitoring tools such as perf
give you an idea of where the
program is actively spending its time, they won’t tell you where the program
is spending time blocked waiting for an action. Off-CPU analysis reveals
information about where the program is spending time passively.
Installation
Install the tools from https://github.com/iovisor/bcc/.
Enabling frame pointers
The off-CPU stack trace collection, offcputime-bpfcc
, requires the programs to
be compiled with frame pointers for full backtraces.
OCaml
For OCaml, you’ll need a compiler variant with frame pointers enabled. If you
are installing a released compiler using opam
, you can create one the following
switch command opam switch create 5.2.0+fp 5.2.0 ocaml-option-fp
. Change out
5.2.0
for your preferred OCaml version.
Instead, if you are building the OCaml compiler from source, configure
the
compiler with --enable-frame-pointers
option:
$ ./configure --enable-frame-pointers
Lastly, there is an option to create an opam switch with the development branch
of the compiler. The instructions are in ocaml/HACKING.adoc
. In order to
create an opam switch from the current working directory, do:
$ opam switch create . 'ocaml-option-fp' --working-dir
glibc
The libc is not compiled with frame pointers by default. This will lead to many truncated stack traces. On Ubuntu, I did the following to get a glibc with frame pointers enabled:
- Install glibc with frame pointers
$ sudo apt install libc6-prof
- LD_PRELOAD the glibc with frame pointers
$ LD_PRELOAD=/lib/libc6-prof/x86_64-linux-gnu/libc.so.6 ./myapp.exe
Running
On one terminal run the program that you want to analyze:
$ LD_PRELOAD=/lib/libc6-prof/x86_64-linux-gnu/libc.so.6 ./ocamlfoo.exe
On another terminal run offcputime-bpfcc
tool:
$ sudo offcputime-bpfcc --stack-storage-size 2097152 -p $(pgrep -f ocamlfoo.exe) 10 > offcputime.out
The command instruments the watches for 10s and the writes out the stack traces
corresponding to blocking calls in offcputime.out
. We use a large stack
storage size argument so as to not lose stack traces. Otherwise, you will see
many [Missing User Stack]
errors in the back traces.
Caveats
offcputime-bpfcc
must run longer than the program being instrumented by a few
seconds so that the function symbols are resolved. Otherwise you may see
[unknown]
in the backtrace for function names.
Oddities
I still see an order of magnitude difference between the maximum pauses observed
using offcputime-bpfcc
and olly trace
. Something is off.