KC Sivaramakrishnan CTO @ Tarides

Off-CPU-time analysis

Off-CPU analysis records and analyses what a program is doing while it is not running on a CPU. See Brendan Gregg’s eBPF-based off-CPU analysis. On-CPU performance monitoring tools such as perf show where the program actively spends its time, but they won’t tell you where it spends time blocked, waiting for something to happen. Off-CPU analysis reveals where the program spends its time passively.

Installation

Install the tools from https://github.com/iovisor/bcc/.

Enabling frame pointers

The off-CPU stack trace collection tool, offcputime-bpfcc, requires programs to be compiled with frame pointers in order to produce full backtraces.

OCaml

For OCaml, you’ll need a compiler variant with frame pointers enabled. If you are installing a released compiler using opam, look for +fp variants in opam switch list-available.

If you are instead building the OCaml compiler from source, configure it with the --enable-frame-pointers option:

$ ./configure --enable-frame-pointers

Lastly, there is an option to create an opam switch with the development branch of the compiler. The instructions are in ocaml/HACKING.adoc. In order to create an opam switch from the current working directory, do:

$ opam switch create . 'ocaml-option-fp' --working-dir
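
Whichever route you take, it may be worth confirming that the compiler you ended up with really has frame pointers enabled before collecting traces. The sketch below assumes that ocamlopt -config reports a with_frame_pointers line; has_frame_pointers is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: succeeds if the `ocamlopt -config` text on stdin
# reports that frame pointers are enabled.
has_frame_pointers() {
  grep -q '^with_frame_pointers: true'
}

# Only query the compiler if one is on PATH in the current switch.
if command -v ocamlopt >/dev/null 2>&1; then
  if ocamlopt -config | has_frame_pointers; then
    echo "frame pointers: enabled"
  else
    echo "frame pointers: disabled"
  fi
fi
```

If this reports "disabled", the backtraces for OCaml code are likely to be truncated regardless of how the tracer is invoked.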

glibc

The libc is not compiled with frame pointers by default. This will lead to many truncated stack traces. On Ubuntu, I did the following to get a glibc with frame pointers enabled:

  1. Install glibc with frame pointers
    $ sudo apt install libc6-prof
    
  2. LD_PRELOAD the glibc with frame pointers
    $ LD_PRELOAD=/lib/libc6-prof/x86_64-linux-gnu/libc.so.6 ./myapp.exe
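
One thing to watch for: if the preloaded library cannot be opened, the dynamic loader only prints an error and carries on with the default libc, so the program still runs but without frame pointers in glibc. A small guard, sketched below, makes that failure explicit. The function name with_prof_libc is hypothetical, and the default path is the x86_64 Ubuntu one used above.

```shell
# Path installed by libc6-prof on x86_64 Ubuntu; override if yours differs.
PROF_LIBC=${PROF_LIBC:-/lib/libc6-prof/x86_64-linux-gnu/libc.so.6}

# Hypothetical helper: run a command under the frame-pointer glibc,
# refusing to start if the library is not readable.
with_prof_libc() {
  if [ ! -r "$PROF_LIBC" ]; then
    echo "missing $PROF_LIBC (try: sudo apt install libc6-prof)" >&2
    return 1
  fi
  LD_PRELOAD="$PROF_LIBC" "$@"
}

# Usage:
#   with_prof_libc ./myapp.exe
```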
    

Running

On one terminal run the program that you want to analyze:

$ LD_PRELOAD=/lib/libc6-prof/x86_64-linux-gnu/libc.so.6 ./ocamlfoo.exe

On another terminal, run the offcputime-bpfcc tool:

$ sudo offcputime-bpfcc --stack-storage-size 2097152 -p $(pgrep -f ocamlfoo.exe) 10 > offcputime.out

The command instruments the program, watches it for 10 seconds, and then writes the stack traces corresponding to blocking calls to offcputime.out. We use a large stack storage size so that stack traces are not lost. Otherwise, you will see many [Missing User Stack] errors in the backtraces.
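
To get a quick overview of offcputime.out, something like the following sketch may help. It assumes the default multi-line output, in which each stack ends with a "- comm (pid)" line followed by a line containing only the blocked time in microseconds. offcpu_summary is a hypothetical helper, not part of bcc. Note that each number is the time accumulated for one stack over the whole tracing window, not the length of a single pause.

```shell
# Hypothetical helper: summarise offcputime-bpfcc's default output.
# Reads the named files, or stdin when no file is given.
offcpu_summary() {
  awk '
    # A line holding only an indented number closes one stack and
    # carries that stack'"'"'s accumulated off-CPU time in microseconds.
    /^ +[0-9]+ *$/ {
      us = $1 + 0
      total += us
      n++
      if (us > largest) largest = us
    }
    END {
      printf "stacks=%d total_us=%.0f largest_us=%.0f\n", n, total, largest
    }
  ' "$@"
}

# Usage:
#   offcpu_summary offcputime.out
```

The stack with the largest total is usually the first place to look; the full trace in offcputime.out shows which call chain it belongs to.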

Caveats

offcputime-bpfcc must run a few seconds longer than the program being instrumented so that the function symbols are resolved. Otherwise, you may see [unknown] in place of function names in the backtraces.

Oddities

I still see an order of magnitude difference between the maximum pauses observed using offcputime-bpfcc and olly trace. Something is off.


Creative Commons License kcsrk dot info by KC Sivaramakrishnan is licensed under a Creative Commons Attribution 4.0 International License.