Shrinking the OxCaml js_of_ocaml bundle: 285 MB to 4 MB
10 May 2026
In the
previous post on
capsules, I cheated. The lecture I was adapting (from my
CS6868 course on
language abstractions for parallelism) used
Await_capsule.Mutex.with_lock, the recommended non-deprecated way
to acquire a capsule mutex, but the post shipped
Capsule_blocking_sync.Mutex instead with the deprecation alert
silenced. The reason was bundle size: the await library, once we
chased its transitive dependencies through base, sexplib0,
base_quickcheck and the rest of Jane Street’s runtime, would have
ballooned the in-browser toplevel by roughly 285 MB. A bundle
carrying the right API would not even fit under GitHub’s 100 MB
per-file push limit, let alone be reasonable to send to a reader’s
browser.
This post is the story of how we got from 285 MB down to 4 MB and
made the resulting bundle compose cleanly with the in-browser
toplevel, so the lecture’s Await_capsule form works end-to-end in
the cell at the bottom of this post. Most of the work happened on a
branch of ocsigen/js_of_ocaml,
with a smaller piece in
art-w/x-ocaml,
the WebComponent that powers the cells.
Why bundle size matters
I teach two OCaml-heavy courses at IIT Madras:
CS3100, the
undergraduate functional programming course, and
CS6868, the more
recent graduate course on language abstractions for parallelism.
The lecture notes, examples and homework for both would be much
more useful as interactive books that a student can read, edit
and run entirely client-side, with no local installation. The same
shape would help us when we run hands-on OCaml and OxCaml
workshops, where the first session routinely gets eaten by the
installation hump: getting opam, the compiler and the required
libraries working on every attendee’s machine over patchy
conference WiFi, before the teaching can begin.
The broader effort to make installation painless is the OCaml
Platform roadmap, which
we have been working on at Tarides as a “zero to OCaml in one
click” experience. That roadmap targets a developer who wants a
real local toolchain, with the full editor, debugger and
project-management story, and a generous latency budget since this
is a one-time setup. A workshop attendee has a much narrower
target: just enough OCaml to complete the exercises in front of
them. The client-side x-ocaml toplevel fits that target
naturally, because everything ships as static assets and there is
no installation step. The bundle, in this setting, is the latency
budget: 285 MB makes the in-browser path unshippable, 4 MB makes
it a realistic alternative to a local toolchain for a 90-minute
session.
Why 285 MB?
The recipe x-ocaml already had for “load extra libraries into a
running in-browser toplevel” goes like this. For each cma you want
to ship, run
$ js_of_ocaml --toplevel <library>.cma -o lib.js
then concatenate the per-cma outputs into a single bundle and load
it via <script src-load=...>. Each per-cma output is kind=cma:
it registers the cma’s modules into the existing toplevel without
clobbering anything, the modules light up, and you can open them
from a cell. This works, and it is what the previous two OxCaml
posts already use.
The trouble is that dead code elimination runs one cma at a time.
If you ship base, you ship all of base, because the per-cma
DCE pass never gets to look at the linked-together program and
notice that the await library you actually want only touches a
small slice of sexplib0. So the bundle for await + capsule + basement
ends up being the union of every cma in its transitive dependency
closure, each shipped in full, which comes out to roughly 285 MB
after the OxCaml compiler’s normal optimisations and before any of
the JavaScript-side cleverness has had a chance to run.
In other words, the await-based blessed API is unshippable not
because the bundling tooling is broken, but because per-cma DCE is
the wrong granularity for this problem.
The other recipe, and why it doesn’t compose
It turns out js_of_ocaml already has a second recipe that
does perform cross-cma DCE,
which I learned about when Ricky
Vetter
pointed me at the --export route on X. The recipe has two steps:
- Build a single bytecode that links every library you want, with -linkall so that nothing gets pruned at the bytecode level.
- Hand that single bytecode to js_of_ocaml --toplevel --export units.txt. The export list names the compilation units that should remain visible to the toplevel; everything else is fair game for DCE on the unified intermediate representation.
For the same await + capsule + basement set, this recipe produces
a 4 MB bundle, almost two orders of magnitude smaller than the
per-cma concatenation. The link-step DCE works at function
granularity over the whole linked program, so the unused parts of
sexplib, base, base_quickcheck and the rest of the dependency
closure get pruned away cleanly.
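A toy model makes the granularity difference between the two recipes concrete. It is a sketch under simplifying assumptions, not how js_of_ocaml’s DCE is actually implemented: the function names, compilation units and call edges below are hypothetical; per-cma mode keeps every function in every linked unit; and whole-program mode keeps only what is transitively reachable from the units named in the export list.

```ocaml
(* Toy model: functions grouped into compilation units, with call edges.
   Every name below is hypothetical. *)
module S = Set.Make (String)

type fn = { name : string; cu : string; calls : string list }

let program =
  [ { name = "Await.run"; cu = "await"; calls = [ "Sexp.to_string" ] };
    { name = "Sexp.to_string"; cu = "sexplib0"; calls = [] };
    { name = "Sexp.of_string"; cu = "sexplib0"; calls = [] };
    { name = "Quickcheck.gen"; cu = "base_quickcheck"; calls = [ "Sexp.of_string" ] };
    { name = "Base.List.map"; cu = "base"; calls = [] } ]

(* Per-cma: each unit is processed in isolation and must assume the
   toplevel may call anything in it, so every function survives. *)
let per_cma_keep fns =
  List.fold_left (fun acc f -> S.add f.name acc) S.empty fns

(* Whole-program: start from every function in an exported unit and keep
   only what is transitively reachable from those roots. *)
let whole_program_keep fns export =
  let find n = List.find (fun f -> f.name = n) fns in
  let rec visit seen n =
    if S.mem n seen then seen
    else List.fold_left visit (S.add n seen) (find n).calls
  in
  let roots =
    List.filter_map
      (fun f -> if List.mem f.cu export then Some f.name else None)
      fns
  in
  List.fold_left visit S.empty roots

let () =
  let kept = whole_program_keep program [ "await" ] in
  Printf.printf "per-cma keeps %d, whole-program keeps %d\n"
    (S.cardinal (per_cma_keep program))
    (S.cardinal kept)
```

Running it prints per-cma keeps 5, whole-program keeps 2: the unused sexplib0 function and everything in base_quickcheck and base fall away, which is the same effect, in miniature, as the drop from the per-cma bundle to the --export one.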
So why aren’t we already using this recipe?
Because the artifact that comes out the other end is kind=exe,
that is, a self-contained executable rather than a library. When
you load such a bundle into a Web Worker that already has a running
toplevel, its initialisation runs in caml_main style and starts
by overwriting the host’s globals:
caml_global_data.symbols = <bundle's symbol table>
caml_global_data.sections = <bundle's bytecode sections>
caml_global_data.prim_count = <bundle's primitive count>
caml_global_data.aliases = <bundle's alias table>
Those four assignments overwrite the host toplevel’s tables. After
the bundle loads, caml_get_global_data().symbols is the bundle’s
symbols, not the worker’s, and anything in the host toplevel that
does name-based symbol resolution (Toploop, hover-types lookup,
open Stdlib) now consults a table that does not know about the
modules the worker had already linked. The toplevel survives, but
its typing environment is wrong, and cells stop being able to
open anything. We hit this dead end in the capsules
post; the bundle was
correct and the sizes were great, but the integration step that
connects an exe-shaped bundle to an existing toplevel was the
missing piece.
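The failure mode fits in a few lines. In this sketch the symbol table is a hypothetical association list from unit name to global slot, and load_exe (also a hypothetical name) stands in for the four wholesale assignments above. After it runs, the host’s Toploop is no longer resolvable by name, and Await_capsule now claims the slot Toploop used to own.

```ocaml
(* Toy model of a runtime-wide symbol table (unit name -> global slot).
   Illustrative only; this is not js_of_ocaml's real representation. *)
type globals = { mutable symbols : (string * int) list }

let lookup g name = List.assoc_opt name g.symbols

(* A kind=exe bundle's start-up assigns its own table wholesale, which is
   what the four caml_js_set writes amount to. *)
let load_exe g bundle_symbols = g.symbols <- bundle_symbols

let host = { symbols = [ ("Stdlib", 0); ("Toploop", 1) ] }

let () =
  assert (lookup host "Toploop" = Some 1);
  (* The bundle was linked without Toploop, so its table never mentions it. *)
  load_exe host [ ("Stdlib", 0); ("Await_capsule", 1) ];
  match lookup host "Toploop" with
  | None -> print_endline "Toploop no longer resolvable by name"
  | Some _ -> print_endline "Toploop still resolvable"
```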
The patch
The fix is a flag, --toplevel-extend, that I added on a branch of
ocsigen/js_of_ocaml.
When it is set, js_of_ocaml --toplevel --export … emits exactly
the same DCE output as before, with the same bytes for the
registered modules and the same .cmi files embedded under
/static/cmis/, but with three small changes:
- packed as ~standalone:false, so there is no globalThis polyfill IIFE wrapping the output,
- with the four caml_js_set writes to caml_global_data from above skipped, and
- tagged as kind=cma in the buildInfo header.
The bundle’s modules still register themselves on load via the
ordinary
caml_register_global(n, v, name),
which the runtime correctly merges via symidx into the host’s
existing tables. The result is additive, not destructive: the host
toplevel’s symbol table, sections, primitives and aliases all
survive intact, and its typing environment continues to resolve
Stdlib, Toploop and everything else that was already linked.
The new modules from the bundle simply show up as new symbols on
top.
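A toy version of that additive behaviour, assuming registration is keyed by unit name: register (a hypothetical function) returns the existing slot for a name the table already knows and appends a fresh slot only for unknown names. It models the end state described above, where host entries survive and the bundle’s new units appear on top; it is not the runtime’s caml_register_global.

```ocaml
(* Toy additive registry keyed by unit name: a name the table already
   knows keeps its existing slot, and only unknown names get new slots
   appended on top. A hypothetical model, not the real runtime code. *)
let table : (string, int) Hashtbl.t = Hashtbl.create 16
let next_slot = ref 0

let register name =
  match Hashtbl.find_opt table name with
  | Some slot -> slot (* already registered: leave the host's entry alone *)
  | None ->
      let slot = !next_slot in
      incr next_slot;
      Hashtbl.add table name slot;
      slot

let () =
  (* The host toplevel boots first... *)
  List.iter (fun n -> ignore (register n)) [ "Stdlib"; "Toploop" ];
  (* ...then the kind=cma bundle registers its units on load. *)
  List.iter (fun n -> ignore (register n)) [ "Stdlib"; "Await_capsule" ];
  Hashtbl.iter (Printf.printf "%s -> %d\n") table
```

The host keeps Stdlib at slot 0 and Toploop at slot 1, and Await_capsule lands at slot 2; the printed order may vary, since Hashtbl iteration order is unspecified.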
The initial diff is small:
parse_bytecode.ml
gates the caml_js_set block on the new flag, and
cmd_arg.ml/compile.ml thread it through. That gets the bundle
composable. The actual debugging is what came afterwards, when we
tried to make the composed bundle behave the way the host
expected, and that is what the rest of this post is about.
Wiring it through x-ocaml
On the x-ocaml side, the
--dce flag
drives the single-bytecode + --export build, invoking
js_of_ocaml --toplevel-extend --export units.txt for the bundles
we ship. The
oxcaml branch of my x-ocaml fork
carries the change; the patched js_of_ocaml needs to be on PATH
when the build runs.
Numbers
For the full closure of libraries the lecture’s gensym example
uses, the two paths come out very differently:
| Libraries in the bundle | Per-cma | --dce --toplevel-extend | Ratio |
|---|---|---|---|
| basement + capsule0 only | 1.0 MB | 1.0 MB | 1× |
| + capsule + await + portable | 285 MB | 4.0 MB | ~70× |
The basement + capsule0 row is essentially a wash, because the
bundle size at that scale is dominated by the .cmi files and there
is very little JavaScript code for DCE to chew on. Once await and
the curated capsule API enter the picture the per-cma path
balloons, because it has to ship every cma in the transitive
closure of base, sexplib0, bin_prot, base_quickcheck and the
ppx_* runtime libraries, while --dce keeps only the functions
that are actually reachable from the export list, plus the .cmi
files the toplevel needs to elaborate the signatures the user types
into a cell.
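The Ratio column follows directly from the table’s sizes, and the same arithmetic backs the earlier “almost two orders of magnitude”:

```latex
\frac{285\ \text{MB}}{4.0\ \text{MB}} \approx 71.3,
\qquad
\log_{10} 71.3 \approx 1.85
```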
A few more snags
Getting the bundle to be kind=cma was the easy part. Composing it
with an already-running toplevel turned out to surface a small
zoo of follow-on issues, each of which had a short fix once we
understood it. They all live on the same
kc-toplevel-extend
branch; the commit messages have the gory details.
- Predefined-exception identity drifts across bundles. The bundle’s re-allocated Not_found, Sys_error and friends are physically distinct from the host’s copies, so a try ... with Not_found -> in stdlib code (the first place we hit this was Hashtbl.randomized_default reading OCAMLRUNPARAM) fails to catch the host’s Not_found raised by the runtime. The fix is to bind each predef-exn variable in the bundle to a runtime caml_get_global_data lookup so the bundle reuses the host’s instances.
- The pseudo-filesystem raises on duplicate .cmi registrations. The bundle wants to re-emit /static/cmis/stdlib.cmi, which the host has already registered at boot, and MlFakeDevice.register refuses to overwrite. Making register idempotent removes the conflict without losing anything: the two copies of stdlib.cmi agree, since they come from the same opam switch.
- Stdlib re-registration overwrites host modules. Without a guard, caml_register_global was cheerfully replacing the host’s caml_global_data["Format"] (and every other stdlib module the bundle’s bytecode happens to link in) with the bundle’s freshly initialised copy. Adding an early return when the name is already known fixes this without changing the behaviour for any name the host does not yet have.
- Domain.DLS slot collisions silently broke hover types. The bundle re-runs stdlib’s module init when it loads, and Stdlib__Domain.DLS’s let key_counter = Atomic.make 0 re-allocates the counter and starts it from zero. The bundle’s Format.stdbuf_key then ends up at a low DLS index the host had already assigned to its own Format.stdbuf_key, and DLS.set overwrites the host’s entry in the shared caml_domain_dls array. The host’s Format.flush_str_formatter then reads from the bundle’s empty buffer, merlin’s type-enclosing printer (which flushes through Format.str_formatter) returns "" for every query, and hover-on-identifier tooltips come up blank. The fix is at the bundle-load boundary in build_portable_js_extend.sh: snapshot caml_domain_dls_get () before the bundle’s IIFE runs, and restore the host-owned slots afterwards. The bundle’s new high-index slots are left alone; only the host’s previously populated slots get restored. I spent a while convinced this was a Format DCE bug before tracing through the OCaml 5+ Domain.DLS init path.
- Bundle build: the curated capsule and await.capsule APIs both open! Base at the top of their files, so their interfaces mention Base.unit rather than Stdlib.unit. To elaborate those signatures, the host toplevel needs a small chain of base and sexplib0 .cmi files at /static/cmis/. We ship three base cmis (base.cmi, base__.cmi, base__Unit.cmi) and two sexplib0 cmis via js_of_ocaml’s --file=<src>:/static/cmis/ flag, which embeds the file directly without putting base on the bytecode-link line (which would drag the whole of base back in).
The cell
The cell below uses the lecture’s
gensym_capsule.ml
shape directly: Await_capsule.Mutex.with_lock taking an
Await_kernel.Await.t witness, with Capsule_expert.Data.create
and Capsule_expert.Data.unwrap for the brand-locked counter. No
[@@@alert "-deprecated"], no shim. Press Run.
We cannot actually demonstrate parallelism in the browser worker
(it is single-domain), so we hand the body the trivial blocker
Await_blocking.await Terminator.never. The mode-checking is the
same brand/local/once dance
as in the capsules post; what is new is that the same dance now
type-checks against the blessed Await_capsule API directly,
without the Capsule_blocking_sync shim.
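For readers without the live cell, a plain-OCaml point of comparison may help. This is not the lecture’s code and uses none of the OxCaml capsule or await APIs: it is a minimal gensym on the OCaml 5.1+ Stdlib, using Mutex.protect, where only programmer discipline keeps the counter behind the lock. The brand/local/once checking in the capsule version is what lets the type checker enforce that discipline instead.

```ocaml
(* Plain OCaml 5.1+ analogue of a gensym counter: a Stdlib mutex guards a
   shared int ref. Unlike the capsule version, nothing in the types stops
   other code from reading or writing [counter] without holding [lock]. *)
let counter = ref 0
let lock = Mutex.create ()

(* Bump the counter under the lock and build a fresh name from it. *)
let gensym prefix =
  Mutex.protect lock (fun () ->
      incr counter;
      prefix ^ string_of_int !counter)

let () =
  let a = gensym "x" in
  let b = gensym "x" in
  Printf.printf "%s %s\n" a b
```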
What next?
The fork is small enough that --toplevel-extend is plausibly
upstreamable into ocsigen/js_of_ocaml. The DLS-snapshot dance
would also be cleaner as a proper runtime primitive than as a
wrapper around the bundle’s IIFE. The
branch
is open for review.
For x-ocaml itself, the obvious next step is to broaden the set
of extension bundles. The current portable.js covers
basement + capsule0 + capsule + await + portable, which is just
enough for the capsule and parallelism material in
CS6868; the
CS3100 material would
want a different slice, and an async/Eio-flavoured bundle
would let the same cells host concurrency examples. Once a small
library of these bundles exists, turning a lecture set or a
workshop tutorial into an interactive book becomes mostly a matter
of picking the right bundle, which is the workshop-scale “zero to
OxCaml” story I want to get to.
x-ocaml is one of Arthur Wendling’s
hacking expeditions, and it remains a pleasure to build on. Thanks
to Ricky Vetter for the --export
pointer that got the whole thing started, and to the OxCaml team
for the libraries.
Written together with Claude Opus 4.7 (1M
context). The
js_of_ocaml
diff
against the +ox base and the
x-ocaml integration
diff
are both on GitHub.