KC Sivaramakrishnan Building Functional Systems

Shrinking the OxCaml js_of_ocaml bundle: 285 MB to 4 MB

In the previous post on capsules, I cheated. The lecture I was adapting (from my CS6868 course on language abstractions for parallelism) used Await_capsule.Mutex.with_lock, the recommended non-deprecated way to acquire a capsule mutex, but the post shipped Capsule_blocking_sync.Mutex instead with the deprecation alert silenced. The reason was bundle size: the await library, once we chased its transitive dependencies through base, sexplib0, base_quickcheck and the rest of Jane Street’s runtime, would have ballooned the in-browser toplevel by roughly 285 MB. The right API would not even fit through GitHub’s 100 MB per-file push limit, let alone be reasonable to send to a reader’s browser.

This post is the story of how we got from 285 MB down to 4 MB and made the resulting bundle compose cleanly with the in-browser toplevel, so the lecture’s Await_capsule form works end-to-end in the cell at the bottom of this post. Most of the work happened on a branch of ocsigen/js_of_ocaml, with a smaller piece in art-w/x-ocaml, the WebComponent that powers the cells.

Why bundle size matters

I teach two OCaml-heavy courses at IIT Madras: CS3100, the undergraduate functional programming course, and CS6868, the more recent graduate course on language abstractions for parallelism. The lecture notes, examples and homework for both would be much more useful as interactive books that a student can read, edit and run entirely client-side, with no local installation. The same shape would help us when we run hands-on OCaml and OxCaml workshops, where the first session routinely gets eaten by the installation hump: getting opam, the compiler and the required libraries working on every attendee’s machine over patchy conference WiFi, before the teaching can begin.

The broader effort to make installation painless is the OCaml Platform roadmap, which we have been working on at Tarides as a “zero to OCaml in one click” experience. That roadmap targets a developer who wants a real local toolchain, with the full editor, debugger and project-management story, and a generous latency budget since this is a one-time setup. A workshop attendee has a much narrower target: just enough OCaml to complete the exercises in front of them. The client-side x-ocaml toplevel fits that target naturally, because everything ships as static assets and there is no installation step. The bundle, in this setting, is the latency budget: 285 MB makes the in-browser path unshippable, 4 MB makes it a realistic alternative to a local toolchain for a 90-minute session.

Why 285 MB?

The recipe x-ocaml already had for “load extra libraries into a running in-browser toplevel” goes like this. For each cma you want to ship, run

$ js_of_ocaml --toplevel <library>.cma -o lib.js

then concatenate the per-cma outputs into a single bundle and load it via <script src-load=...>. Each per-cma output is kind=cma: it registers the cma’s modules into the existing toplevel without clobbering anything, the modules light up, and you can open them from a cell. This works, and it is what the previous two OxCaml posts already use.

The trouble is that dead code elimination runs one cma at a time. If you ship base, you ship all of base, because the per-cma DCE pass never gets to look at the linked-together program and notice that the await library you actually want only touches a small slice of sexplib0. So the bundle for the closure of await + capsule + basement ends up being the union of every cma in the transitive dependency closure, fully linked, which comes out to roughly 285 MB after the OxCaml compiler’s normal optimisations and before any of the JavaScript-side cleverness has had a chance to run.

In other words, the await-based blessed API is unshippable not because the bundling tooling is broken, but because per-cma DCE is the wrong granularity for this problem.

The other recipe, and why it doesn’t compose

It turns out js_of_ocaml already has a second recipe that does perform cross-cma DCE, which I learned about when Ricky Vetter pointed me at the --export route on X. The recipe has two steps:

  1. Build a single bytecode that links every library you want, with -linkall so nothing gets pruned at the bytecode level.
  2. Hand that single bytecode to js_of_ocaml --toplevel --export units.txt. The export list names the compilation units that should remain visible to the toplevel; everything else is fair game for DCE on the unified intermediate representation.

For the same await + capsule + basement set, this recipe produces a 4 MB bundle, almost two orders of magnitude smaller than the per-cma concatenation. The link-step DCE works at function granularity over the whole linked program, so the unused parts of sexplib0, base, base_quickcheck and the rest of the dependency closure get pruned away cleanly.
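The difference in granularity can be sketched with a toy reachability pass. This is only a model under stated assumptions, not how js_of_ocaml implements DCE: the call graph, the function names and the helpers `reachableFrom`, `functionsIn` and `keptWithRootUnits` are all invented for illustration. The one thing that changes between the two recipes in this model is the set of roots.

```javascript
// Toy call graph. Each entry is a function, the library (cma) it lives in,
// and the functions it calls. Every name here is hypothetical.
const program = {
  "await.with_lock": { lib: "await", calls: ["base.unit_sexp"] },
  "await.create": { lib: "await", calls: [] },
  "base.unit_sexp": { lib: "base", calls: ["sexplib0.atom"] },
  "base.list_sort": { lib: "base", calls: ["sexplib0.list"] },
  "base.hashtbl_find": { lib: "base", calls: [] },
  "sexplib0.atom": { lib: "sexplib0", calls: [] },
  "sexplib0.list": { lib: "sexplib0", calls: [] },
};

// Worklist reachability: everything transitively called from the roots.
function reachableFrom(roots) {
  const kept = new Set();
  const todo = [...roots];
  while (todo.length > 0) {
    const fn = todo.pop();
    if (kept.has(fn)) continue;
    kept.add(fn);
    todo.push(...program[fn].calls);
  }
  return kept;
}

// All functions defined by one library.
function functionsIn(lib) {
  return Object.keys(program).filter((fn) => program[fn].lib === lib);
}

// Keep whatever is reachable when every function of the given units is a root.
function keptWithRootUnits(units) {
  return reachableFrom(units.flatMap(functionsIn));
}

// Per-cma model: each library is compiled alone, so any of its functions
// might be needed by code loaded later. Every unit is a root.
const perCma = keptWithRootUnits(["await", "base", "sexplib0"]);

// Whole-program model: only the units on the export list are roots.
const wholeProgram = keptWithRootUnits(["await"]);

console.log(perCma.size, wholeProgram.size);
```

In this toy graph the per-cma roots keep all seven functions, while rooting only the exported unit keeps four: the two await functions plus the one base and one sexplib0 function they actually reach.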

So why aren’t we already using this recipe?

Because the artifact that comes out the other end is kind=exe, that is, a self-contained executable rather than a library. When you load such a bundle into a Web Worker that already has a running toplevel, its initialisation runs in caml_main style and starts by overwriting the host’s globals:

caml_global_data.symbols    = <bundle's symbol table>
caml_global_data.sections   = <bundle's bytecode sections>
caml_global_data.prim_count = <bundle's primitive count>
caml_global_data.aliases    = <bundle's alias table>

Those four assignments overwrite the host toplevel’s tables. After the bundle loads, caml_get_global_data().symbols is the bundle’s symbols, not the worker’s, and anything in the host toplevel that does name-based symbol resolution (Toploop, hover-types lookup, open Stdlib) now consults a table that does not know about the modules the worker had already linked. The toplevel survives, but its typing environment is wrong, and cells stop being able to open anything. We hit this dead end in the capsules post; the bundle was correct and the sizes were great, but the integration step that connects an exe-shaped bundle to an existing toplevel was the missing piece.
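A toy model makes the failure concrete. The sketch below is not the js_of_ocaml runtime: `globalData`, `canResolve` and `loadExeStyle` are hypothetical stand-ins, and the module names in the tables are illustrative. It only shows why replacing a table wholesale hides modules the host had already linked.

```javascript
// Hypothetical stand-in for the runtime's global data record.
const globalData = {
  symbols: new Map([["Stdlib", 0], ["Toploop", 1]]),
};

// Name-based lookup, roughly what a toplevel needs to resolve `open Foo`.
function canResolve(name) {
  return globalData.symbols.has(name);
}

// Models exe-style initialisation: the bundle installs its own table
// in place of whatever was there before.
function loadExeStyle(bundleSymbols) {
  globalData.symbols = new Map(bundleSymbols);
}

const toploopBefore = canResolve("Toploop");
loadExeStyle([["Stdlib", 0], ["Await_capsule", 1]]);
const toploopAfter = canResolve("Toploop");
const bundleModule = canResolve("Await_capsule");

console.log(toploopBefore, toploopAfter, bundleModule);
```

The host’s Toploop entry resolves before the load and no longer resolves afterwards, even though the bundle’s own module is present.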

The patch

The fix is a flag, --toplevel-extend, that I added on a branch of ocsigen/js_of_ocaml. When it is set, js_of_ocaml --toplevel --export … emits exactly the same DCE output as before, with the same bytes for the registered modules and the same .cmi files embedded under /static/cmis/, but with three small changes:

  • packed as ~standalone:false, so there is no globalThis polyfill IIFE wrapping the output,
  • with the four caml_js_set writes to caml_global_data from above skipped, and
  • tagged as kind=cma in the buildInfo header.

The bundle’s modules still register themselves on load via the ordinary caml_register_global(n, v, name), which the runtime correctly merges via symidx into the host’s existing tables. The result is additive, not destructive: the host toplevel’s symbol table, sections, primitives and aliases all survive intact, and its typing environment continues to resolve Stdlib, Toploop and everything else that was already linked. The new modules from the bundle simply show up as new symbols on top.
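The additive behaviour can be modelled in a few lines. This is a simplified sketch, not the real caml_register_global: the `registerGlobal` helper, the `host` table and the module values are hypothetical, and the sketch assumes that a name the host already knows keeps its host value.

```javascript
// Hypothetical host table: module name -> module value.
const host = new Map([
  ["Stdlib", { from: "host" }],
  ["Toploop", { from: "host" }],
]);

// Per-name registration: add new names, leave existing ones untouched.
// Returns true when the name was newly added.
function registerGlobal(table, name, value) {
  if (table.has(name)) return false; // already linked by the host
  table.set(name, value);
  return true;
}

// A bundle that links its own copy of Stdlib plus one new module.
const bundle = [
  ["Stdlib", { from: "bundle" }],
  ["Await_capsule", { from: "bundle" }],
];

const added = bundle
  .filter(([name, value]) => registerGlobal(host, name, value))
  .map(([name]) => name);

console.log(added, host.get("Stdlib").from, host.has("Toploop"));
```

Only the new module is added; the host’s Stdlib and Toploop entries are the same objects they were before the bundle loaded.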

The initial diff is small: parse_bytecode.ml gates the caml_js_set block on the new flag, and cmd_arg.ml/compile.ml thread it through. That gets the bundle composable. The actual debugging is what came afterwards, when we tried to make the composed bundle behave the way the host expected, and that is what the rest of this post is about.

Wiring it through x-ocaml

On the x-ocaml side, the --dce flag drives the single-bytecode + --export build, invoking js_of_ocaml --toplevel-extend --export units.txt for the bundles we ship. The oxcaml branch of my x-ocaml fork carries the change; the patched js_of_ocaml needs to be on PATH when the build runs.

Numbers

For the full closure of libraries the lecture’s gensym example uses, the two paths come out very differently:

                               Per-cma   --dce --toplevel-extend   Ratio
basement + capsule0 only       1.0 MB    1.0 MB
+ capsule + await + portable   285 MB    4.0 MB                    ~70×

The basement + capsule0 row is essentially a wash, because the bundle size at that scale is dominated by the .cmi files and there is very little JavaScript code for DCE to chew on. Once await and the curated capsule API enter the picture the per-cma path balloons, because it has to ship every cma in the transitive closure of base, sexplib0, bin_prot, base_quickcheck and the ppx_* runtime libraries, while --dce keeps only the functions that are actually reachable from the export list, plus the .cmi files the toplevel needs to elaborate the signatures the user types into a cell.
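The arithmetic behind the second row is worth spelling out. The sizes below are the ones from the table; the link speed is purely an assumption chosen for illustration, the `downloadSeconds` helper is hypothetical, and the estimate ignores compression and caching.

```javascript
// Sizes reported in the table, in MB.
const perCmaMB = 285;
const dceMB = 4.0;

const ratio = perCmaMB / dceMB;
const ordersOfMagnitude = Math.log10(ratio);

// Transfer time for a given size over a link of the given speed.
// 1 MB is taken as 8 Mbit; protocol overhead is ignored.
function downloadSeconds(sizeMB, mbitPerSecond) {
  return (sizeMB * 8) / mbitPerSecond;
}

// Assumed link speed, not a measurement.
const assumedMbitPerSecond = 10;

console.log(ratio.toFixed(2), ordersOfMagnitude.toFixed(2));
console.log(
  downloadSeconds(perCmaMB, assumedMbitPerSecond),
  downloadSeconds(dceMB, assumedMbitPerSecond)
);
```

The ratio is 71.25, or about 1.85 orders of magnitude. At the assumed 10 Mbit/s, that is the difference between 228 seconds and 3.2 seconds per attendee.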

A few more snags

Getting the bundle to be kind=cma was the easy part. Composing it with an already-running toplevel turned out to surface a small zoo of follow-on issues, each of which had a short fix once we understood it. They all live on the same kc-toplevel-extend branch; the commit messages have the gory details.

  • Predefined-exception identity drifts across bundles. The bundle’s re-allocated Not_found, Sys_error and friends are physically distinct from the host’s copies, so a try ... with Not_found -> in stdlib code (the first place we hit this was Hashtbl.randomized_default reading OCAMLRUNPARAM) fails to catch the host’s Not_found raised by the runtime. The fix is to bind each predef-exn variable in the bundle to a runtime caml_get_global_data lookup so the bundle reuses the host’s instances.

  • The pseudo-filesystem raises on duplicate .cmi registrations. The bundle wants to re-emit /static/cmis/stdlib.cmi, which the host has already registered at boot, and MlFakeDevice.register refuses to overwrite. Making register idempotent removes the conflict without losing anything: the two copies of stdlib.cmi agree, since they come from the same opam switch.

  • Stdlib re-registration overwrites host modules. Without a guard, caml_register_global was cheerfully replacing the host’s caml_global_data["Format"] (and every other stdlib module the bundle’s bytecode happens to link in) with the bundle’s freshly initialised copy. Adding an early return when the name is already known fixes this without changing the behaviour for any name the host does not yet have.

  • Domain.DLS slot collisions silently broke hover types. The bundle re-runs stdlib’s module init when it loads, and Stdlib__Domain.DLS’s let key_counter = Atomic.make 0 re-allocates the counter and starts it from zero. The bundle’s Format.stdbuf_key then ends up at a low DLS index the host had already assigned to its Format.stdbuf_key, and DLS.set overwrites the host’s entry in the shared caml_domain_dls array. The host’s Format.flush_str_formatter then reads from the bundle’s empty buffer, merlin’s type-enclosing printer (which flushes through Format.str_formatter) returns "" for every query, and hover-on-identifier tooltips come up blank. The fix is at the bundle-load boundary in build_portable_js_extend.sh: snapshot caml_domain_dls_get () before the bundle’s IIFE runs, and restore the host-owned slots afterwards. The bundle’s new high-index slots are left alone; only the host’s previously populated slots get restored. I spent a while convinced this was a Format DCE bug before tracing through the OCaml 5+ Domain.DLS init path.

  • Bundle build: the curated capsule and await.capsule APIs both open! Base at the top of their files, so their interfaces mention Base.unit rather than Stdlib.unit. To elaborate those signatures the host toplevel needs a small chain of base and sexplib0 .cmi files at /static/cmis/. We ship three base cmis (base.cmi, base__.cmi, base__Unit.cmi) and two sexplib0 cmis via js_of_ocaml’s --file=<src>:/static/cmis/ flag, which embeds the file directly without putting base on the bytecode-link line (which would drag the whole of base back in).
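Of the fixes above, the DLS one is the easiest to misread, so here is a toy model of it. It is not the contents of build_portable_js_extend.sh: a plain array stands in for caml_domain_dls, and `freshKeyCounter`, `bootHost`, `loadBundle` and `loadBundleWithRestore` are hypothetical names. It assumes, as described in the list, that the bundle’s key counter restarts at zero and that restoring means copying back every slot the host had populated before the load.

```javascript
// A key counter that starts from zero, like a freshly initialised key_counter.
function freshKeyCounter() {
  let next = 0;
  return () => next++;
}

// The host allocates its first key and stores a buffer in that slot.
function bootHost(dls) {
  const newKey = freshKeyCounter();
  const stdbufKey = newKey();
  dls[stdbufKey] = { owner: "host", buffer: "int -> int" };
  return stdbufKey;
}

// The bundle re-runs module init: its counter restarts at zero, so its
// first key collides with the host's slot. Its second key is a new slot.
function loadBundle(dls) {
  const newKey = freshKeyCounter();
  dls[newKey()] = { owner: "bundle", buffer: "" };
  dls[newKey()] = { owner: "bundle", buffer: "" };
}

// Snapshot before loading, then put back only the slots the host had filled.
function loadBundleWithRestore(dls) {
  const snapshot = dls.slice();
  loadBundle(dls);
  snapshot.forEach((value, slot) => {
    if (value !== undefined) dls[slot] = value;
  });
}

// Without the fix: the host's slot now belongs to the bundle.
const naive = [];
const naiveKey = bootHost(naive);
loadBundle(naive);

// With the fix: the host's slot is restored, the bundle's new slot remains.
const guarded = [];
const guardedKey = bootHost(guarded);
loadBundleWithRestore(guarded);

console.log(naive[naiveKey].owner, guarded[guardedKey].owner, guarded[1].owner);
```

In the unguarded run the host reads an empty bundle-owned buffer from its own key, which is the blank-tooltip symptom in miniature. In the guarded run the host’s slot holds its original buffer again, and the slot the bundle allocated beyond the host’s range is left alone.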

The cell

The cell below uses the lecture’s gensym_capsule.ml shape directly: Await_capsule.Mutex.with_lock taking an Await_kernel.Await.t witness, with Capsule_expert.Data.create and Capsule_expert.Data.unwrap for the brand-locked counter. No [@@@alert "-deprecated"], no shim. Press Run.

let make_gensym () =
  let (P mutex) = Await_capsule.Mutex.create () in
  let counter = Capsule_expert.Data.create (fun () -> ref 0) in
  let fetch_and_incr (w : Await_kernel.Await.t) =
    Await_capsule.Mutex.with_lock w mutex ~f:(fun access ->
      let c = Capsule_expert.Data.unwrap ~access counter in
      incr c; !c)
  in
  fun w prefix -> prefix ^ "_" ^ string_of_int (fetch_and_incr w)

let gensym = make_gensym ()
let w = Await_blocking.await Await_kernel.Terminator.never
let () = Printf.printf "%s %s\n" (gensym w "x") (gensym w "y")

We cannot actually demonstrate parallelism in the browser worker (it is single-domain), so we hand the body the trivial blocker Await_blocking.await Await_kernel.Terminator.never. The mode-checking is the same brand/local/once dance as in the capsules post; what is new is that the same dance now type-checks against the blessed Await_capsule API directly, without the Capsule_blocking_sync shim.

What next?

The fork is small enough that --toplevel-extend is plausibly upstreamable into ocsigen/js_of_ocaml. The DLS-snapshot dance would also be cleaner as a proper runtime primitive than as a wrapper around the bundle’s IIFE. The branch is open for review.

For x-ocaml itself, the obvious next step is to broaden the set of extension bundles. The current portable.js covers basement + capsule0 + capsule + await + portable, which is just enough for the capsule and parallelism material in CS6868; the CS3100 material would want a different slice, and an async/Eio-flavoured bundle would let the same cells host concurrency examples. Once a small library of these bundles exists, turning a lecture set or a workshop tutorial into an interactive book becomes mostly a matter of picking the right bundle, which is the workshop-scale “zero to OxCaml” story I want to get to.

x-ocaml is one of Arthur Wendling’s hacking expeditions, and it remains a pleasure to build on. Thanks to Ricky Vetter for the --export pointer that got the whole thing started, and to the OxCaml team for the libraries.


Written together with Claude Opus 4.7 (1M context). The js_of_ocaml diff against the +ox base and the x-ocaml integration diff are both on GitHub.


Creative Commons License kcsrk dot info by KC Sivaramakrishnan is licensed under a Creative Commons Attribution 4.0 International License.