Foundations for hacking on OCaml
10 Nov 2025
How do you acquire the fundamental computer skills to hack on a complex systems project like OCaml? What’s missing and how do you go about bridging the gap?
Working on a language like OCaml draws on many fundamental systems skills that only come from soaking in systems programming. By systems programming, I mean the ability to use tools like the command line, editors, version control, build systems, compilers, debuggers, Bash scripting, and so on. Experienced contributors tend to take these skills for granted, but they are often inscrutable to new contributors who have not had the opportunity to develop them.
I struggle with this in my own research group. Students approach me to work on the OCaml compiler because they have studied OS, Compilers and Computer Architecture in class. But once they understand that working on OCaml involves actually hacking on systems, they are often lost. How do I build the compiler from source? How do I manage my changes? Do I have to build the entire compiler if I make a small change in the runtime system? The compiler crashes with a segfault; how do I debug it? Worse, the students often do not even know what questions to ask, and come back with “This is all new to me, I don’t know where to begin. ChatGPT doesn’t help.”
CS education in India often lacks a focus on these practical systems skills, which makes it hard for new contributors to get involved in systems programming. My own undergraduate CS education, like that of many others in India (and potentially elsewhere), had mandatory OS and Compiler Construction courses, but neither had a dedicated lab component. Unsurprisingly, such theoretical courses do not prepare students for the practical aspects of systems programming.
I was privileged to have a computer at my school, an IBM PC AT Model 5170 and later an IBM PC 340, and, surprisingly, had an education where I did some programming from a very young age. There was lots of BASIC programming but also just tinkering with the system, learning how to use DOS, and later Windows 3.1 and 95, and of course playing games (mostly Doom and Prince of Persia). This early exposure to computers and systems programming gave me a head start. Many students, especially those from less privileged backgrounds, do not have this early exposure. They may have learned some programming, but have not had the time to tinker with systems for extended periods.
This challenge of bridging the gap between theoretical CS education and practical systems programming skills is a common one faced by professors working in the broad systems area. The problem is compounded by the fact that these skills are difficult to teach in a traditional classroom setting—they require hands-on experience, experimentation, and often many hours of frustration and debugging. These are skills that come from doing, not from reading or watching lectures. I would be curious to hear from others about their experiences and how they have addressed this challenge.
That said, there are resources available online that can help new contributors acquire these skills. This list is biased toward the areas of the compiler that I work on, mainly the backend and the runtime system. I usually touch the frontend only to lower the features that I care about to the backend. Here are some resources I have found useful for working on the OCaml compiler:
- Systems programming
- Course: MIT Missing Semester: This is a fantastic resource that covers a wide range of topics related to systems programming, including command-line tools, version control, editors, and more. The course is available online for free and includes video lectures, notes, and exercises. I encourage you to read the motivation for this course.
- Course: Stanford CS45: CS45 is an extended version of the MIT course, and delves into the topics in more detail.
- Video: CppCon 2015: Greg Law “Give me 15 minutes & I’ll change your view of GDB”: The talk explores GDB’s less-known features and sheds light on some advanced debugging techniques.
- Tool: rr - Lightweight Recording and Deterministic Debugging: rr is a powerful tool for recording and replaying program execution, which can be invaluable for debugging complex issues in systems programming. I’ve stopped using gdb directly for anything non-trivial and have switched to rr.
- OCaml
- Course: CS3100 Paradigms of Programming: The course covers a significant chunk of the OCaml language. You should be able to self-study the course to get a good understanding of the language. That said, the course deliberately does not cover the build system (dune), package manager (opam), command-line tools for the compiler (ocamlc, ocamlopt), editor integration (merlin, ocaml-lsp, ocamlformat), etc.
- Book: Real World OCaml: The book has a section on the compiler and the runtime system, which gives a great overview of the memory representation, garbage collection, and other aspects of the runtime system.
- Diving deeper
- Book: Systems Performance: Enterprise and the Cloud, 2nd Edition: This book provides an in-depth look at systems performance, covering topics such as CPU architecture, memory hierarchy, storage systems, and networking. It is a valuable resource for understanding the underlying principles of systems programming and performance optimization.
- Book: The Garbage Collection Handbook: This book offers a comprehensive overview of garbage collection techniques, algorithms, and implementations. It is an essential resource for understanding memory management in programming languages like OCaml.
- Book: The Art of Multiprocessor Programming: This book provides a deep dive into concurrent programming and synchronization techniques, which are crucial for understanding multi-threaded runtime systems like OCaml 5’s multicore runtime and its programming model.
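To give a flavour of the tinkering these resources encourage, the memory representation that Real World OCaml describes can be poked at directly from OCaml. What follows is a minimal sketch and is not taken from the book. `describe` is a hypothetical helper name of my choosing, and it relies on the standard library's unsafe `Obj` module, which is fine for exploration but should not appear in real code. The expected results in the comments assume the standard native or bytecode runtime.

```ocaml
(* Obj is the standard library's unsafe window into runtime values.
   Use it for exploration only, never in production code. *)

(* [describe] is a hypothetical helper: it reports whether a value is an
   immediate (unboxed) integer or a heap block, with the block's tag and
   its size in words. *)
let describe (v : 'a) : string =
  let r = Obj.repr v in
  if Obj.is_int r then "immediate"
  else Printf.sprintf "block(tag=%d, size=%d)" (Obj.tag r) (Obj.size r)

let () =
  print_endline (describe 42);            (* immediate *)
  print_endline (describe None);          (* immediate: constant constructor *)
  print_endline (describe (1, 2));        (* block(tag=0, size=2) *)
  print_endline (describe [10; 20; 30]);  (* block(tag=0, size=2): a cons cell *)
  print_endline (describe (Some 1));      (* block(tag=0, size=1) *)
  (* Strings are blocks too; their size in words depends on the word size,
     so no expected output is given here. *)
  print_endline (describe "hello")
```

Saving this as a file and running it with `ocaml` is itself a small exercise in the kind of systems tinkering this post is about. Trying it on floats, records, and variants with arguments is a good next step.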
I will probably keep editing this post as I find more resources. If you have suggestions for other useful resources or experiences to share, please feel free to reach out to me.