This may become a blog post or FAQ entry or something. For now, I'm just writing up something to answer the questions I've gotten about it!
Rewriting a Language's Compiler in Itself
Scratch-rewriting a code base in a different language is usually considered a risky move that often leads to failure. In the case of compilers, it's the reverse; most successful compilers (e.g. Java, C++, C, C#, TypeScript, Scala, Go, Rust, Zig, OCaml, Haskell, and many others) have undergone a complete scratch-rewrite...in themselves. This process is called self-hosting.
We can break down the observation that self-hosting has a long track record of success into two pieces:
- It's normal for a successful language's compiler to be rewritten from scratch in a different language.
- It's normal for the different language to be the target language itself.
Rewrites do more than just change the code base to a more appealing language. They're an opportunity to start fresh, with all the wisdom gained from the first version—to discard accumulated cruft and to avoid mistakes from the last time around (while naturally making some new mistakes). They're also notorious for causing regressions, especially in edge-case functionality. Fortunately, we've been very consistent about communicating that Roc is not stable and is going to be changing—which is the main reason we've chosen not to have a numbered release yet.
All of those tradeoffs are true about rewrites, regardless of target language. When Java's compiler got rewritten in Java, and Haskell's compiler got rewritten in Haskell, their target languages were very different—but the benefits of starting fresh and using accumulated knowledge on the rewritten version applied to both. Clearly, if you want to join in the long tradition of successful compilers that have been rewritten at some point, you can pick whatever target language suits you.
Roc's compiler has always been written in Rust, and we do not plan to self-host. We have a FAQ entry explaining why, and none of our reasoning has changed on that subject.
However, we have decided to do a scratch-rewrite in a different language—namely, Zig. We're very excited about it!
Why now?
We recently realized that our roadmap included independently rewriting almost every single part of Roc's 300K-lines-of-Rust compiler (type inference was the only exception) for unrelated reasons. More specifically:
- The parser is not as error-tolerant as we want it to be, and separately we want to rearchitect it because the grammar has evolved to the point where a different foundational parsing strategy makes sense. While we're at it, we also want to convert it to use recursive descent; it was originally written using parser combinators because that was what I was comfortable with at the time. Josh had already been experimenting with a rewrite in the existing Rust code base.
- The formatter doesn't support enforcing line width. We want to enable line-length enforcement only for docs generation, but we do want it, and switching to a system that is capable of enforcing line length (e.g. one based on Wadler's famous A prettier printer technique) requires a full rewrite. We'd agreed that this was what we wanted to do, although nobody had started on it.
- Canonicalization (aka name resolution) needs to defer some name resolution to after type checking, it needs to report errors for out-of-order definitions instead of reordering them, it needs to stop using `Symbol`, and it needs to change from a recursive enum data structure to a flat array with IDs, so that we can cache it on disk efficiently. We also need to make shadowing a warning, and add support for `var`. In short, it needed a rewrite. Sam had already started on this.
- Documentation generation doesn't resolve type aliases, or support auto-linking to docs for other types (in part because it has no awareness of other packages), or show inferred types when they weren't explicitly annotated. The changes required to address these turned out to be invasive enough that on the branch where I was fixing these issues, a rewrite was the approach I took.
- Type inference didn't need a rewrite. It needs quite a few changes (removing lambda set inference, replacing Abilities with static dispatch, replacing opaque type wrappers with nominal tag unions, removing tuple extensibility and module params), but these were all changes we would have made incrementally. Now we'll make them all at once instead.
- Monomorphization needs a rewrite to fix lambda sets. Specifically, it needs to be split into multiple different steps, on top of which the current implementation has accumulated so much cruft that we'd discussed a rewrite of that section just to make it easier to work on. We'd separately decided to take out the Morphic solver because the upstream dependency has become unmaintained, while having bugs and compiler performance problems that none of us are in a position to address anytime soon (because we don't understand Morphic well enough). Although Morphic's impact on the performance of compiled Roc programs has been positive, it hasn't been critical, and we decided we should revisit it only when we have the ability to address the problems we've encountered with it. Agus had already started on rewriting monomorphization.
- LLVM code generation: A recurring pain point has been that we want to update to the latest LLVM version in order to get new performance improvements (and to unblock updating to the latest Zig version, which we've used for our stdlib for years), but the LLVM upgrade process has been very time-consuming in large part because of LLVM's breaking API changes. Zig came up with a way around this: compile directly to LLVM bitcode (LLVM has strong backwards-compatibility on its bitcode but not on its public-facing API) and then upgrades become trivial because we can keep our existing code generation the same. Apparently this has added about 5% to Zig's LLVM code gen times, but in the long run they (and we) can cache portions of bitcode, which can shift it to a net performance improvement compared to the status quo. Changing to use bitcode generation instead of LLVM's APIs requires a complete rewrite of LLVM code gen, and if we were to do it in Rust, we'd have to do the bitcode generation ourselves. (A Zig rewrite can reuse the existing LLVM bitcode generation logic in the Zig compiler's code base, which is MIT-licensed.)
- Development backend: We currently do direct machine code generation (bypassing LLVM for a faster feedback loop), but now that we've seen what kind of performance that approach gets in practice, we want to try doing an interpreter instead. The theory is that the interpreter will be slower in terms of runtime performance (but not that much slower because of how naive our current machine code gen is), but it will improve the development feedback loop speed overall because an interpreter can skip monomorphization and code gen - going straight from type-checking to running the program. Trying out the interpreter-based approach requires actually writing an interpreter (which we separately want anyway for compile-time evaluation of constants) and then switching the development backend to use the interpreter—in other words, a rewrite.
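The formatter bullet above mentions Wadler's "A prettier printer" technique. The core idea is to build a document tree where a "group" renders on a single line if it fits within the width limit and breaks across lines otherwise. This is a minimal Rust sketch of that idea only, not Roc's actual formatter (which doesn't exist yet); the `Doc` names are invented for illustration:

```rust
// Minimal sketch of a Wadler-style "prettier printer" document algebra.
// A Group renders on one line if it fits within the width limit;
// otherwise, its Lines become real newlines.
enum Doc {
    Text(&'static str),
    Line, // a newline, or a single space when flattened inside a fitting group
    Concat(Box<Doc>, Box<Doc>),
    Group(Box<Doc>),
}

fn concat(a: Doc, b: Doc) -> Doc {
    Doc::Concat(Box::new(a), Box::new(b))
}

// Width of a document if it were rendered entirely on one line.
fn flat_width(doc: &Doc) -> usize {
    match doc {
        Doc::Text(s) => s.len(),
        Doc::Line => 1,
        Doc::Concat(a, b) => flat_width(a) + flat_width(b),
        Doc::Group(inner) => flat_width(inner),
    }
}

fn render(doc: &Doc, width: usize, col: &mut usize, out: &mut String, flat: bool) {
    match doc {
        Doc::Text(s) => {
            out.push_str(s);
            *col += s.len();
        }
        Doc::Line => {
            if flat {
                out.push(' ');
                *col += 1;
            } else {
                out.push('\n');
                *col = 0;
            }
        }
        Doc::Concat(a, b) => {
            render(a, width, col, out, flat);
            render(b, width, col, out, flat);
        }
        Doc::Group(inner) => {
            // The key decision: flatten this group only if it fits on the line.
            let fits = *col + flat_width(inner) <= width;
            render(inner, width, col, out, flat || fits);
        }
    }
}

fn pretty(doc: &Doc, width: usize) -> String {
    let mut out = String::new();
    let mut col = 0;
    render(doc, width, &mut col, &mut out, false);
    out
}

// The list [1, 2, 3] as a document: group("[1," line "2," line "3]")
fn example_doc() -> Doc {
    Doc::Group(Box::new(concat(
        Doc::Text("[1,"),
        concat(
            Doc::Line,
            concat(Doc::Text("2,"), concat(Doc::Line, Doc::Text("3]"))),
        ),
    )))
}

fn main() {
    println!("{}", pretty(&example_doc(), 40)); // fits: "[1, 2, 3]"
    println!("{}", pretty(&example_doc(), 5)); // too wide: one element per line
}
```

A production version adds indentation tracking and avoids recomputing widths, but the fit-or-break decision per group is the whole trick.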
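The canonicalization bullet above mentions changing from a recursive enum to a flat array with IDs. The payoff is that nodes hold plain integer indices rather than heap pointers, so the whole array can be cached to disk and read back without any pointer fixups. A minimal Rust sketch of the pattern, with hypothetical names (`ExprId`, `ExprPool`) invented for illustration:

```rust
// Sketch: a flat, index-based IR instead of a recursive enum.
// Children are referenced by u32 IDs into one Vec, not by Box<Expr>,
// which makes the structure compact and trivially serializable.

#[derive(Debug, Clone, Copy, PartialEq)]
struct ExprId(u32);

#[derive(Debug, Clone, Copy, PartialEq)]
enum Expr {
    Num(i64),
    Add(ExprId, ExprId), // children by ID, not by pointer
}

#[derive(Default)]
struct ExprPool {
    nodes: Vec<Expr>,
}

impl ExprPool {
    fn push(&mut self, expr: Expr) -> ExprId {
        let id = ExprId(self.nodes.len() as u32);
        self.nodes.push(expr);
        id
    }

    fn get(&self, id: ExprId) -> Expr {
        self.nodes[id.0 as usize]
    }

    // Walking the tree means chasing IDs instead of pointers.
    fn eval(&self, id: ExprId) -> i64 {
        match self.get(id) {
            Expr::Num(n) => n,
            Expr::Add(a, b) => self.eval(a) + self.eval(b),
        }
    }
}

fn main() {
    let mut pool = ExprPool::default();
    let one = pool.push(Expr::Num(1));
    let two = pool.push(Expr::Num(2));
    let sum = pool.push(Expr::Add(one, two));
    println!("{}", pool.eval(sum)); // 3
}
```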
Once we'd finished all these projects, the only stage of the compiler pipeline that wouldn't have been rewritten would have been type inference. And at the end of all that, we'd still have a Rust code base, when we'd rather have a Zig code base.
Why Zig over Rust?
Rust's compile times are slow. Zig's compile times are fast. This is not the only reason, but it's a big reason.
Having a slow feedback loop has been a major drain on both our productivity and our enjoyment of working on the code base. Waiting several seconds to build a single test, before it has even started running, is not enjoyable.
A frustratingly common response I've heard to this complaint is that "several seconds is not as bad as [some other language or some other project that has minutes-long build times]!" I don't care if things could be even worse; I care that I'm feeling this pain while knowing I don't have to. Zig's (not yet stable) x86-64 backend can reportedly get recompile loops of a few seconds when working on parts of Zig's 300K-line code base, and that's without the benefit of Zig's (also not yet stable) incremental build system. Meanwhile we're looking at 20+ seconds to rebuild after a change in our Rust parser (which is one of the fastest parts of the compiler to rebuild), and that's with Rust doing everything as incrementally as it supports.
Compile times aside, the strengths and weaknesses of Rust and Zig today are much different than they were when I wrote the first line of code in Roc's Rust compiler in 2019. Back then, Rust was relatively mature and Zig was far from where it is today. (Also, I hadn't done significant low-level programming in over a decade, and didn't have confidence I would be able to manage memory correctly on my own. That has also changed.)
Here are some relevant comparison points between the two languages in the specific context of Roc's compiler in 2025:
- For many projects, Rust's memory safety is a big benefit. As we've learned, Roc's compiler is not one of those projects. We tend to pass around allocators for memory management (like Zig does, and Rust does not) and their lifetimes are not complicated. We intern all of our strings early in the process, and all the other data structures are isolated to a particular stage of compilation.
- Besides the fact that Zig is built around passing around allocators for memory management (which we want to do anyway), it also has built-in ways to make struct-of-arrays programming nicer - such as MultiArrayList. We have abstractions to do this in Rust, but they have been painful to use. We don't have experience with Zig's yet, but looking through them, they definitely seem more appealing than what we've been using in Rust.
- Rust's large ecosystem is also a big benefit to a lot of projects. Again, Roc's compiler is not one of them. We use Inkwell, a wrapper around LLVM's APIs, but we actively would prefer to switch to Zig's direct generation of LLVM bitcode, and Zig has the only known implementation of that approach. Other than that, the few third-party Rust dependencies we use have equivalents in the Zig ecosystem. So overall, Zig's ecosystem is a selling point for us: once you filter out all the packages we have no interest in using, it has a larger absolute number of dependencies we actually want to use.
- Zig's toolchain makes it much easier for us to compile statically-linked Linux binaries using musl libc, which is something we've wanted to do for a long time so that Roc can run on any distro (including Alpine in containers, which it currently can't in our Rust compiler). We know this can be done in Rust, but we also know it's easier to do in Zig. Zig's compiler itself does this, and it does so while bundling LLVM (by building it from source with musl libc), which is exactly what we need to do. Once again, we can reuse Zig code that has done exactly what we want, and there is no equivalent in the Rust ecosystem.
- There are various miscellaneous performance optimization techniques that we've wanted to use in Rust, but haven't as much as we'd like to because they have been too painful. For example, we often want to pack some metadata into some of the bits of an index into an array. Zig lets us describe this index as a struct, where we specify what each of the bit ranges means, and that includes arbitrary integer sizes. For example, in Rust we might store an index as a u32, but in Zig we could break that into a u27 for the actual index data (when we know the index will never need to exceed 2^27 anyway), and then we can specify that we want to use another 3 bits to store a small enumeration, and the remaining 2 bits for a pair of boolean flags. Again, these are all things we could do using Rust and bit shifts, but it's much nicer to do in Zig. Untagged unions are another Zig feature we expect to be valuable.
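To make the struct-of-arrays point concrete, this is a hand-rolled Rust sketch of the pattern that Zig's MultiArrayList automates: instead of one Vec of structs, each field gets its own Vec, so a pass that only reads one field never pulls the others into cache. The `Nodes` type and its fields are hypothetical, invented for illustration:

```rust
// Sketch of struct-of-arrays storage (what Zig's MultiArrayList automates).
// Each logical "node" is split across parallel Vecs, one per field.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Num,
    Str,
}

#[derive(Default)]
struct Nodes {
    kinds: Vec<Kind>,
    starts: Vec<u32>, // source-region start offsets
    lens: Vec<u16>,   // source-region lengths
}

impl Nodes {
    fn push(&mut self, kind: Kind, start: u32, len: u16) -> usize {
        self.kinds.push(kind);
        self.starts.push(start);
        self.lens.push(len);
        self.kinds.len() - 1
    }

    // A pass that scans only one column stays cache-friendly:
    // it never touches the starts or lens arrays at all.
    fn count_kind(&self, kind: Kind) -> usize {
        self.kinds.iter().filter(|&&k| k == kind).count()
    }
}

fn main() {
    let mut nodes = Nodes::default();
    nodes.push(Kind::Num, 0, 3);
    nodes.push(Kind::Str, 4, 5);
    nodes.push(Kind::Num, 10, 2);
    println!("{}", nodes.count_kind(Kind::Num)); // 2
}
```

The pain in Rust is that every field access and every push has to be kept in sync by hand (or by a macro); Zig generates this layout from the ordinary struct definition.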
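And this is what the bit-packed index from the last bullet looks like when done manually in Rust with shifts and masks. The field layout (27-bit index, 3-bit enum, two flag bits) follows the example in the text, but the function names are invented for illustration; in Zig, the same layout would be a packed struct with `u27`, `u3`, and `bool` fields, and the compiler would do the shifting for you:

```rust
// Sketch: packing a 27-bit index, a 3-bit enum, and two boolean flags
// into one u32 by hand. Layout: bits 0-26 index, bits 27-29 enum,
// bit 30 flag_a, bit 31 flag_b.

const INDEX_BITS: u32 = 27;
const INDEX_MASK: u32 = (1 << INDEX_BITS) - 1;

fn pack(index: u32, kind: u32, flag_a: bool, flag_b: bool) -> u32 {
    debug_assert!(index <= INDEX_MASK); // must fit in 27 bits
    debug_assert!(kind < 8); // must fit in 3 bits
    index
        | (kind << INDEX_BITS)
        | ((flag_a as u32) << 30)
        | ((flag_b as u32) << 31)
}

fn unpack(packed: u32) -> (u32, u32, bool, bool) {
    (
        packed & INDEX_MASK,
        (packed >> INDEX_BITS) & 0b111,
        (packed >> 30) & 1 == 1,
        (packed >> 31) & 1 == 1,
    )
}

fn main() {
    let packed = pack(123_456, 5, true, false);
    assert_eq!(unpack(packed), (123_456, 5, true, false));
    println!("0x{packed:08x}");
}
```

Every call site has to get the masks and shift amounts right, which is exactly the kind of error-prone boilerplate that makes the technique "too painful" to reach for casually in Rust.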
Did I mention compile times? I'll reiterate: compile times are a huge deal. Not only have they been painful in Rust, but Rust's fundamental unit of compilation is the crate, so we're incentivized to organize our code around what will make for faster compile times rather than what makes the most sense to us as authors. Often the boundaries end up being drawn in the same place regardless, but when those two are in tension, it's frustrating to have to sacrifice either feedback loop speed or code organization. In Zig we expect we can just have both.
In summary, Rust's memory safety guarantees aren't major selling points in this particular project, whereas its slow compile times have been a major pain point for us. Rust's ecosystem is larger than Zig's overall, but after filtering out all the third-party dependencies we wouldn't use anyway, Zig has more that we actually want to use. On top of all that, Zig has some language features that we're looking forward to using, and would have used in Rust if it had them.
Goals
We want the scratch-rewritten compiler to be what we release as Roc 0.1.0, its first numbered release.
There are several substantial language design changes planned for 0.1.0. Historically we've often implemented changes in a backwards-compatible way, but this time that wouldn't be worth it. We'll be implementing the 0.1.0 design directly.
Some of the things we'd like to do to improve the reliability and correctness of the compiler:
- Fuzz early. One of the things we learned when we added fuzzing to our Rust code base is that it's much easier to fix the problems fuzzing surfaces when they crop up after an incremental change. You know the problem originated with that change, even if it appears to be unrelated! The downside of maintaining this level of fuzzing early on is that it makes exploration take longer, but this is a rewrite. We've already done a ton of exploration in the Rust code base, and by now we've learned enough to have a strong sense of how we want things to work.
- Document as we go. As with fuzzing, taking the time in the original compiler to document how everything worked would have seriously slowed down exploration. (And the docs would have needed numerous rewrites as things changed.) Documentation helps not only new contributors, but also our future selves. This time, the investment makes more sense to do up-front.
- Don't panic! We used things like `unwrap()` at various points in the Rust compiler when we thought it wouldn't come up in practice, and it always ended up leaking into the user experience anyway. This time, we want to do error handling exclusively by passing values around.
- Intentionally sacrifice short-term performance for the sake of getting to a working, reliable implementation. In the Rust compiler, we made some architectural choices - most significantly, the choice to minimize the number of compiler passes - which improved performance but made it harder to debug our implementation. For this rewrite, we want to keep in mind general performance considerations like the memory layouts of the data we're storing, but we generally want to prioritize having code that's easier to understand and to debug, even if that means more compiler passes or less compact memory representations than we'd prefer to have long-term. We can investigate things like combining passes and storing IR nodes using adjacency sometime after 0.1.0.
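To illustrate the "don't panic" bullet: the difference between a panicking lookup and one that passes an error value back is small at the definition site but large for users, because a `Result` forces every caller to decide what a miss means instead of crashing. A hypothetical Rust sketch (the symbol-table names are invented for illustration, not Roc's actual code):

```rust
// Sketch: error handling by passing values around vs. panicking.
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum LookupError {
    NotFound(String),
}

// "This can't fail in practice" ... until it does,
// and the user sees an internal compiler panic.
fn lookup_panicking(table: &HashMap<String, u32>, name: &str) -> u32 {
    *table.get(name).unwrap()
}

// The value-based version: a miss is an ordinary value the caller
// must handle, so it can become a proper user-facing error report.
fn lookup(table: &HashMap<String, u32>, name: &str) -> Result<u32, LookupError> {
    table
        .get(name)
        .copied()
        .ok_or_else(|| LookupError::NotFound(name.to_string()))
}

fn main() {
    let mut table = HashMap::new();
    table.insert("main".to_string(), 0);

    assert_eq!(lookup(&table, "main"), Ok(0));
    assert_eq!(
        lookup(&table, "missing"),
        Err(LookupError::NotFound("missing".to_string()))
    );
    // Fine here, but one bad name away from a crash:
    let _ = lookup_panicking(&table, "main");
}
```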
It's an exciting undertaking, and we're already enjoying it!