Monorepo - Our experience

October 29, 2024
manav@ente.io

Nine months ago, we switched to a monorepo. Here I describe our experience with the switch so far.

This is not meant as a prescriptive recommendation, but rather as an anecdotal exposition, in the hope that it might help other teams make informed decisions.

Unlike most forks in the road, we've travelled both paths. So first I will describe the history that led up to the change. It shows that we have already experienced the alternative, non-monorepo setup in a similar context, and are thus well positioned to compare apples to apples.

Platforms and monorepos

Ente began its life half a decade ago. It was meant as an end-to-end encrypted platform for storing all of Vishnu's personal data, but two things happened: Vishnu realized it was not just him that needed such a thing to exist, and he realized it was going to be a lot of work to build his vision.

So he became a we, and instead of tackling all personal data, the focus shifted to a single aspect of it, Ente Photos, to get the spaceship off the ground. To an external observer, what looks like a photos app (and that indeed is our concrete current goal) is driven by an underlying vision of the human right to privacy for all forms of personal data.

Why do I describe all this? Because when viewed in light of this vision, Ente isn't a single app; it is a platform, and storing its code as a monorepo is the ideologically appropriate choice.

This is similar to, say, the Linux kernel. Most people don't realize that the biggest open source project in the world, by most metrics imaginable, the Linux kernel itself, is a monorepo. Even though it is called a kernel, ideologically it really is the full platform, device drivers and all, and the code organization as a monorepo reflects that.

Staying close to the vision of Ente as a platform is not only about ideology; it has practical offshoots too.

For example, a few years ago, we realized that there was no good open source end-to-end encrypted OTP app with cloud backups. So we built one, for our own use, because it was rather easy to build it on top of the primitives we had already created for the photos app.

Today, this side project is the #1 OTP app in the world with the aforementioned characteristics. This might seem like a happy accident, but it isn't; this was always the plan: build a solid platform, then one by one tackle the various bespoke apps we'll need to best handle different forms of data.

Microrepos

So ideologically Ente is best kept as a monorepo. But it wasn't one to start with, due to various historical factors in how the product evolved. What was a hardware device transitioned into software. The server component was closed source until we had the bandwidth to get it audited. Weekend projects like Auth outgrew their original scope. Etc.

Let us rewind the tape to, say, 2 years ago (just to pick a roughly symmetrical split). While we have grown since then in all product aspects, including the number of developers, we are extremely cautious in adding engineering headcount, so the number of developers hasn't grown that much. Thus it is a similar number of developers working on the same number of products (Ente Photos, Ente Auth) multiplied by the same number of platforms (mobile, web, desktop, server, CLI).

2 years ago, these codebases were spread across a dozen or so repositories.

In February we decided to take time out to finish the task of open sourcing the server side. This was a natural point to also rein in the proliferation of codebases, and we took it as a chance to move to a monorepo.

So, as a similar-sized team doing similar work, we've experienced roughly a year with a split microrepo setup, and roughly a year with the alternative combined monorepo setup.

Summary

If I had to summarize the difference: Moving to a monorepo didn't change much, and what minor changes it made have been positive.

This did not come as a surprise to us. Most of us didn't care strongly about our repository organization, and overall we weren't expecting much from changing it either. The general vibe was that a monorepo might be better, so why not. Since none of us opposed the choice, we went ahead, but we weren't trying to "solve" anything with the change. We were already happy with our development velocity.

And indeed, overall it hasn't changed much. We're still happy with our development velocity, so the move did not get in our way. There have, however, been many small wins, so for the rest of this post I'll delve deeper into them.

Less grunt work

This is the biggest practical win. There is much less grunt work we have to do.

As an example, take the following pull request. It changed the ML model that is used for computing on-device face embeddings.

Screenshot of the GitHub view of a pull request that changed multiple subsystems in Ente's repository

This change affected (1) the photos mobile app, (2) the photos desktop app, (3) the photos web app, and (4) the ML scaffolding code itself.

In the previous, separate-repository world, this would've been four separate pull requests in four separate repositories, with comments linking them together for posterity.

Now, it is a single one. Easy to review, easy to merge, easy to revert.

Fewer submodules

Submodules are an irritating solution to a real problem. The problem is real, so a solution is welcome, and submodules are indeed an appropriate solution, but they're irritating nonetheless.

All this is to say, we appreciate the existence of git submodules as a way to solve practical code organization problems, but we wish we didn't need to use them.

Monorepos reduce the number of places where a submodule would otherwise be required, which is thus a win.

As an example, previously the web and desktop codebases for the Ente Photos app had a submodule relationship. This required a PR dance each time a release had to be made or some other important change pushed to main. All that's gone now. These two interdependent pieces of code now directly refer to each other, and changes can be made to them atomically in the same commit.

More stars

This is the biggest marketing win. Previously our stars were spread out across the dozen or so repositories. If each had a thousand stars, we'd still have 12k stars in total, but because of the way both human psychology and GitHub's recommendation algorithms work, it'd come off as less impactful than a single repository with 12k stars.

Easy

One of the concerns we had going into this was that it might impact our development velocity. We thought we would have to invent various schemes and conventions to avoid stepping on each other's toes.

Those concerns turned out to be unfounded. We didn't invent anything, waiting to see if the need arose, and it never did. So for an individual engineer in their day-to-day work, the move has been easy, since we didn't ask anyone in the team to change their workflows in any way.

There still are no "repository wide" guidelines, except two:

  1. There should not be any repository wide guidelines
  2. Don't touch the root folder

That's it. Within each folder (or subteam), we are otherwise free to come up with whatever organization, coding conventions, and so on suit us.

I do realize that the ease for us may be a function of both the relatively small size of our team and the amount of trust we have in each other's competence, and these factors might not be replicable in other teams.

Long term refactoring

Refactoring across repository boundaries requires much more activation energy as compared to spotting and performing gradual refactorings across folder boundaries. Technically it is the same, but the psychological barriers are different.

As an example, we've already merged many of our disparate web apps into a common setup, without needing to make elaborate upfront plans. It happened easily and naturally, since we could see all of them "next to each other", and the opportunities for code reuse became obvious.

Connectedness

This way of "working in a shared space without working in the same folder" has led to us feeling more connected to each other's work compared to when, individually or as subteams, we were all committing to separate repositories.

Previously, it was easy to get lost in one's work (in a good way), but sometimes it led to the feeling of working on a small part without being able to see the whole (in a not so good way).

Now, one can still remain lost in one's own work in the universe of one's own "folder", so that part of the goodness remains. But there are now also additional subtle cues that let us see how what we are doing is part of an interconnected whole. So it's a win-win.

What I described might be too abstract, so let me give an example. Every time I do a git pull, I get to see all the changes that my teammates have been working on. The names of the recently changed files. The number of changes in them. The names of the recent branches. The tags that were recently pushed. Individually, all of these are very low-bit, imprecise information vectors, and I don't even consciously look at them.

But what I've found over time is that, subconsciously and automatically, these "environmental cues" give me a great sense of "all that is happening around". What features are being worked on, what stage of completion they are at, what bugfixes were pushed, what releases were recently made.

Similar serendipitous information exchange happens when I, say, open the pull requests page and, without even intending to, glance at what others are up to.

The best part is, all of this is subverbal and effortless. Everybody just does their thing, and just by virtue of doing them all in the same shared digital space, arises a sense of awareness and connectedness.

Wrapping up

This is already too long, much longer than I intended to write, so let me stop now.

I could offer tips, but I don't think there is any secret technical sauce needed. One thing that had bothered me before the move was how we would manage our GitHub workflows, but that turned out to be trivial, since GitHub workflows can be scoped to run only on changes to a specific folder.
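In GitHub Actions this scoping is declared with a `paths` filter on the workflow's `push` or `pull_request` trigger. The underlying idea is simple prefix matching, which the minimal Python sketch below models: given the files changed in a commit, it picks the workflows whose top-level folder was touched. The folder-to-workflow mapping and the workflow names are hypothetical, chosen only for illustration, and this is not how GitHub itself evaluates its glob patterns.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from top-level folders to the CI workflow scoped to them.
# These names are illustrative assumptions, not Ente's actual workflow files.
WORKFLOWS = {
    "mobile": "mobile-lint",
    "web": "web-lint",
    "server": "server-lint",
}


def workflows_to_run(changed_files):
    """Return the sorted names of workflows whose folder contains a changed file."""
    triggered = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Skip empty paths and files sitting directly in the repository root,
        # which (by assumption here) no subproject workflow covers.
        if len(parts) < 2:
            continue
        top = parts[0]
        if top in WORKFLOWS:
            triggered.add(WORKFLOWS[top])
    return sorted(triggered)


if __name__ == "__main__":
    changed = ["web/apps/photos/src/index.ts", "server/cmd/main.go", "README.md"]
    print(workflows_to_run(changed))
```

A change confined to one folder thus triggers only that folder's checks, while a cross-cutting pull request, like the ML model change described earlier, triggers each affected subproject's workflow in a single run.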

An engineering-mindset retrospective would be incomplete without both a Pros and a Cons section, but we haven't really found any cons that have affected us so far, so please excuse the omission.

On a personal level, what I've liked most about the move to our monorepo is the feeling of being part of a juggernaut that is relentlessly rising towards perfection, and has attained an unstoppable momentum. The code I'm writing is not an isolated web component or a goroutine or a little documentation fix, it is now part of this singular platform that will outlive me.