Mono Repo v. Multiple Repos

I've long been an opponent of Mono Repos. Generally I don't like monoliths in my code, whether monolithic services or monolithic code bases. When I was given the power to decide how a company's codebase should be structured, I chose multiple repos, over some objections. Then, 3 years later, I sat the team down and told them it was time to create a mono repo. I've come to a definitive stance on this, backed by practical experience working in both types of setups and seeing where each works and where each fails. Where I've landed is a hard-and-fast rule that I think can be applied without exception, and a belief that on the question of Mono Repo vs. Multiple Repos, the answer is "It Depends".

The Rule

All of the Artifacts generated from a single Git Repository should deploy together.

Before I get into why, let me be clear that this only applies to Git. It may apply to Mercurial, but I have no experience with that VCS. The same may hold for other VCSs, but I don't have experience with those either. There are certainly ways to use Git that make this rule not apply, but they are hard and change such fundamental aspects of Git that the result should really be considered a separate VCS.
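To make the rule concrete, here is a minimal sketch of a check you could run over a deployment inventory. It assumes every artifact record knows which repository built it and which deployment unit (a pipeline or release) ships it; the `Artifact` type, `findRuleViolations`, and all the names in the sample data are hypothetical, not part of any real tool.

```typescript
interface Artifact {
  name: string;
  repo: string;       // repository the artifact is built from
  deployUnit: string; // hypothetical: the pipeline or release that ships it
}

// Returns each repository that breaks the rule, mapped to the distinct
// deployment units its artifacts are spread across.
function findRuleViolations(artifacts: Artifact[]): Map<string, string[]> {
  // Collect the distinct deployment units used by each repository.
  const unitsByRepo = new Map<string, Set<string>>();
  for (const a of artifacts) {
    const units = unitsByRepo.get(a.repo) ?? new Set<string>();
    units.add(a.deployUnit);
    unitsByRepo.set(a.repo, units);
  }
  // A repo whose artifacts ship through more than one unit can deploy
  // its artifacts independently of one another, which breaks the rule.
  const violations = new Map<string, string[]>();
  unitsByRepo.forEach((units, repo) => {
    if (units.size > 1) violations.set(repo, Array.from(units).sort());
  });
  return violations;
}

const inventory: Artifact[] = [
  { name: "widgets-core", repo: "widgets", deployUnit: "widgets-release" },
  { name: "widgets-charts", repo: "widgets", deployUnit: "widgets-release" },
  { name: "orders-api", repo: "platform", deployUnit: "orders-pipeline" },
  { name: "billing-api", repo: "platform", deployUnit: "billing-pipeline" },
];

findRuleViolations(inventory).forEach((units, repo) => {
  console.log(`${repo}: artifacts ship via ${units.join(" and ")}`);
});
// platform: artifacts ship via billing-pipeline and orders-pipeline
```

Note that the `widgets` repository is not flagged even though it produces two artifacts: both ship in the same release, which is exactly what the rule allows.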

Mono Repos

Rodents Of Unusual Size? I don't think they exist.

Westley - The Princess Bride

Much like Westley and the R.O.U.S., I don't think Mono Repos exist, and I maintain this belief despite seemingly having been attacked by them on a number of occasions. You see, when people talk about the benefits of the Mono Repo, they often seem to forget that their software is built on top of lots of other software. That includes compilers, build systems, runtimes, standard libraries, etc. We like to draw a line between first-party dependencies, ones that we control, and third-party dependencies, ones that are controlled by someone else, but in reality the line is often very blurry. Any organization with two or more teams is already large enough to have a separation between code your team "owns" and code "owned" by other teams. In many cases, that other team's code looks a lot more like a third-party dependency than a first-party one. Additionally, the prevalence of open source means that clearly third-party dependencies can often be patched, blurring the line from the other direction.

I prefer to talk about repositories along two axes:

The number of deployable artifacts that are generated
The ability to isolate deployments for those artifacts

So let's replace the term "Mono Repo" with "Multi-Artifact Repo", which is to say that the source code in the repository will be used to build multiple deployable artifacts. I want to be specific about the word "deployable" here, since there are often artifacts, like test reports, that are not deployed; although if you follow the rule about deploying everything together, it's a moot point.

The Benefits of the Multi-Artifact Repository

You can take almost any article about the benefits of mono repos and drop its reasoning in here. The most often cited one is that a single change can span multiple artifacts, and there are a lot of times when this really is a good thing. One pattern we see a lot in the JavaScript / TypeScript ecosystem is library families, or plugin-based architectures. These cases tend to share a few traits that make a multi-artifact repository a good fit:

This is truly a first-party dependency, with a single team maintaining all of the parts
Changes to a core component often need changes to auxiliary components, while features in the auxiliary components often require changes to the core components
The interaction between the components is well understood

Within a multi-artifact repository, it's easy to make changes that span multiple artifacts and to ensure that the specific versions released together actually work together, and there's lower administrative overhead, since there's only one set of contributor permissions to hand out.

Version Skew

Where a multi-artifact repository breaks down is when you start to experience runtime version skew between your artifacts. The ability to make one change across multiple artifacts means that, fundamentally, you've introduced strong coupling between the versions of those artifacts. People will say that semver can solve this problem, and if done correctly that's true. However, it's hard to do semver correctly, and it's quite a bit of extra work to test it correctly. When I'm working with family-of-libraries repositories, my rule is to have a single version shared by every library and to depend on that exact version of the other libraries in the repository. In practice, that means a bug fix in an auxiliary component will trigger a patch release of all auxiliary and core components, even those with no changes. However, an empty release is often preferable to everything breaking because of version skew.
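The single-shared-version policy can be sketched in a few lines. This is a minimal model under stated assumptions: the `Manifest` shape, `releaseLockstep`, and the package names are hypothetical, versions are plain `MAJOR.MINOR.PATCH` strings, and a real setup would more likely lean on a release tool than hand-rolled code. It shows the two halves of the rule: every package gets the same new version whether or not it changed, and every dependency on a sibling is pinned to exactly that version rather than a `^` or `~` range.

```typescript
// Hypothetical, simplified package manifest for one library in the family.
interface Manifest {
  name: string;
  version: string;
  dependencies: Record<string, string>;
}

type Bump = "major" | "minor" | "patch";

// Bump a plain "MAJOR.MINOR.PATCH" version (no pre-release tags).
function bumpVersion(version: string, bump: Bump): string {
  const [major, minor, patch] = version.split(".").map(Number);
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

// Lockstep release: assumes the family already shares one version.
// Every package is released at the next version, changed or not, and
// internal dependencies are pinned exactly; third-party ranges are untouched.
function releaseLockstep(family: Manifest[], bump: Bump): Manifest[] {
  const next = bumpVersion(family[0].version, bump);
  const names = new Set(family.map((m) => m.name));
  return family.map((m) => {
    const dependencies: Record<string, string> = {};
    for (const [dep, range] of Object.entries(m.dependencies)) {
      dependencies[dep] = names.has(dep) ? next : range;
    }
    return { name: m.name, version: next, dependencies };
  });
}

const family: Manifest[] = [
  { name: "widgets-core", version: "2.3.1", dependencies: { lodash: "^4.17.21" } },
  { name: "widgets-charts", version: "2.3.1", dependencies: { "widgets-core": "2.3.1" } },
];

// A bug fix in widgets-charts alone still releases both packages.
const released = releaseLockstep(family, "patch");
console.log(released.map((m) => `${m.name}@${m.version}`).join(", "));
// widgets-core@2.3.2, widgets-charts@2.3.2
```

The empty `widgets-core@2.3.2` release is the price paid; in exchange, any consumer holding matching versions of the family has a combination that was built and tested together.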

But what happens if your core component adds a dependency on some new experimental feature but doesn't make a breaking change to its API? Should we require consumers of our auxiliary components to accept this change just to get new features? The short answer is maybe. Frankly, most modern dependency management systems have ways to override which version of a library is resolved. That means you can always let consumers take on the risk of using a legacy core component. There are also ways to manage this new, scary dependency on your side: maybe it's optional, maybe there are two variants of your core component, or maybe you have a policy of back-porting features to the previous major version. The fact is that changes like this are rarer than changes where you might get semver wrong. Each consumer of your library has more context on its upgrade path than you do, so let them deal with it in the way that makes the most sense to them. Your job as the maintainer of this family of libraries is to ensure that the libraries work together the way they were intended to.

Services - version skew as a way of life

Services are not like libraries, for better and for worse. To fully realize the benefits of a service-based / microservice deployment, every service needs a large degree of independence. This usually means private data storage, separate scaling and load balancing, and, most importantly, separate deployment. By allowing each service to deploy independently, you lower the cost and, with proper processes, the risk of each deployment. Because of this, deployments can go faster and happen more frequently, leading to more productivity. If you want all these benefits, you need to isolate each service in its own repository. I equate multiple microservices in one repository to microservices sharing a database: you're simply going to have to give up some benefits.

Advocates of putting multiple services in one repository point out that you can see what's deployed for each service in one place. The fact is that this is rarely the whole story. Let's assume, for the sake of argument, that all my services had continuous delivery and fast deployment pipelines, with no manual sign-offs required. In this world, if I changed a service that produced a message and the consumer that consumed it, and pushed those changes, then within a few minutes everyone would be working happily. What happens during the deployment, though?

The consumer deployed before the producer - that newly required field isn't there yet, and lots of errors pop up.
The producer deployed before the consumer - that previously required field we dropped is still required, and lots of errors pop up.
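A small simulation makes the two failure modes concrete. It assumes a hypothetical change in which the producer replaces a `customerId` field with `accountId` and the consumer is updated in the same commit; the field names and version labels are illustrative only. Enumerating every producer/consumer pairing shows which combinations break while a rollout (or a rollback) is in flight.

```typescript
type Message = Record<string, unknown>;

// Hypothetical producer versions: v2 renames `customerId` to `accountId`
// in a single breaking change.
const producers: Record<string, () => Message> = {
  v1: () => ({ orderId: "o-1", customerId: "c-42" }),
  v2: () => ({ orderId: "o-1", accountId: "c-42" }),
};

// Strict consumers: each throws if the field it requires is missing.
const consumers: Record<string, (m: Message) => string> = {
  v1: (m) => {
    if (typeof m.customerId !== "string") throw new Error("missing customerId");
    return m.customerId;
  },
  v2: (m) => {
    if (typeof m.accountId !== "string") throw new Error("missing accountId");
    return m.accountId;
  },
};

// Try every pairing that could be live at the same time.
function skewReport(): string[] {
  const lines: string[] = [];
  for (const p of Object.keys(producers)) {
    for (const c of Object.keys(consumers)) {
      try {
        consumers[c](producers[p]());
        lines.push(`producer ${p} -> consumer ${c}: ok`);
      } catch (e) {
        lines.push(`producer ${p} -> consumer ${c}: ${(e as Error).message}`);
      }
    }
  }
  return lines;
}

console.log(skewReport().join("\n"));
// producer v1 -> consumer v1: ok
// producer v1 -> consumer v2: missing accountId
// producer v2 -> consumer v1: missing customerId
// producer v2 -> consumer v2: ok
```

Only the two "in sync" pairings work, and a single commit in a shared repository does nothing to guarantee that the mixed pairings never go live.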

Even if the stars align and both services get deployed perfectly in sync, all it takes is one bug to trigger a rollback, and all of a sudden we're back to version skew. This is all to say that while you can know what version of every service is live in your system, you have to understand that the answer will often be fluid, and sometimes more than one version will be live. It is almost impossible to know definitively which versions of a service your code will end up depending on, or which versions of other services will end up depending on your code.

This isn't an insurmountable burden; it just means that when you're making changes, you need to be aware of it. You build resiliency into services, and you make sure that dependencies have deployed features and stabilized before building on them. You don't introduce breaking changes. The easiest and most general practice here is the three-phase deployment:

Make a change to the dependency to expose the new and old versions of a feature
Update each consumer to use the new version of the feature
Remove the old version of the feature

If you move your code into a single repository you still need to do this.
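Applied to the same hypothetical `customerId` to `accountId` rename, the three phases can be sketched as follows. The version names and fields are illustrative assumptions, not a prescribed scheme. The check enumerates, for each phase, every producer/consumer pairing that can be live during that phase's rollout or rollback, and collects any that fail.

```typescript
type Message = Record<string, unknown>;

function requireField(m: Message, field: string): string {
  const value = m[field];
  if (typeof value !== "string") throw new Error(`missing ${field}`);
  return value;
}

// Hypothetical producer versions across the three phases.
const producer = {
  oldOnly: (): Message => ({ orderId: "o-1", customerId: "c-42" }),
  // Phase 1: expose both the old and the new version of the field.
  both: (): Message => ({ orderId: "o-1", customerId: "c-42", accountId: "c-42" }),
  // Phase 3: remove the old field once no consumer reads it.
  newOnly: (): Message => ({ orderId: "o-1", accountId: "c-42" }),
};

const consumer = {
  readsOld: (m: Message): string => requireField(m, "customerId"),
  // Phase 2: consumers move to the new field.
  readsNew: (m: Message): string => requireField(m, "accountId"),
};

type P = keyof typeof producer;
type C = keyof typeof consumer;

// Each phase changes only one side, so the only versions that can be
// live together are that side's "before" and "after" plus the other side.
const phases: { name: string; producers: P[]; consumers: C[] }[] = [
  { name: "1 (expand producer)", producers: ["oldOnly", "both"], consumers: ["readsOld"] },
  { name: "2 (migrate consumer)", producers: ["both"], consumers: ["readsOld", "readsNew"] },
  { name: "3 (contract producer)", producers: ["both", "newOnly"], consumers: ["readsNew"] },
];

function checkPhases(): string[] {
  const failures: string[] = [];
  for (const phase of phases) {
    for (const p of phase.producers) {
      for (const c of phase.consumers) {
        try {
          consumer[c](producer[p]());
        } catch (e) {
          failures.push(`phase ${phase.name}: ${p} -> ${c}: ${(e as Error).message}`);
        }
      }
    }
  }
  return failures;
}

const failures = checkPhases();
console.log(failures.length === 0 ? "every live pairing works" : failures.join("\n"));
// every live pairing works
```

The sequencing is what does the work here: collapsing the phases (for example, shipping `newOnly` while `readsOld` is still live) reintroduces exactly the failures from the previous simulation, no matter which repository the code lives in.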

Explicit Boundaries

Because it is harder to make a change in two Git repos at exactly the same time, multiple repositories introduce explicit boundaries. If these boundaries line up with artifacts that are deployed separately, they are good boundaries, because they help people think about the implications of making changes across them. You don't get to pretend that you can make a breaking change to the producer and just update the consumer at the same time.

That being said, there are times when you may find that every new feature seems to require changes in multiple repositories. The answer is not to have one repository with two deployable artifacts, but to have one deployable artifact, or otherwise to link the deployments of both artifacts together. This was the case that led me to tell the team to create a mono repo. We had a set of libraries that didn't have good boundaries between them. By moving them into a single repository and tying all of the library versions together, we could make sweeping changes across them.

What About Internal Libraries?

Developers often try to make distinctions between internal (first-party) and external (third-party) libraries. I think this distinction is harmful, and if you do away with it, a lot of the justification for multiple deployments per repository goes away too.

The practical difference between a library dependency and a service dependency is that with a service you always get whatever the latest deployed version is, even if you don't take any action. With a library dependency, you get to pick and choose which version you use. That may include using wildly out-of-date versions, either intentionally or through inaction. The advantage of treating all dependencies the same is that, while your first-party dependencies may push out must-have features much more regularly than your third-party ones, the same tools work for both. This means the practices you put in place to keep up with the latest changes to your core library also help you get those security updates to your SSL library, and so on.
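As a rough illustration of "the same tools work for both", here is a minimal sketch of an outdated-dependency report that treats first- and third-party libraries identically. Where the `latest` values come from (a public registry, an internal catalogue) is deliberately left out and assumed as input; the `DependencyStatus` type, the helper functions, and the package names are all hypothetical.

```typescript
// One record per dependency, first- or third-party alike.
interface DependencyStatus {
  name: string;
  pinned: string;  // version the consumer currently uses
  latest: string;  // assumed input: newest available version
  firstParty: boolean;
}

type Lag = "current" | "patch" | "minor" | "major";

// Classify how far a pinned MAJOR.MINOR.PATCH version trails the latest one.
function lag(pinned: string, latest: string): Lag {
  const [pMaj, pMin, pPat] = pinned.split(".").map(Number);
  const [lMaj, lMin, lPat] = latest.split(".").map(Number);
  if (lMaj > pMaj) return "major";
  if (lMaj === pMaj && lMin > pMin) return "minor";
  if (lMaj === pMaj && lMin === pMin && lPat > pPat) return "patch";
  return "current";
}

// The firstParty flag is only displayed; it never changes how a
// dependency is evaluated.
function outdatedReport(deps: DependencyStatus[]): string[] {
  return deps
    .map((d) => ({ d, behind: lag(d.pinned, d.latest) }))
    .filter(({ behind }) => behind !== "current")
    .map(({ d, behind }) =>
      `${d.name} ${d.pinned} -> ${d.latest} (${behind}${d.firstParty ? ", first-party" : ""})`);
}

const deps: DependencyStatus[] = [
  { name: "widgets-core", pinned: "2.3.1", latest: "2.5.0", firstParty: true },
  { name: "ssl-lib", pinned: "1.1.0", latest: "1.1.4", firstParty: false },
  { name: "http-client", pinned: "4.0.0", latest: "4.0.0", firstParty: false },
];

console.log(outdatedReport(deps).join("\n"));
// widgets-core 2.3.1 -> 2.5.0 (minor, first-party)
// ssl-lib 1.1.0 -> 1.1.4 (patch)
```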

I've had people complain that it's much harder to push changes to a library out to all consumers when they're broken up into multiple repositories. I've always wondered about this. I picture a change to some shared library being made, as advocated, in a single commit along with changes to all of its consumers. In order to move any one of those consumers forward, you need to move them all. In fact, whenever I've seen library changes from people working in a mono repo, they tend to be non-breaking changes: only one consumer gets updated, and the rest may follow if anyone ever gets around to it. Modern dependency management systems will let you link a local version of your library into the one consumer that needs the change and do your development entirely locally, without worrying about other consumers. It may take an extra minute to handle deployment, but you don't need to worry about keeping things backwards compatible. It may also lead to the realization that, like most companies' core libraries, yours is probably bloated. If you're building features for single consumers, then perhaps that functionality should live with those services.

I think the most egregious complaint about multiple repositories is that it's hard, or even impossible, to find all the places your service is used. Full-text code search has been around for over a decade at this point, so there's no reason you should be struggling to find places where your code is used. It may be the case that "Find All References" doesn't work across repositories in your editor, but that is not the only way to look for references.
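To show why plain text search is enough, here is a deliberately naive sketch over an in-memory snapshot of several repositories. The repository names, file contents, and the `billing.internal` hostname are made up for illustration, and in practice you'd reach for an existing code-search tool rather than this. Because it doesn't parse anything, it also finds the reference in a YAML config file, which a language-aware "Find All References" would typically miss.

```typescript
// Hypothetical snapshot: repository name -> file path -> file contents.
type RepoSnapshot = Record<string, Record<string, string>>;

interface Hit {
  repo: string;
  path: string;
  line: number; // 1-based line number
  text: string;
}

// Naive full-text search: every line of every file in every repository
// that contains the search term, regardless of language or file type.
function searchAllRepos(repos: RepoSnapshot, term: string): Hit[] {
  const hits: Hit[] = [];
  for (const [repo, files] of Object.entries(repos)) {
    for (const [path, contents] of Object.entries(files)) {
      contents.split("\n").forEach((text, i) => {
        if (text.includes(term)) hits.push({ repo, path, line: i + 1, text: text.trim() });
      });
    }
  }
  return hits;
}

const repos: RepoSnapshot = {
  "orders-service": {
    "src/client.ts": 'const billing = new HttpClient("https://billing.internal");\n',
  },
  "reports-service": {
    "config/services.yaml": "upstreams:\n  billing: https://billing.internal\n",
  },
};

for (const h of searchAllRepos(repos, "billing.internal")) {
  console.log(`${h.repo}/${h.path}:${h.line}: ${h.text}`);
}
// orders-service/src/client.ts:1: const billing = new HttpClient("https://billing.internal");
// reports-service/config/services.yaml:2: billing: https://billing.internal
```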

My Advice

If I were building a microservice-based architecture tomorrow, I'd suggest starting with multiple repositories. I'd probably also suggest starting with a single service. Drawing lines around services can be hard, and it is often much easier to split a service up than to put several back together. Invest now in the tools that help you manage third-party and first-party dependencies, including things like full-text code search. Because you're almost certainly going to have third-party dependencies, these tools pay off even if you never end up with multiple services.