Multi to Mono Repository
How Ravelin moved from a multi-repository to a mono-repository
Ravelin moved from a multi-repository set-up to a mono-repository.
We run a microservices architecture, originally with a separate Git repository for each service. The implicit dependencies between different versions of different services were not expressed anywhere, which led to various problems in building, continuous integration, and, notably, repeatable builds.
As a first step in our journey to improve the stability, predictability and reliability of our build system, we merged all the different services into one repository, called core.
To this end, we created and open-sourced a program that merges multiple separate repositories into one big repository, each into its own subdirectory. It was built around these requirements:
- Preserving history; we often find ourselves using the git blame tool to discover why a certain change was made.
- Preserving commit hashes; we use commit hashes in binary names and our issue tracker; ideally, these references remain intact.
- A transition period rather than a stop-the-world migration; we want to merge in a few repositories per day, with minimal disruption to workflow.
- Seamless transitions; changes made to the old repositories after they were migrated must be imported into the new monorepository.
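The sketch below is not tomono.sh itself. It is a minimal illustration, assuming a throwaway local repository named orange, of one standard way to get the first two properties with plain git: fetch the old repository as a remote, record its tip as a merge parent with the ours strategy, and splice its tree in under a subdirectory with git read-tree --prefix. Because the original commits are reused rather than rewritten, their hashes survive.

```shell
#!/bin/sh
# Minimal sketch (not tomono.sh itself): import the repository "orange"
# into the subdirectory orange/ of a new monorepo, keeping its commits.
set -eu

# Throwaway identity so the commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for one of the original service repositories.
git init -q orange
git -C orange symbolic-ref HEAD refs/heads/master
echo hello > orange/README
git -C orange add README
git -C orange commit -q -m "orange: first commit"
orange_head=$(git -C orange rev-parse HEAD)

# The new monorepo starts with an empty root commit on master.
git init -q mono
cd mono
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Root commit for master"

# 1. Add the old repository as a remote and fetch it.
git remote add orange ../orange
git fetch -q orange

# 2. Record orange/master as a second parent, keeping our tree as-is...
git merge -q --no-commit -s ours --allow-unrelated-histories orange/master
# 3. ...then read its tree into the index and work tree under orange/.
git read-tree --prefix=orange/ -u orange/master
git commit -q -m "Merge orange into master under orange/"

git ls-files    # orange/README
# The original commit, with its original hash, is now in master's history.
git merge-base --is-ancestor "$orange_head" master && echo "history preserved"
```

The empty root commit matters: without it, merging into an unborn master would simply adopt orange's history as-is instead of creating a merge that moves its files under orange/.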
Before we dive into the details of the actual migration, let’s discuss the theory behind it.
Imagine two repositories, orange and green.
Orange has two branches, A and B; green has two branches, B and C.
We want to merge the two repositories into one final repository, "black", with three branches: A, B, and C.
The files from orange will all be moved to a subdirectory, /orange, and the same happens for green. This avoids conflicts between identically named files in the two repositories.
The branches will be merged by name: a branch in the final repository (black) is made up of all the branches by that name in the original repositories. This means that A will contain only files from orange (all in one directory, /orange), and C will contain only files from green (all in /green). B, however, will contain both orange and green: its root will have two directories, /orange and /green.
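To make the merge-by-name rule concrete, here is a self-contained sketch that builds orange (branches A and B) and green (branches B and C) as throwaway local repositories and assembles black from them. The make_repo helper and the file name file.txt are illustrative assumptions. Each branch is imported with an ours merge followed by read-tree --prefix, and a branch that does not yet exist in black is first started from an empty root commit.

```shell
#!/bin/sh
# Sketch of "merge branches by name": orange (A, B) + green (B, C) -> black.
# Repository, branch and file names are illustrative assumptions.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical helper: make_repo NAME BRANCH... creates repository NAME with
# one root commit per branch, each containing a file.txt.
make_repo() {
  name=$1
  shift
  git init -q "$name"
  for branch in "$@"; do
    # Point HEAD at an unborn branch so the next commit starts a new history.
    git -C "$name" symbolic-ref HEAD "refs/heads/$branch"
    echo "$name on $branch" > "$name/file.txt"
    git -C "$name" add file.txt
    git -C "$name" commit -q -m "$name: $branch"
  done
}

make_repo orange A B
make_repo green B C

git init -q black
cd black
for repo in orange green; do
  git remote add "$repo" "../$repo"
  git fetch -q "$repo"
  # Every branch of this remote, e.g. A and B for orange.
  for branch in $(git for-each-ref --format='%(refname:lstrip=3)' "refs/remotes/$repo"); do
    if git rev-parse -q --verify "refs/heads/$branch" >/dev/null; then
      git checkout -q "$branch"   # the branch already exists in black
    else
      # New branch: clear the index and work tree, then an empty root commit.
      git symbolic-ref HEAD "refs/heads/$branch"
      git rm -q -r -f --ignore-unmatch .
      git commit -q --allow-empty -m "Root commit for $branch"
    fi
    # Import this repository's branch under its own subdirectory.
    git merge -q --no-commit -s ours --allow-unrelated-histories "$repo/$branch"
    git read-tree --prefix="$repo/" -u "$repo/$branch"
    git commit -q -m "Merge $repo/$branch into $branch under $repo/"
  done
done

for b in A B C; do echo "$b: $(git ls-tree --name-only "$b" | xargs)"; done
# A: orange
# B: green orange
# C: green
```

Both repositories contain a file.txt, yet branch B merges without conflict because each copy lives under its own directory.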
An example with real repositories so you can follow along at home:
In the branch master, we now have the master branches of the original repositories, each in its own subdirectory.
Running git log on master shows the last repository (µ) being merged in.
Growing The MonoRepo
With the original three repositories under our belt, we feel confident adding a fourth. This can be done after work has already started in the monorepo; there is no need to do the complete migration at once.
Prepare a fourth repository:
Import it into our monorepo:
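A sketch of this incremental approach, assuming hypothetical local repositories alpha, beta, gamma and delta, and a hypothetical import_repo helper that imports a repository's master under a subdirectory of the same name (remote add, fetch, an ours merge, then read-tree --prefix). Three repositories are imported, work continues in the monorepo, and the fourth is added later on top of that history.

```shell
#!/bin/sh
# Sketch: import three repositories, keep working, then add a fourth later.
# The repository names and both helpers are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# new_repo NAME: a one-commit repository on master.
new_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/master
  echo "$1" > "$1/README"
  git -C "$1" add README
  git -C "$1" commit -q -m "$1: initial commit"
}

# import_repo NAME: merge master of ../NAME into the current branch under NAME/.
import_repo() {
  git remote add "$1" "../$1"
  git fetch -q "$1"
  git merge -q --no-commit -s ours --allow-unrelated-histories "$1/master"
  git read-tree --prefix="$1/" -u "$1/master"
  git commit -q -m "Merge $1 into master under $1/"
}

for r in alpha beta gamma delta; do new_repo "$r"; done

git init -q mono
cd mono
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Root commit for master"

# Day one: the first three repositories.
for r in alpha beta gamma; do import_repo "$r"; done

# Meanwhile, work happens directly in the monorepo.
echo "shared settings" > ci.yml
git add ci.yml
git commit -q -m "Add shared CI settings"
before=$(git rev-parse HEAD)

# Later: the fourth repository joins on top of the existing history.
import_repo delta

git ls-tree --name-only HEAD | xargs   # alpha beta ci.yml delta gamma
```

Nothing already in the monorepo is rewritten: the commit recorded in $before remains an ancestor of the new master.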
While the merging is happening, coworkers are happily coding away and adding more commits to the original repositories. We want to merge these in as they arrive, without interrupting anyone's workflow.
The tomono.sh program adds a remote for each repository. To pull in the changes, first fetch the remotes:
Note that at this point the remote has only been fetched, so master has not yet been updated to include the new commit.
Now the changes can be merged in:
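The fetch-and-merge steps can be tried in isolation with a minimal sketch. It assumes a hypothetical local repository green, imports it under green/, then adds a file monde to the old repository (mirroring the example), fetches the remote and merges with git's default strategy and options.

```shell
#!/bin/sh
# Reproduction sketch: a file added to the old repository after the import
# ends up in the monorepo's root when merged with git's defaults.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical original repository "green" with a single file.
git init -q green
git -C green symbolic-ref HEAD refs/heads/master
echo hello > green/hello
git -C green add hello
git -C green commit -q -m "green: add hello"

# Monorepo with green imported under green/.
git init -q mono
cd mono
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Root commit for master"
git remote add green ../green
git fetch -q green
git merge -q --no-commit -s ours --allow-unrelated-histories green/master
git read-tree --prefix=green/ -u green/master
git commit -q -m "Merge green into master under green/"

# A coworker keeps committing to the old repository...
echo monde > ../green/monde
git -C ../green add monde
git -C ../green commit -q -m "green: add monde"

# ...so fetch the remote, then merge with the default strategy and options.
git fetch -q green
git merge -q --no-edit green/master

git ls-files   # green/hello and monde: the new file is in the root
```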
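This seems to have worked, but look carefully at the output of ls: the file monde, which was added to the original repository's root, is now in our monorepository's root rather than in that repository's subdirectory.
This is a consequence of using git's default merge strategy and options: a file added to the original repository is added at the same path in the monorepo, so a new file in the original root lands in the monorepo's root.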
Git has a built-in solution to this problem: subtree merging. This is an option passed to the recursive merge strategy, in the form git merge -s recursive -X subtree=<path>.
An alternative is the revered git merge -s subtree, but we found that it gets confused when updating files in similar subtrees, presumably because it has to guess which subdirectory the incoming tree belongs in. This is especially true for common files such as circle.yml or .gitignore, which are present in every directory.
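A sketch of the monde scenario with the subtree option instead of the defaults. The repository name green is again an illustrative assumption. -X subtree=green tells the strategy to shift the incoming tree, and the merge base, under green/ before merging, so the new file lands inside the subdirectory.

```shell
#!/bin/sh
# Sketch: the same late commit, merged with -X subtree so it lands in green/.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical original repository "green" with a single file.
git init -q green
git -C green symbolic-ref HEAD refs/heads/master
echo hello > green/hello
git -C green add hello
git -C green commit -q -m "green: add hello"

# Monorepo with green imported under green/.
git init -q mono
cd mono
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Root commit for master"
git remote add green ../green
git fetch -q green
git merge -q --no-commit -s ours --allow-unrelated-histories green/master
git read-tree --prefix=green/ -u green/master
git commit -q -m "Merge green into master under green/"

# A coworker adds a file to the old repository's root.
echo monde > ../green/monde
git -C ../green add monde
git -C ../green commit -q -m "green: add monde"

# Fetch, then merge with an explicit subtree prefix instead of the defaults.
git fetch -q green
git merge -q --no-edit -s recursive -X subtree=green green/master

git ls-files   # green/hello and green/monde; nothing new in the root
```

Because the prefix is given explicitly, git does not have to infer it from the shape of the trees, which is what -s subtree does.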
Conclusion and further work
Merging the many separate repositories into one is only the start of the journey. Our continuous integration must be adapted to the new service-per-subdirectory structure, which requires further git scripting and tooling.
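As an illustration of the kind of scripting involved, the sketch below lists the top-level directories (that is, services) touched between two commits, which a CI job could use to decide what to build and test. The changed_services helper, the service names and the VERSION files are hypothetical; it assumes every service lives in its own top-level directory, as the migration produces.

```shell
#!/bin/sh
# Sketch: which services (top-level directories) changed between two commits?
# The helper name and the layout below are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

git init -q core
cd core
git symbolic-ref HEAD refs/heads/master

# One directory per service, as after the migration.
mkdir orange green
echo v1 > orange/VERSION
echo v1 > green/VERSION
git add orange green
git commit -q -m "Initial services"
base=$(git rev-parse HEAD)

# A later commit that only touches the green service.
echo v2 > green/VERSION
git commit -q -a -m "green: bump version"

# changed_services FROM TO: first path component of every changed file,
# ignoring files that sit directly in the repository root.
changed_services() {
  git diff --name-only "$1" "$2" | awk -F/ 'NF > 1 { print $1 }' | sort -u
}

changed_services "$base" HEAD   # green
```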