Edit: it seems like my explanation turned out to be too confusing. In simple terms, my topology would look something like this:
I would have a reverse proxy in front of multiple Git server instances (let’s say 5 for now). When a client performs an action, like pulling from or pushing to a repo, the request would go through the reverse proxy to one of the 5 instances. The changes would then be synced from that instance to the rest, achieving a highly available architecture.
Basically, I want a highly available git server. Is this possible?
I have been reading GitHub’s blog on Spokes, their distributed system for Git. It’s a great idea, except I can’t find anywhere to download it so I can self-host it.
Any ideas on how I can run a distributed cluster of Git servers? I’d like to run it on 3+ VMs plus a VPS in the cloud, so that if something dies I still have a Git server running somewhere to pull from.
Thanks
Well, it’s a tougher question to answer for an active-active config than for a master-slave config, because the former needs the lowest possible replication latency as requests are bounced all over the place. For the latter, I’ll probably set the replicas up to pull every 5 minutes, so they would lag by up to 5 minutes (assuming nobody tries to push right when the master node is going down).
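To make the pull-every-5-minutes idea concrete, here is a rough Python sketch of what each replica could run. It assumes the replica was created once with `git clone --mirror`, so that `origin` points at the master. The URL and path are made-up placeholders, and the helper names are mine, not from any tool.

```python
import subprocess
import time

# Hypothetical values for illustration only.
PRIMARY_URL = "git@primary.example:repo.git"
MIRROR_DIR = "/srv/git/repo.git"
INTERVAL_SECONDS = 5 * 60


def clone_command(url, path):
    """One-time setup: a bare mirror clone that tracks every ref."""
    return ["git", "clone", "--mirror", url, path]


def fetch_command(path):
    """Refresh all refs from the master and drop refs deleted there."""
    return ["git", "-C", path, "fetch", "--prune", "origin"]


def sync_once(path, run=subprocess.run):
    """Run one fetch; return True if git exited successfully."""
    return run(fetch_command(path)).returncode == 0


def sync_loop(path, interval=INTERVAL_SECONDS, rounds=None,
              run=subprocess.run, sleep=time.sleep):
    """Fetch repeatedly; rounds=None means loop until the process is stopped."""
    done = 0
    while rounds is None or done < rounds:
        if not sync_once(path, run=run):
            print(f"warning: fetch into {path} failed, retrying next round")
        sleep(interval)
        done += 1
```

Calling `sync_loop(MIRROR_DIR)` on each replica would keep it within roughly one interval of the master. A cron job or systemd timer running the same fetch command would do the job just as well.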
I don’t think the likes of GitHub work on a master-slave configuration. They’re probably on the active-active side of things for performance. I’m surprised I couldn’t find anything on this from Codeberg, though. You’d think they would have already solved this problem and published something. Maybe I missed it.
I didn’t find anything in the official Git book either. Which one do you recommend?
Are you familiar with git hooks? See
https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks
Scroll to the part about server-side hooks. The idea is to automatically propagate updates when you receive them, so you get Git-level replication instead of rsync.
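A minimal sketch of such a hook in Python, assuming the receiving server has the other nodes configured as remotes. The names `replica1` and `replica2` are placeholders and the helper functions are just for illustration. Git feeds the `post-receive` hook one line per updated ref on stdin, in the form `<old> <new> <refname>`.

```python
import subprocess
import sys

# Hypothetical remote names; each would be added beforehand with
# `git remote add <name> <url>` in the bare repository on this server.
REPLICA_REMOTES = ["replica1", "replica2"]


def parse_updates(lines):
    """Parse post-receive stdin lines of the form '<old> <new> <refname>'."""
    updates = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            updates.append(tuple(parts))
    return updates


def build_push_command(remote):
    """Return the git command that mirrors all refs to one replica."""
    return ["git", "push", "--mirror", remote]


def replicate(remotes, run=subprocess.run):
    """Push to every replica; return the names of the replicas that failed."""
    failed = []
    for remote in remotes:
        if run(build_push_command(remote)).returncode != 0:
            failed.append(remote)
    return failed


def main(stdin=sys.stdin):
    """Entry point: replicate only if the push actually updated some ref."""
    if not parse_updates(stdin):
        return 0
    for remote in replicate(REPLICA_REMOTES):
        print(f"warning: replication to {remote} failed", file=sys.stderr)
    # The exit status of post-receive cannot reject the push, so return 0.
    return 0
```

To use it, you would save this as `hooks/post-receive` in the bare repo with a `#!/usr/bin/env python3` first line, add `sys.exit(main())` at the bottom, and make it executable. Note that `--mirror` force-updates every ref on the target, so this fits a single master pushing to replicas. In an active-active setup, concurrent pushes to different nodes could overwrite each other.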