Can anyone speak to use cases for submodules that aren't better served by your language’s package manager? Multi-language codebases, languages without appropriate package management perhaps?
Submodules are great for projects where your code depends on upstream Git repos that you don't control and don't want to vendor yourself.
I recently did an embedded Linux design that depended on 5 external repositories: one from Yocto, three from OpenEmbedded, and one from a CPU vendor. My own code just sat on top of this set of repos.
Submodules made that design very simple. One repo with all of my code in it, and submodules for all external repos. All dependent repos were pinned because of how submodules work. Pins were easy to update when desired, and never moved on their own.
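For anyone wondering what "pinned" means concretely: the parent repo's tree stores each submodule as a gitlink entry (mode 160000) holding one exact commit hash, so nothing moves until a new hash is committed. Here is a minimal Python sketch that pulls those pins out of `git ls-tree HEAD` output. The sample listing, the paths, and the two submodule hashes are made up for illustration and are not from the project described above.

```python
from typing import Dict

# Hypothetical `git ls-tree HEAD` output for a superproject.
# Format per line: "<mode> <type> <object>\t<path>". Paths and submodule hashes are invented.
SAMPLE_LS_TREE = (
    "100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad\t.gitmodules\n"
    "160000 commit 1f7ec5a4b0c9d8e7f6a5b4c3d2e1f0a9b8c7d6e5\tlayers/upstream-a\n"
    "160000 commit 9a8b7c6d5e4f3a2b1c0d9e8f7a6b5c4d3e2f1a0b\tlayers/upstream-b\n"
    "040000 tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\tsrc\n"
)

GITLINK_MODE = "160000"  # mode Git uses for a submodule (gitlink) entry


def submodule_pins(ls_tree_output: str) -> Dict[str, str]:
    """Return {submodule path: pinned commit hash} from ls-tree text."""
    pins = {}
    for line in ls_tree_output.splitlines():
        meta, _, path = line.partition("\t")
        mode, obj_type, sha = meta.split(" ")
        if mode == GITLINK_MODE and obj_type == "commit":
            pins[path] = sha
    return pins


if __name__ == "__main__":
    for path, sha in submodule_pins(SAMPLE_LS_TREE).items():
        print(path, "is pinned at", sha)
```

Updating a pin amounts to checking out a different commit inside the submodule and committing the changed gitlink in the parent repo.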
Isn't that because you didn't have a good package manager to handle these? If you have a package manager that allows you to add packages from Git repos, why would you use submodules?
We use a submodule in https://github.com/uber/h3-py to wrap the core H3 library, which is written in C. Submodules seemed like a reasonable way to handle the dependency, and, at least for this use case, the approach hasn't given me any problems.
The latter had been an issue for me in the past with some projects that just weren't packaged for, e.g., Python and had to be imported directly. It can also be helpful for non-packaged assets that are held in a separate git repository.
This is where I use them. I have some Rust bindings to C++ code, and that C++ code lives in my repo as a submodule. Everyone seems to hate submodules, I guess because of the surprising behavior described in the post, but for my use case they've been completely fine.
Two of the largest tech companies in the world, Google and Meta, had to roll custom VCS for their day to day engineering operations because git and git submodule were so unsuitable. The default pack file behavior of git is completely unsuitable for a rapidly releasing company with a monorepo. You don’t want or need the entire history — you just want a few recent commits. You do want some visibility into what your coworkers are up to so you can prevent merge conflicts before they happen (centralization is good!). You probably only need part of the tree, not the entire thing.
If you go back and watch Linus’ talk at Google regarding git, he’s basically describing (unknowingly) why Google needs to not use git for its day to day. Even on a smaller scale, Android (AOSP) had to create a meta tool for git called git-repo to handle its source tree. Git submodule failed there.
> Two of the largest tech companies in the world, Google and Meta, had to roll custom VCS for their day to day engineering operations because git and git submodule were so unsuitable.
Where did you get that from? Sources?
Google rolled their own VCS, because Google is older than Git, and they needed something that works. Their custom VCS is a hacked up version of Perforce.
By the time Git came around, Google was already pretty much committed to their in-house custom tool; too many things relied on it.
Git submodules aren't really intended for that use-case in the first place. They're not really intended to model a mono-repo at all, more a relationship between repositories that have their own histories.
The main things that have been developed in git to allow very large repos are shallow clones (truncated history) and partial clones and sparse checkouts (slices of the repo). This model works well enough within git's logic, but it historically wasn't a focus until fairly recently. I don't really know what the state of play is there; I think there's still a limit at a certain scale where simply finding the status of a large checkout becomes a bottleneck, and you start to want a persistent daemon that uses FS notifications to keep track of what's changed instead of stat()ing every file in the tree.
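To make the stat() cost concrete, here is a rough Python sketch of a naive status scan: record size and mtime for every file, then re-walk the whole tree later and compare. This is only an illustration of the idea, not how git actually implements its index; `snapshot`, `changed_paths`, and `demo` are names invented for the example.

```python
import os
import tempfile


def snapshot(root):
    """Record (size, mtime_ns) for every file under root, like a cached index."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)  # one stat() call per file, every single scan
            state[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return state


def changed_paths(old, new):
    """Paths that were added, removed, or whose cached stat data differs."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


def demo():
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "a.txt"), "w") as f:
            f.write("one")
        with open(os.path.join(root, "b.txt"), "w") as f:
            f.write("two")
        before = snapshot(root)
        with open(os.path.join(root, "b.txt"), "w") as f:
            f.write("two, edited")  # size changes, so the scan notices it
        after = snapshot(root)
        return changed_paths(before, after)


if __name__ == "__main__":
    print(demo())
```

Every scan costs one stat() per file no matter how little changed, which is why a daemon fed by filesystem notifications starts to look attractive once the tree gets large enough.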
(I've often pondered if it would be possible to make a DVCS where there's no firm repo boundary at all, i.e. you could construct a checkout from any combination of trees and commits stored in different locations, and have it work seamlessly. There's probably more than a few thorny issues in there, but it would be an interesting concept)
Every "language package manager" with a lock file format and requirements file is an inferior, ad hoc, informally-specified, error-prone, incompatible reimplementation of half of Git.
Almost every use case for a package manager is better served by Git, whether you choose to use submodules or not. If you want to do version control, use the version control system, and stop trying to do an end-run around the way it works.
Previously:
> I'm happy to criticize NPM the tool. The whole thing is designed as a second, crummier version control system that lives in disharmony with and on top of your base-level version control system (so it can subvert it). It's a terrible design.
Git only replaces the lock file aspect of package managers, not the version requirement resolution part. (Or the part that deals with, e.g., Rust's feature selection, or different build instructions for different operating systems or versions of the language, etc.)
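A small Python sketch of the difference, under simplifying assumptions: versions are plain `major.minor.patch`, and each library states a minimum version with a simplified caret-style rule (same major, not older than the minimum). Real resolvers such as Cargo's or npm's handle far more than this, including special rules for 0.x versions; the function names here are invented for the example.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]


def parse(text: str) -> Version:
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def satisfies(version: Version, minimum: Version) -> bool:
    """Simplified caret-style rule: same major, and not older than minimum."""
    return version[0] == minimum[0] and version >= minimum


def resolve(available: List[str], minimums: List[str]) -> Optional[str]:
    """Pick the newest available version that satisfies every requirement."""
    wanted = [parse(m) for m in minimums]
    candidates = [
        v for v in available if all(satisfies(parse(v), w) for w in wanted)
    ]
    return max(candidates, key=parse) if candidates else None


if __name__ == "__main__":
    # Library A needs at least 1.2.0 and library B needs at least 1.4.0.
    print(resolve(["1.2.0", "1.4.1", "1.5.0", "2.0.0"], ["1.2.0", "1.4.0"]))
```

The resolver's answer is what ends up written to the lock file. A pinned commit can record that answer, but it can't compute it from the two libraries' requirements.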
> Git only replaces the lock file aspect of package managers
Nope, Git is pretty good about downloading stuff over the network, too. In fact, it's so good at it that many people using a language package manager insist you use Git at some point even when (before) using the package managers. Indeed, there's been a lot of trepidation and gnashing of teeth about whether the places where language package managers download packages from are as reliable/trustworthy as the server where the Git repo for the software project is hosted.
> nothing about eg semantic versioning, or how to resolve different requirements from different libraries
"[…] incompatible reimplementation of _half_ of Git."
You can use git subtree to convert between mono-repo and separate repos, without losing your history. You can even keep both styles up to date concurrently.