A 6GB download is wild. Is that re-downloading the entire package for each one that needs an update? Shouldn’t it be more efficient to download only the changes and patch the existing files?
At this point it seems like my desktop Linux install needs as much space and bandwidth as Windows does.
As people mentioned, that becomes problematic with a distro like Arch. With busier packages, and if you update less frequently, you could easily be jumping 5-6 versions in a single update. That means you need to go through the diffs in order, and all of those diffs need to actually be kept available.
This actually poses two issues. The first is that software usually isn’t built for this kind of binary stability - anything compiled or autogenerated might change a lot from a small source change, and even just compressing data files will scramble the result. Because of that, a diff/delta might not end up saving much space, and chaining several of them could end up bigger than just downloading the files directly (the little experiment below tries to make this concrete).
And the second issue is mirrors - mirrors need to store and serve a lot of data, and they’re not controlled by the distribution. Presumably to save space, they remove older package versions quickly - and by older, I mean potentially less than a week old. For diffs/deltas to work, the mirrors would not only have to keep the full package files they already do (for new installs), but also deltas going N days back - and those deltas would only help people who update more often than every N days.
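To come back to that first issue for a second: here’s a tiny experiment with plain zlib in Python (nothing package-manager specific, just made-up text standing in for a source tree), which compresses two nearly identical versions separately and checks how much of the output stays put. The exact numbers depend on the data, but separately compressed archives tend to share surprisingly few bytes even when their contents mostly match:

```python
import zlib

# Fake "source tree": lots of ordinary text lines.
old_src = "\n".join(
    f"line {i}: nothing of interest happens on this line" for i in range(20000)
).encode()

# New release: one identifier renamed throughout - a small source change
# that touches a few bytes in a lot of places.
new_src = old_src.replace(b"interest", b"INTEREST")

old_gz = zlib.compress(old_src, 9)
new_gz = zlib.compress(new_src, 9)

def reusable_in_place(a: bytes, b: bytes) -> float:
    """Fraction of bytes identical at the same offset - a crude stand-in for
    how much a naive 'only download the changed bytes' scheme could skip."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

print(f"uncompressed: {reusable_in_place(old_src, new_src):.1%} of bytes unchanged")
print(f"compressed:   {reusable_in_place(old_gz, new_gz):.1%} of bytes unchanged")
```

Compressors can be told to reset themselves periodically to stay delta-friendly (pigz’s --rsyncable flag, for instance), but distro packages generally aren’t compressed with that in mind.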
This doesn’t work too well for rolling releases, because users will quickly get several version jumps behind.
For example, let’s say libbanana is currently at version 1.2.1 and then releases 1.2.2, which you as a distro ship right away. A few days later, they’ve already released 1.2.3, which you ship, too.
Now Agnes comes home at the weekend and runs package updates on her system, which is still on libbanana v1.2.1. At that point, she would need the diffs 1.2.1→1.2.2 and then 1.2.2→1.2.3 separately, even though they may overlap in which files they change.
In principle, you could additionally provide the diff 1.2.1→1.2.3, but if Greg updates only every other weekend, and libbanana celebrates its 1.3.0 release by then, then you will also need the diffs 1.2.1→1.3.0, 1.2.2→1.3.0 and 1.2.3→1.3.0. So the number of different diffs you might need quickly explodes with this strategy.
At that point, just not bothering with diffs and making users always download the new package version in full is generally preferred.
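Just to put numbers on how quickly that explodes: with every new release, the set of direct jumps a mirror would have to cover grows roughly quadratically, while the full package stays one file. A quick sketch, reusing the made-up libbanana history from above:

```python
from itertools import combinations

# Hypothetical libbanana release history on a rolling distro.
versions = ["1.2.1", "1.2.2", "1.2.3", "1.3.0"]

# Deltas needed so that any older version can jump straight to the newest one:
to_latest = [(old, versions[-1]) for old in versions[:-1]]
print(len(to_latest), "deltas just to reach", versions[-1], "->", to_latest)

# Deltas needed to offer a direct jump between *any* two shipped versions:
all_jumps = list(combinations(versions, 2))
print(len(all_jumps), "deltas to cover every possible jump")

# With n shipped versions that's n*(n-1)/2 deltas - quadratic growth the mirrors
# would have to store, while the full package is always exactly one file.
```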
Interesting, so it wouldn’t work like rsync, where it compares the new files to the old ones and only transfers the parts that have changed?
Hmm, good question. The one such implementation I know of is Delta RPM, and it works the way I described.
But I’m not sure whether they just designed it that way to fit into the existing architecture, where all the mirrors and such were already set up to deal with whole package files.
I could imagine that doing it rsync-style would be really terrible for server load, since you can’t really cache things at that point…
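For what it’s worth, here’s a very rough sketch of that idea (heavily simplified - real rsync uses a cheap rolling checksum so it can match blocks at arbitrary offsets, while this toy version only matches whole aligned blocks). The part to notice is that make_delta has to run on the server for every single client, against whatever old version that particular client happens to have, which is exactly what makes it impossible to cache the way a static package file can be:

```python
import hashlib

BLOCK = 4096  # fixed block size; real rsync adds a cheap rolling checksum on top

def block_hashes(old_data: bytes) -> dict:
    """Client side: hash each block of the package version it already has."""
    return {
        hashlib.sha1(old_data[i:i + BLOCK]).digest(): i
        for i in range(0, len(old_data), BLOCK)
    }

def make_delta(new_data: bytes, old_hashes: dict) -> list:
    """Server side: has to run once per client request, because every client
    may hold a different old version - this is the uncacheable part."""
    delta = []
    for i in range(0, len(new_data), BLOCK):
        chunk = new_data[i:i + BLOCK]
        offset = old_hashes.get(hashlib.sha1(chunk).digest())
        if offset is not None:
            delta.append(("copy", offset, len(chunk)))  # client already has this block
        else:
            delta.append(("data", chunk))               # send it literally
    return delta

def apply_delta(old_data: bytes, delta: list) -> bytes:
    """Client side: rebuild the new file from the old copy plus the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += old_data[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)

# Toy round trip: an old "package" of 1 MiB with 8 bytes changed in the new one.
old = bytes(1024 * 1024)
new = old[:8192] + b"patched!" + old[8200:]
delta = make_delta(new, block_hashes(old))
assert apply_delta(old, delta) == new
sent = sum(len(op[1]) for op in delta if op[0] == "data")
print(f"literal bytes sent: {sent} of {len(new)}")
```

Delta RPM dodges that per-client cost by precomputing deltas between specific published versions instead, which is presumably also why the mirrors then have to carry all those extra delta files.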
Yeah, I guess these days the majority of users have fast enough connections that it’s not worth it. It sucks if you have crappy internet though, hah.
No, that’s not how compiling works. And yes, 6GB is wild. If I don’t patch for a month, the download might be 2GB and the net will still be smaller.
I don’t think I could get close to my Windows installation even if I installed literally every single package…
Patching means rebuilding, and packagers don’t really publish diffs. So it’s “use all your bandwidth” instead!
With stuff like rsync, diffs can be calculated on the fly. But it requires way more server CPU than just chucking files onto the network.
Which is WAY more economical.
Rebuilding packages takes a lot of compute. Downloading mostly requires just flashing some very small lights very quickly.
If you have multiple computers, you can always set up a caching proxy so you only have to download the packages once.
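If anyone wants the flavour of that, here’s a bare-bones sketch in Python (the upstream mirror URL, port and cache path are just placeholders, and a real setup would rather use an existing caching proxy or share the package cache directly): every machine points its mirror URL at this box, and each package file then only crosses the internet link once.

```python
import os
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

UPSTREAM = "https://mirror.example.org"   # placeholder: whatever mirror you normally use
CACHE_DIR = "/var/cache/pkg-proxy"        # placeholder: local cache directory

class CachingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Flatten the request path into a cache file name.
        local = os.path.join(CACHE_DIR, self.path.strip("/").replace("/", "_"))
        if not os.path.exists(local):
            # First machine to ask for this file: fetch it from the real mirror.
            os.makedirs(CACHE_DIR, exist_ok=True)
            urllib.request.urlretrieve(UPSTREAM + self.path, local)
        # Everyone after that gets the locally cached copy.
        with open(local, "rb") as f:
            body = f.read()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# Note: a real proxy would also have to avoid caching the repo database
# files forever, or clients would never see new package versions.
if __name__ == "__main__":
    ThreadingHTTPServer(("", 8080), CachingHandler).serve_forever()
```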
That reminds me of Chaotic AUR, though it’s an online public repo. It automatically builds popular AUR packages and lets you download the binaries.
It sometimes builds against outdated libraries/dependencies though, so for pre-release software I’ve sometimes still had to download and compile it locally. Also, you can’t apply any patches or move to an old commit like you can with normal AUR packages.
I’ve found it’s better to use Arch Linux’s official packages when I can, though, since they always publish binaries built with the same latest-release dependencies. I haven’t had dependency version issues with that, as long as I’ve avoided partial upgrades.
Yeah, totally is. There’s a reason nobody publishes diffs
openSUSE Leap does have differential package updates. Pretty sure I once saw it on one of the Red-Hat-likes, too.
But yeah, it makes most sense on slow-moving, versioned releases with corporate backing.
Ooh, got any links on how Leap does this? My searching isn’t yielding much
Had to search for a bit, too, but finally found the relevant keyword: Delta RPMs
(Which also explains why it’s a Red Hat / SUSE thing. 😅)
Here’s a decent article, which links to some more in-depth explanations: https://www.certdepot.net/rhel7-get-started-delta-rpms/
Thanks, that’s a really well hidden feature for someone, I’m sure.
I don’t think anyone actually uses these