Re: Who else is using SVN for large-binary-asset storage?

From: Mark Phippard <markphip_at_gmail.com>
Date: Fri, 24 Apr 2020 13:33:12 -0400

On Fri, Apr 24, 2020 at 12:42 PM Karl Fogel <kfogel_at_opentechstrategies.com>
wrote:

> Are there other companies out there using SVN for large-binary-blob
> storage?
>
> I'm wondering if it might be possible to put together a mini-consortium of
> companies to fund the completion of Issue #525:
>
> https://issues.apache.org/jira/browse/SVN-525
> "allow working copies without text-base/"
>
> Our company keeps medium-large (say, ~5GB) binary blobs under version
> control in a dedicated Subversion repository, and it works quite well.
> Subversion can handle files of that size just fine, and it enables us to do
> path-based authorization (yay) and partial-tree checkouts. [1]
>
> But the presence of text-base files in .svn/pristine/ is a real downer
> :-). The text-base files are mostly pointless in this case, and they
> double the client-side disk space usage.
>
> There is no useful diff between two revisions of these binary blobs:
> there's no human-readable diff *and* there's rarely any machine-useable
> diff either (e.g., for reducing network time when receiving an update or
> committing a new revision). So the only benefit from the text-base files
> is to make 'svn revert' faster. We'd be happy to have 'svn revert'
> re-fetch the file from the repository if it meant we could reduce our
> storage cost on the client side by half.
>
> (Plus you'd only lose local-revert capability on files where you know you
> don't need it, since presumably the no-text-base behavior would be optional
> per file and controlled via an 'svn:no-pristine' property or something like
> that.)
>
> Is anyone else in a similar situation? If we join forces, we could
> probably fund one or more Subversion developers to finally get Issue #525
> done. I'd be happy to do the organizing (I'm still reasonably familiar
> with Subversion development and who does what, though I haven't been an
> active developer in the project in many years).
>
> Please CC me on any replies, as I'm not subscribed to users@.
>
> Best regards,
> -Karl
>
> [1] We investigated using Git too, but, though Git is good for many things,
> it is not well-suited for this particular job. The Git Large-File Storage
> extension (https://git-lfs.github.com/) doesn't address most of our needs
> either; it's solving a different problem, I guess.
>
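
For anyone who wants to put a number on the doubling Karl describes, here is
a rough sketch that totals the bytes in a working copy's files and in its
.svn/pristine/ store. It assumes the single top-level .svn/ layout that the
quoted mail refers to; the function name working_copy_usage is made up for
illustration, and other admin data (wc.db, tmp/) is deliberately ignored.

```python
import os


def working_copy_usage(root):
    """Return (working_bytes, pristine_bytes) for the working copy at root.

    Assumes one top-level .svn/ directory with a pristine/ subdirectory.
    """
    working = 0
    pristine = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        # lstat so that a dangling symlink does not raise an error
        total = sum(
            os.lstat(os.path.join(dirpath, name)).st_size for name in filenames
        )
        parts = os.path.relpath(dirpath, root).split(os.sep)
        if parts[0] != ".svn":
            working += total          # ordinary versioned/unversioned files
        elif parts[1:2] == ["pristine"]:
            pristine += total         # text-base copies
        # anything else under .svn/ (wc.db, tmp/) is not counted
    return working, pristine
```

With mostly large binaries in the tree, the two numbers should come out
close to each other, which is the overhead Issue #525 would remove.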

I think this would be a good idea in that it might be one of the last
remaining niches where SVN is a better tool for the job than a DVCS. I do
not think I could contribute though.

I just wanted to throw another item on the pile. I recall an old thread
(have not been able to find it) where it was shown that we could get a
massive performance win on large binary blobs if we could skip all of the
xdelta stuff and just stream the binary. If I recall correctly, you can
even see and demo this today using WebDAV by just doing a PUT (or whatever
the right request is) with the entire file. The server already knows how to
handle it and store the file the same as it would if it had come via an SVN
client. I think there were some complications with how svndiff0/svndiff1
etc are expected by a client, but if there were some way to have a property
on a file that caused us to skip all of this, including storing the extra
pristine copy, it could be a big win for managing large binaries with SVN.
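
A minimal sketch of what such a whole-file PUT could look like using only
Python's standard library. It assumes the server accepts plain WebDAV PUTs
(for mod_dav_svn I believe that means SVNAutoversioning is turned on); the
URL and the helper name build_put_request are made up for illustration. The
code only builds the request and does not send anything.

```python
import os
import urllib.request


def build_put_request(url, path):
    """Build (but do not send) a PUT request carrying the whole file at path."""
    size = os.path.getsize(path)
    # Passing an open file lets the HTTP layer read it in blocks when the
    # request is sent, instead of loading the whole blob into memory.
    body = open(path, "rb")
    req = urllib.request.Request(url, data=body, method="PUT")
    req.add_header("Content-Length", str(size))
    req.add_header("Content-Type", "application/octet-stream")
    return req
```

Passing the result to urllib.request.urlopen() would send it; a real server
would also need authentication, which is left out here.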

It seems like we could make revert fetch the file from the server again to
restore a binary.

If I can find any of those old threads I will share them. So far the only
one I found was about how using a larger xdelta window size could give
better compression, but the thread I recall was about not doing it at all.
It also assumed that the xdelta is of no real value because it does not
shrink the number of bytes that have to be transferred.

-- 
Thanks
Mark Phippard
Received on 2020-04-24 19:33:30 CEST

This is an archived mail posted to the Subversion Users mailing list.