
Re: Who else is using SVN for large-binary-asset storage?

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Sat, 25 Apr 2020 09:18:36 +0000

Karl Fogel wrote on Fri, 24 Apr 2020 13:43 -0500:
> On 24 Apr 2020, Mark Phippard wrote:
> >I think this would be a good idea in that it might be one of the last
> >remaining niches where SVN is a better tool for the job than a DVCS.
> >I do not think I could contribute though.
> >
> >I just wanted to throw another item on the pile. I recall an old
> >thread (have not been able to find it) where it was shown that a
> >massive performance win on large binary blobs would be if we could
> >skip all of the xdelta stuff and just stream the binary.  If I recall
> >correctly, you can even see and demo this today using WebDAV and just
> >doing a PUT (or whatever the right request is) with the entire file.  The
> >server already knows how to handle it and store the file the same as
> >it would if it had come via a SVN client.  I think there were some
> >complications with how svndiff0/svndiff1 etc are expected by a
> >client, but if there were some way to have a property on a file that
> >caused us to skip all of this, including storing the extra pristine
> >copy, it could be a big win for managing large binaries with SVN.
> >
> >It seems like we could make revert fetch the file from the server
> >again to restore a binary.
> >
> >If I can find any of those old threads I will share them.  So far the
> >only one I found was about how using a larger xdelta window size
> >could give better compression, but the thread I recall was about not
> >doing it at all.  It also assumed that the xdelta is of no real value
> >because it does not shrink the amount of bytes that have to be
> >transferred.
>
> Ah, thanks for this reminder! I also recall those results (and I guess they're not surprising). I'll make sure we keep it in mind if this project happens. If you happen to dig up any of the old threads, that'd be great, but even if you don't, the above information is enough for a developer to know the possibility exists.

That one doesn't seem like it'd be terribly hard to implement. The
svndiff0 data format can represent "Produce the following bytes
verbatim". Nothing stops whoever generates an svndiff stream from
using that feature to produce a degenerate self-delta (that is, a
self-delta that doesn't attempt to compress) where it would currently
produce a regular self-delta or a delta against the BASE revision.
Likewise, nothing stops it from producing an svndiff0 stream even
when the other side accepts svndiff1 and/or svndiff2. We don't even
need a new wire capability for this.
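
As a rough illustration of what such a degenerate stream could look
like, here is a minimal sketch in Python (not Subversion's C code). It
assumes the svndiff0 layout is: a 4-byte "SVN\0" header, then for each
window five variable-length integers (source view offset, source view
length, target view length, instructions length, new-data length), the
instruction bytes, and the new data. The function names are
hypothetical, and 102400 is an assumed value for SVN_DELTA_WINDOW_SIZE.

```python
SVNDIFF0_HEADER = b"SVN\x00"   # "SVN" followed by format version 0
WINDOW_SIZE = 102400           # assumed value of SVN_DELTA_WINDOW_SIZE


def encode_varint(n: int) -> bytes:
    """Encode n as 7 bits per byte, most significant group first.

    Every byte except the last has its high bit set.
    """
    if n < 0:
        raise ValueError("length must be non-negative")
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(groups))


def verbatim_window(chunk: bytes) -> bytes:
    """Build one window whose only instruction copies `chunk` from new data."""
    n = len(chunk)
    if 0 < n < 64:
        # Selector 10 (new data) in the top two bits, length in the low six.
        instructions = bytes([0x80 | n])
    else:
        # Low six bits zero: the length follows as a variable-length integer.
        instructions = b"\x80" + encode_varint(n)
    return b"".join([
        encode_varint(0),                  # source view offset (no source)
        encode_varint(0),                  # source view length
        encode_varint(n),                  # target view length
        encode_varint(len(instructions)),  # instructions section length
        encode_varint(n),                  # new-data section length
        instructions,
        chunk,
    ])


def degenerate_svndiff0(data: bytes, window_size: int = WINDOW_SIZE) -> bytes:
    """Wrap `data` in svndiff0 framing without any deltification."""
    parts = [SVNDIFF0_HEADER]
    for start in range(0, len(data), window_size):
        parts.append(verbatim_window(data[start:start + window_size]))
    return b"".join(parts)
```

For example, degenerate_svndiff0(b"abc") gives the header followed by
the bytes 00 00 03 01 03, the instruction byte 83, and then "abc"
itself: six bytes of framing for the window, with no delta computation
or compression at all.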

With this approach files would still be split into
SVN_DELTA_WINDOW_SIZE-byte windows, so we won't reach the performance
of sendfile(2); however, I suspect the lion's share of the slowdown is
due to the deltification and compression steps.

Cheers,

Daniel

P.S. This being users@, clarification: "svndiff0" and "svndiff1" are
internal binary delta formats that have nothing whatsoever to do with
the «svn diff» command.
Received on 2020-04-25 11:18:55 CEST

This is an archived mail posted to the Subversion Users mailing list.