On Sat, Apr 25, 2020 at 11:18 AM Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
> Karl Fogel wrote on Fri, 24 Apr 2020 13:43 -0500:
> > On 24 Apr 2020, Mark Phippard wrote:
> > >I think this would be a good idea in that it might be one of the last
> > >remaining niches where SVN is a better tool for the job than a DVCS.
> > >I do not think I could contribute though.
> > >
> > >I just wanted to throw another item on the pile. I recall an old
> > >thread (have not been able to find it) where it was shown that a
> > >massive performance win on large binary blobs would be if we could
> > >skip all of the xdelta stuff and just stream the binary. If I recall
> > >correctly, you can even see and demo this today using WebDAV and just
> > >doing a PUT or whatever the right request is with the entire file. The
> > >server already knows how to handle it and store the file the same as
> > >it would if it had come via a SVN client. I think there were some
> > >complications with how svndiff0/svndiff1 etc are expected by a
> > >client, but if there were some way to have a property on a file that
> > >caused us to skip all of this, including storing the extra pristine
> > >copy, it could be a big win for managing large binaries with SVN.
> > >
> > >It seems like we could make revert fetch the file from the server
> > >again to restore a binary.
> > >
> > >If I can find any of those old threads I will share them. So far the
> > >only one I found was about how using a larger xdelta window size
> > >could give better compression, but the thread I recall was about not
> > >doing it at all. It also assumed that the xdelta is of no real value
> > >because it does not shrink the number of bytes that have to be
> > >transferred.
> >
> > Ah, thanks for this reminder! I also recall those results (and I
> > guess they're not surprising). I'll make sure we keep it in mind if
> > this project happens. If you happen to dig up any of the old
> > threads, that'd be great, but even if you don't, the above
> > information is enough for a developer to know the possibility exists.
>
> That one doesn't seem like it'd be terribly hard to implement. The
> data format of svndiff0 enables "Produce the following bytes verbatim"
> to be represented. There's nothing stopping whoever generates an
> svndiff stream from using that feature of the data format to produce a
> degenerate self-delta (that is, a self-delta that doesn't attempt to
> compress) where currently it would produce a self-delta or a delta
> against the BASE revision, as well as to produce an svndiff0 stream
> even when the other side accepts svndiff1 and/or svndiff2. We don't
> even need a new wire capability for this.
>
> With this approach, files would still be split into windows of
> SVN_DELTA_WINDOW_SIZE bytes each, so we won't reach the performance of
> sendfile(2); however, I suspect the lion's share of the slowdown is due
> to the deltification and compression steps.
>
> P.S. This being users@, clarification: "svndiff0" and "svndiff1" are
> internal binary delta formats that have nothing whatsoever to do with
> the «svn diff» command.
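
To make the "degenerate self-delta" idea a bit more concrete, here is a
rough Python sketch of what such an svndiff0 stream could look like:
every window carries a single "new data" instruction and the bytes
verbatim, with no source view and no compression. This is only an
illustration based on my reading of the svndiff0 layout, not code from
the Subversion tree; the function names are made up, and the window size
of 102400 is my assumption for the value of SVN_DELTA_WINDOW_SIZE. The
decoder only understands streams produced by this encoder.

```python
SVNDIFF0_HEADER = b"SVN\x00"   # magic "SVN" followed by format version 0
WINDOW_SIZE = 102400           # assumed value of SVN_DELTA_WINDOW_SIZE


def encode_varint(n):
    """Base-128 integer, most significant group first; the high bit is
    set on every byte except the last."""
    if n < 0:
        raise ValueError("svndiff integers are unsigned")
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(groups))


def encode_verbatim_window(chunk):
    """One window holding a single 'new data' instruction for `chunk`."""
    n = len(chunk)
    # Opcode 0b10 (new data) sits in the top two bits. Lengths 1..63 fit
    # in the low six bits; otherwise those bits are 0 and a varint follows.
    if 0 < n < 64:
        instructions = bytes([0x80 | n])
    else:
        instructions = bytes([0x80]) + encode_varint(n)
    # Window header: source offset, source length, target length,
    # instructions length, new-data length.
    header = b"".join(
        encode_varint(v) for v in (0, 0, n, len(instructions), n)
    )
    return header + instructions + chunk


def encode_verbatim_svndiff0(data, window_size=WINDOW_SIZE):
    """Wrap `data` in svndiff0 windows without any deltification."""
    out = bytearray(SVNDIFF0_HEADER)
    for start in range(0, len(data), window_size):
        out += encode_verbatim_window(data[start:start + window_size])
    return bytes(out)


def decode_varint(buf, pos):
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def decode_verbatim_svndiff0(stream):
    """Inverse of encode_verbatim_svndiff0 (new-data-only windows)."""
    if stream[:4] != SVNDIFF0_HEADER:
        raise ValueError("not an svndiff0 stream")
    pos, out = 4, bytearray()
    while pos < len(stream):
        fields = []
        for _ in range(5):
            value, pos = decode_varint(stream, pos)
            fields.append(value)
        _src_off, _src_len, _tgt_len, ins_len, new_len = fields
        pos += ins_len                      # skip the single instruction
        out += stream[pos:pos + new_len]    # bytes were stored verbatim
        pos += new_len
    return bytes(out)
```

In this sketch the framing cost is a handful of bytes per window, so the
stream is essentially the file itself plus a small constant overhead per
SVN_DELTA_WINDOW_SIZE bytes.
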
I think Mark was referring to this thread on dev@ from 2017, which was
started by Paul Hammant (I think he was working on a tool for
versioning big directory trees easily, with Merkle trees, etc.; it
might be interesting to get in touch with him):
Philip Martin made some interesting suggestions and provided some
numbers, first focusing on the deltification overhead (which he could
eliminate, on the client-side, by enabling SVNAutoversioning and
performing a PUT with curl -- IIUC it's not possible right now to
eliminate deltification on the server-side):
and later he also eliminated compression on the server-side, which
yielded another factor 3 speed boost:
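
For anyone who wants to try the client-side part themselves, a plain PUT
of the whole file against a server with SVNAutoversioning enabled could
be sketched roughly like this in Python. This is not what Philip ran (he
used curl); the URL is a made-up placeholder, the helper name is
hypothetical, and authentication is left out entirely.

```python
import io
import urllib.request


def build_put_request(url, body, size):
    """Build (but do not send) a PUT request that uploads `body` as-is.

    `body` is a binary file-like object; `size` is its length in bytes.
    Setting Content-Length explicitly lets the body be streamed from the
    file object instead of being read into memory first.
    """
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Length": str(size)},
    )


# Hypothetical repository URL, for illustration only.
payload = io.BytesIO(b"\x00" * 1024)
request = build_put_request(
    "http://svn.example.invalid/repo/trunk/blob.bin", payload, 1024
)
# Passing `request` to urllib.request.urlopen() would actually send it;
# with SVNAutoversioning on, the server turns the PUT into a commit.
```

No svndiff is generated on the client in this path, which is the
deltification saving described above.
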
Received on 2020-04-25 19:54:48 CEST