On Sat, Apr 25, 2020 at 1:54 PM Johan Corveleyn <jcorvel_at_gmail.com> wrote:
> On Sat, Apr 25, 2020 at 11:18 AM Daniel Shahaf <d.s_at_daniel.shahaf.name>
> wrote:
> > Karl Fogel wrote on Fri, 24 Apr 2020 13:43 -0500:
> > > On 24 Apr 2020, Mark Phippard wrote:
> > > >I think this would be a good idea in that it might be one of the last
> > > >remaining niches where SVN is a better tool for the job than a DVCS.
> > > >I do not think I could contribute though.
> > > >
> > > >I just wanted to throw another item on the pile. I recall an old
> > > >thread (have not been able to find it) where it was shown that a
> > > >massive performance win on large binary blobs would be if we could
> > > >skip all of the xdelta stuff and just stream the binary. If I recall
> > > >correctly, you can even see and demo this today using WebDAV and just
> > > > >doing a PUT or whatever the right request is with the entire file. The
> > > >server already knows how to handle it and store the file the same as
> > > >it would if it had come via a SVN client. I think there were some
> > > >complications with how svndiff0/svndiff1 etc are expected by a
> > > >client, but if there were some way to have a property on a file that
> > > >caused us to skip all of this, including storing the extra pristine
> > > >copy, it could be a big win for managing large binaries with SVN.
> > > >
> > > >It seems like we could make revert fetch the file from the server
> > > >again to restore a binary.
> > > >
> > > >If I can find any of those old threads I will share them. So far the
> > > >only one I found was about how using a larger xdelta window size
> > > >could give better compression, but the thread I recall was about not
> > > > >doing it at all. It also assumed that the xdelta is of no real value
> > > >because it does not shrink the amount of bytes that have to be
> > > >transferred.
> > >
> > > > Ah, thanks for this reminder! I also recall those results (and I
> > > > guess they're not surprising). I'll make sure we keep it in mind if
> > > > this project happens. If you happen to dig up any of the old
> > > > threads, that'd be great, but even if you don't, the above
> > > > information is enough for a developer to know the possibility exists.
> >
> > That one doesn't seem like it'd be terribly hard to implement. The
> > data format of svndiff0 enables "Produce the following bytes verbatim"
> > to be represented. There's nothing stopping whoever generates an
> > svndiff stream from using that feature of the data format to produce a
> > degenerate self-delta (that is, a self-delta that doesn't attempt to
> > compress) where currently it would produce a self-delta or a delta
> > against the BASE revision, as well as to produce an svndiff0 stream
> > even when the other side accepts svndiff1 and/or svndiff2. We don't
> > even need a new wire capability for this.
> > With this approach, files would still be split into windows of
> > SVN_DELTA_WINDOW_SIZE bytes, so we won't reach the performance of sendfile(2);
> > however, I suspect the lion's share of the slowdown is due to the
> > deltification and compression steps.
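
To make the "degenerate self-delta" idea concrete, here is a minimal sketch in
Python of what such an svndiff0 stream could look like. It follows one reading
of the svndiff format notes in the Subversion source tree, so treat the byte
layout (the "SVN\0" header, the five window-header integers, the new-data
opcode) and the 102400-byte value used for SVN_DELTA_WINDOW_SIZE as
assumptions to verify. The function names are hypothetical and are not
Subversion APIs.

```python
SVNDIFF0_HEADER = b"SVN\x00"  # assumed: magic "SVN" followed by version byte 0
WINDOW_SIZE = 102400          # assumed value of SVN_DELTA_WINDOW_SIZE


def encode_int(n: int) -> bytes:
    """Variable-length integer: 7 bits per byte, most significant group
    first, high bit set on every byte except the last."""
    if n < 0:
        raise ValueError("negative integers cannot be encoded")
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(groups))


def new_data_instruction(length: int) -> bytes:
    """A single 'copy from new data' instruction (opcode bits 10).
    Lengths 1..63 fit in the low six bits; otherwise those bits are zero
    and the length follows as a variable-length integer."""
    if 0 < length < 64:
        return bytes([0x80 | length])
    return bytes([0x80]) + encode_int(length)


def verbatim_window(chunk: bytes) -> bytes:
    """One window that reproduces chunk verbatim, with no source view."""
    instructions = new_data_instruction(len(chunk))
    return (
        encode_int(0)                    # source view offset
        + encode_int(0)                  # source view length
        + encode_int(len(chunk))         # target view length
        + encode_int(len(instructions))  # instructions section length
        + encode_int(len(chunk))         # new data section length
        + instructions
        + chunk
    )


def degenerate_svndiff0(data: bytes, window_size: int = WINDOW_SIZE) -> bytes:
    """Wrap data in svndiff0 framing without any deltification."""
    out = bytearray(SVNDIFF0_HEADER)
    for start in range(0, len(data), window_size):
        out += verbatim_window(data[start:start + window_size])
    return bytes(out)
```

Under these assumptions a 250000-byte input becomes three windows with 43
bytes of framing in total, so the format itself costs almost nothing when no
delta is attempted.
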
> > Cheers,
> > Daniel
> > P.S. This being users@, clarification: "svndiff0" and "svndiff1" are
> > internal binary delta formats that have nothing whatsoever to do with
> > the «svn diff» command.
> I think Mark was referring to this thread on dev@ from 2017, which was
> started by Paul Hammant (I think he was working on a tool for
> versioning big directory trees easily, with merkle trees etc ... might
> be interesting to get in touch with him):
> Philip Martin made some interesting suggestions and provided some
> numbers, first focusing on the deltification overhead (which he could
> eliminate, on the client-side, by enabling SVNAutoversioning and
> performing a PUT with curl -- IIUC it's not possible right now to
> eliminate deltification on the server-side):
> and later he also eliminated compression on the server-side, which
> yielded another factor 3 speed boost:
The threads I was trying to find were much older, but these are just as
good (and yes the same topic) so thanks for sharing them.
While hand-waving on all of the details, it seems like if we had some
property one could put on a file that combined all of these concepts we
could have a really compelling solution for handling large binaries. I am
not sure what Karl's use case is, but I know video game companies have
raised the issue in the past, and I know some of our customers deal with
things like large binary files for embedded chip designs etc.
But the final goal should be something like this (in order of importance):
1. Do not store a pristine in working copy for the file
2. Do not do deltification on the client when committing
3. Do not do compression on server when storing
4. Do not do deltification on server
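
For anyone who wants to try the whole-file PUT mentioned earlier in the
thread, a rough sketch in Python follows. It assumes a mod_dav_svn server
with SVNAutoversioning enabled, so that a plain WebDAV PUT turns into a
commit. The helper name put_whole_file and any host or repository path passed
to it are hypothetical, and authentication and TLS are left out.

```python
import http.client
import os


def put_whole_file(host: str, port: int, url_path: str, local_path: str) -> int:
    """Send local_path as the body of a single PUT request and return the
    HTTP status code. No delta is computed on the client side."""
    size = os.path.getsize(local_path)
    conn = http.client.HTTPConnection(host, port, timeout=30)
    try:
        with open(local_path, "rb") as body:
            # An explicit Content-Length lets http.client stream the file
            # in blocks instead of using chunked transfer encoding.
            conn.request(
                "PUT",
                url_path,
                body=body,
                headers={
                    "Content-Length": str(size),
                    "Content-Type": "application/octet-stream",
                },
            )
            response = conn.getresponse()
            response.read()  # drain the body so the connection closes cleanly
            return response.status
    finally:
        conn.close()
```

A call would look like put_whole_file("svn.example.com", 80,
"/repos/project/trunk/big.bin", "big.bin"), where every argument is a
placeholder.
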
Received on 2020-04-25 20:01:59 CEST