
Re: [PATCH] Proof of concept of the better-pristines (LZ4 + storing small pristines as BLOBs) (Was: Re: svn commit: r1843076)

From: Bert Huijben <bert_at_qqmail.nl>
Date: Mon, 29 Oct 2018 14:27:16 +0100

On Windows, NTFS stores very small files (probably something like < 256
bytes, but this is not documented/strictly stable) directly in the file's
MFT record, so they don't use 'a whole cluster'.

Nice work on all the research!

    Bert

On Tue, Oct 23, 2018 at 6:12 PM, Branko Čibej <brane_at_apache.org> wrote:

> On 22.10.2018 22:14, Evgeny Kotkov wrote:
> > Branko Čibej <brane_at_apache.org> writes:
> >
> >> Still missing is a mechanism for the libsvn_wc (and possibly
> >> libsvn_client) to determine the capabilities of the working copy at
> >> runtime (this will be needed for deciding whether to use compressed
> >> pristines).
> > FWIW, I tried the idea of using LZ4 to compress the pristines and
> > storing small pristines as BLOBs in the `PRISTINE` table. I was
> > particularly interested in how such a change would affect performance
> > and what kind of obstacles would have to be dealt with.
>
> Nice! I did some simpler tests by compressing exported trees, but this
> is definitely better.
>
> > In the attachment you will find a more or less functional implementation
> > of this idea that might be useful to some extent. The patch is a proof
> > of concept: it doesn't include the WC compatibility bits and most
> > certainly doesn't have everything necessary in place. In the meantime,
> > though, I think that it might give a good approximation of what can be
> > expected from the approach.
> >
> > The patch applies to the `better-pristines` branch.
> >
> > A couple of observations:
> >
> > - As expected, the combined size of the pristines is halved when the
> >   data itself is compressible, thus making the working copy 25% smaller.
>
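The 25% figure follows if the pristines take up roughly as much space as the working files themselves. A minimal sketch of that arithmetic, assuming equal sizes for both halves and ignoring the metadata in wc.db (the function name and the numbers are illustrative, not from the patch):

```python
def working_copy_savings(working_bytes, pristine_bytes, compression_ratio):
    """Fraction by which the whole working copy shrinks when only the
    pristines are compressed.

    compression_ratio is compressed size / original size (0.5 = halved).
    """
    before = working_bytes + pristine_bytes
    after = working_bytes + pristine_bytes * compression_ratio
    return 1 - after / before


# Assumption: pristines mirror the working files, so both halves are equal.
print(working_copy_savings(100, 100, 0.5))  # prints 0.25
```

With incompressible data (ratio 1.0) the same function returns 0.0, which matches the "when the data itself is compressible" caveat.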
> Yes, that was my observation as well. In fact, though, storing small
> BLOBs in the database itself should have even better effects, since the
> space on disk actually used by a file is rounded up to the nearest
> cluster size, but SQLite's blocks are typically much smaller than that.
>
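The rounding effect described above can be estimated with a few lines. This is only a sketch: the 4096-byte cluster size and the 300-byte pristine size are assumptions chosen for illustration, and real filesystems may behave differently for very small files.

```python
def allocated_size(file_size, unit_size):
    """Bytes actually allocated when space is handed out in whole units."""
    if file_size == 0:
        return 0
    units = -(-file_size // unit_size)  # ceiling division
    return units * unit_size


CLUSTER = 4096          # assumed filesystem cluster size
sizes = [300] * 1000    # hypothetical: 1000 small pristines of 300 bytes

on_disk = sum(allocated_size(s, CLUSTER) for s in sizes)
payload = sum(sizes)
print(on_disk, payload)  # prints 4096000 300000
```

Inside an SQLite database several small rows can share one page, so the space used for the same BLOBs stays much closer to the payload size; the exact figure depends on the page size and per-row overhead.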
>
> > - A variety of callers currently access the pristine contents by
> >   reading the corresponding files. That doesn't work in the case of
> >   compressed pristines or pristines stored as BLOBs.
> >
> >   I think that ideally we would want to use streams as much as possible,
> >   and only spill the uncompressed pristine contents to temporary files
> >   when we need to pass them to external tools, etc.; and that temporary
> >   files need to be backed by a work queue to avoid leaving them in place
> >   in case of an application crash.
>
> Yes and yes. Keeping those temporary spilled files on disk could turn
> out to be a problem; finding a reasonable time to delete them without
> having to run cleanup will be rather important, I think.
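The "streams first, spill only when a real path is needed" idea can be sketched roughly as follows. This is not the patch's code: the table layout, the SMALL_LIMIT cut-off and all function names are hypothetical, and zlib stands in for LZ4 only because it ships with Python. The work-queue backing for crash cleanup is not modelled here.

```python
import io
import os
import sqlite3
import tempfile
import zlib

SMALL_LIMIT = 4096  # hypothetical cut-off for "small" pristines


def open_store():
    """Create an in-memory database with a simplified pristine table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pristine (checksum TEXT PRIMARY KEY, "
               "compressed INTEGER NOT NULL, data BLOB NOT NULL)")
    return db


def install(db, checksum, contents):
    """Store small contents as a BLOB, compressing only when it saves space."""
    if len(contents) > SMALL_LIMIT:
        raise ValueError("large pristines would stay on disk in this sketch")
    packed = zlib.compress(contents)
    use_packed = len(packed) < len(contents)
    db.execute("INSERT OR REPLACE INTO pristine VALUES (?, ?, ?)",
               (checksum, int(use_packed), packed if use_packed else contents))


def read_stream(db, checksum):
    """Return the uncompressed contents as a readable binary stream."""
    row = db.execute("SELECT compressed, data FROM pristine WHERE checksum = ?",
                     (checksum,)).fetchone()
    if row is None:
        raise KeyError(checksum)
    compressed, data = row
    return io.BytesIO(zlib.decompress(data) if compressed else data)


def spill_to_tempfile(db, checksum):
    """Write the contents to a temp file for tools that need a real path."""
    fd, path = tempfile.mkstemp(prefix="pristine-")
    with os.fdopen(fd, "wb") as out:
        out.write(read_stream(db, checksum).read())
    return path  # the caller is responsible for deleting this file
```

Callers that can consume a stream use read_stream() and never touch the filesystem; only the external-tool path goes through spill_to_tempfile(), which is exactly where the cleanup question raised above comes in.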
>
>
> >   The patch does that kind of plumbing to some extent, but that part of
> >   the work is not complete. The starting point is around
> >   wc_db_pristine.c: svn_wc__db_pristine_get_path().
> >
> > - Using BLOBs to store the pristine contents didn't have a measurable
> >   impact on the speed of WC operations such as checkout in my
> >   experiments on Windows. These experiments were not comprehensive, and
> >   I also didn't run the tests on *nix.
>
> I wouldn't expect much change in performance but would expect better use
> of the disk, as explained above.
>
> > - There's also the deprecated svn_wc_get_pristine_copy_path() public API
> >   that would require plumbing to maintain compatibility; the patch does
> >   this by spilling the pristine contents into a temporary file whose
> >   lifetime is attached to the `result_pool`.
>
> Ack; that's one reasonable definition of "lifetime." But I suspect that
> any users of that function expect the pristine file to survive at least
> to the next WC cleanup.
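To make the lifetime question concrete, here is a rough Python analogue of tying a spilled file to a pool. It is an illustration only: contextlib.ExitStack stands in for an APR pool, and the helper names are hypothetical.

```python
import contextlib
import os
import tempfile


def spill_with_lifetime(pool, contents):
    """Write contents to a temp file that is deleted when `pool` is closed."""
    fd, path = tempfile.mkstemp(prefix="pristine-copy-")
    with os.fdopen(fd, "wb") as out:
        out.write(contents)
    pool.callback(_remove_quietly, path)  # cleanup registered on the "pool"
    return path


def _remove_quietly(path):
    """Delete the file, ignoring the case where it is already gone."""
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)


with contextlib.ExitStack() as result_pool:  # stand-in for an APR pool
    path = spill_with_lifetime(result_pool, b"pristine text\n")
    assert os.path.exists(path)
# Once the pool-like object is closed, the file no longer exists.
```

A caller that keeps the returned path around after the pool is gone ends up with a dangling path, which is the compatibility concern for users who expect the file to survive until the next cleanup.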
>
> > (I probably won't be able to continue the work on this patch in the
> > near future; posting this in case it might be useful.)
>
> Thanks, it definitely is useful!
>
> -- Brane
>
>
Received on 2018-10-29 14:27:29 CET

This is an archived mail posted to the Subversion Dev mailing list.
