
Re: [PATCH] Proof of concept of the better-pristines (LZ4 + storing small pristines as BLOBs) (Was: Re: svn commit: r1843076)

From: Branko Čibej <brane_at_apache.org>
Date: Tue, 23 Oct 2018 18:12:53 +0200

On 22.10.2018 22:14, Evgeny Kotkov wrote:
> Branko Čibej <brane_at_apache.org> writes:
>
>> Still missing is a mechanism for the libsvn_wc (and possibly
>> libsvn_client) to determine the capabilities of the working copy at
>> runtime (this will be needed for deciding whether to use compressed
>> pristines).
> FWIW, I tried the idea of using LZ4 to compress the pristines and storing small
> pristines as blobs in the `PRISTINE` table. I was particularly interested in
> how such a change would affect the performance and what kind of obstacles
> would have to be dealt with.

Nice! I did some simpler tests by compressing exported trees, but this
is definitely better.

> In the attachment you will find a more or less functional implementation of
> this idea that might be useful to some extent. The patch is a proof of
> concept: it doesn't include the WC compatibility bits and most certainly
> doesn't have everything necessary in place. But in the meanwhile, I think
> that it might give a good approximation of what can be expected from the
> approach.
>
> The patch applies to the `better-pristines` branch.
>
> A couple of observations:
>
> - As expected, the combined size of the pristines is halved when the data
> itself is compressible, thus making the working copy 25% smaller.

Yes, that was my observation as well. In fact, though, storing small
BLOBs in the database itself should have even better effects, since the
space on disk actually used by a file is rounded up to a whole number of
clusters, but SQLite's blocks are typically much smaller than that.
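To put rough numbers on the rounding effect, here is a minimal sketch in
Python. The 4096-byte cluster and 1024-byte database block are assumptions
picked for illustration, not measurements, and the model is deliberately
crude: it packs the BLOBs back to back and ignores SQLite's per-page and
per-row overhead.

```python
def allocated(size, unit):
    """Round size up to a whole number of allocation units."""
    return -(-size // unit) * unit


def compare(sizes, cluster=4096, block=1024):
    """Return (bytes used as separate files, bytes used as packed BLOBs).

    cluster and block are assumed values; real filesystems and
    databases may use different ones.
    """
    # Every file occupies at least one full cluster on disk.
    as_files = sum(allocated(size, cluster) for size in sizes)
    # Simplification: BLOBs are packed contiguously into database blocks.
    as_blobs = allocated(sum(sizes), block)
    return as_files, as_blobs
```

Under those assumptions, three pristines of 100, 700 and 1500 bytes take
12288 bytes as separate files but 3072 bytes when packed into the database.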

> - A variety of the callers currently access the pristine contents by reading
> the corresponding files. That doesn't work in case of compressed pristines
> or pristines stored as BLOBs.
>
> I think that ideally we would want to use streams as much as possible, and
> only spill the uncompressed pristine contents to temporary files when we
> need to pass them to external tools, etc.; and that temporary files need
> to be backed by a work queue to avoid leaving them in place in case of an
> application crash.

Yes and yes. Keeping those temporary spilled files on disk could turn
out to be a problem; finding a reasonable time to delete them without
having to run cleanup will be rather important, I think.
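One possible shape for "delete them at a reasonable time" is to tie the
spilled file to the operation that needs it. The sketch below is Python
rather than libsvn_wc code, and spilled() is a hypothetical helper, not an
existing API. It only covers normal and exceptional exit from the block; a
crash would still leave the file behind, which is where the work queue
would have to come in.

```python
import contextlib
import io
import os
import shutil
import tempfile


@contextlib.contextmanager
def spilled(stream, chunk_size=64 * 1024):
    """Copy a readable binary stream to a temporary file and yield its path.

    The file is removed when the with-block exits, even on an exception.
    """
    fd, path = tempfile.mkstemp(prefix="pristine-")
    try:
        with os.fdopen(fd, "wb") as out:
            # Copy in chunks so the contents never sit in memory all at once.
            shutil.copyfileobj(stream, out, chunk_size)
        yield path
    finally:
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)


if __name__ == "__main__":
    # The path would be handed to an external tool here.
    with spilled(io.BytesIO(b"pristine contents")) as path:
        print(path, os.path.getsize(path))
```

The point of the design is that the caller never owns the path beyond the
block, so there is nothing left over for a later cleanup to find.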

> The patch does that kind of plumbing to some extent, but that part of the
> work is not complete. The starting point is around wc_db_pristine.c:
> svn_wc__db_pristine_get_path().
>
> - Using BLOBs to store the pristine contents didn't have a measurable impact
> on the speed of the WC operations such as checkout in my experiments on
> Windows. These experiments were not comprehensive, and also I didn't run
> the tests on *nix.

I wouldn't expect much change in performance but would expect better use
of the disk, as explained above.

> - There's also the deprecated svn_wc_get_pristine_copy_path() public API that
> would require plumbing to maintain compatibility; the patch performs it by
> spilling the pristine contents result into a temporary file whose lifetime
> is attached to the `result_pool`.

Ack; that's one reasonable definition of "lifetime." But I suspect that
any users of that function expect the pristine file to survive at least
to the next WC cleanup.

> (I probably won't be able to continue the work on this patch in the near
> future; posting this in case it might be useful.)

Thanks, it definitely is useful!

-- Brane
Received on 2018-10-23 18:13:03 CEST

This is an archived mail posted to the Subversion Dev mailing list.
