[PATCH] Proof of concept of the better-pristines (LZ4 + storing small pristines as BLOBs) (Was: Re: svn commit: r1843076)

From: Evgeny Kotkov <evgeny.kotkov_at_visualsvn.com>
Date: Mon, 22 Oct 2018 23:14:43 +0300

Branko Čibej <brane_at_apache.org> writes:

> Still missing is a mechanism for the libsvn_wc (and possibly
> libsvn_client) to determine the capabilities of the working copy at
> runtime (this will be needed for deciding whether to use compressed
> pristines).

FWIW, I tried the idea of using LZ4 to compress the pristines and storing small
pristines as blobs in the `PRISTINE` table. I was particularly interested in
how such a change would affect performance and what kinds of obstacles
would have to be dealt with.
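
As a rough illustration of the "small pristines as BLOBs" part of the idea,
the storage decision could be sketched as below. This is not code from the
attached patch: the type, the function name, and the `blob_max` cutoff are
all hypothetical, and the cutoff is left as a parameter because no concrete
value is stated here. Whether the size being compared is the compressed or
the uncompressed one is likewise an assumption left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: where the contents of a single pristine end up. */
typedef enum pristine_store_t {
  PRISTINE_STORE_BLOB,  /* in a BLOB column of the PRISTINE table */
  PRISTINE_STORE_FILE   /* in a separate file on disk */
} pristine_store_t;

/* Choose the storage for a pristine whose stored size is STORED_SIZE bytes.
   BLOB_MAX is an assumed cutoff: anything at or below it goes into the
   database, everything larger stays in a file. */
static pristine_store_t
choose_pristine_storage(size_t stored_size, size_t blob_max)
{
  assert(blob_max > 0);
  return stored_size <= blob_max ? PRISTINE_STORE_BLOB
                                 : PRISTINE_STORE_FILE;
}
```

With an assumed cutoff of 4096 bytes, a 100-byte pristine would be stored as
a BLOB and a 1 MB pristine would stay in a file.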

In the attachment you will find a more or less functional implementation of
this idea that might be useful to some extent. The patch is a proof of
concept: it doesn't include the WC compatibility bits and most certainly
doesn't have everything necessary in place. In the meantime, though, I think
that it might give a good approximation of what can be expected from the
approach.

The patch applies to the `better-pristines` branch.

A couple of observations:

 - As expected, the combined size of the pristines is halved when the data
   itself is compressible. Since the pristines account for roughly half of
   the working copy, that makes the working copy about 25% smaller.

 - A variety of callers currently access the pristine contents by reading
   the corresponding files directly. That doesn't work in the case of
   compressed pristines or pristines stored as BLOBs.

   I think that ideally we would want to use streams as much as possible, and
   only spill the uncompressed pristine contents to temporary files when we
   need to pass them to external tools, etc. Those temporary files would need
   to be backed by a work queue so that they are not left behind if the
   application crashes.

   The patch does that kind of plumbing to some extent, but that part of the
   work is not complete. The starting point is around wc_db_pristine.c:
   svn_wc__db_pristine_get_path().

 - Using BLOBs to store the pristine contents didn't have a measurable impact
   on the speed of WC operations such as checkout in my experiments on
   Windows. These experiments were not comprehensive, and I didn't run the
   tests on *nix.

 - There's also the deprecated svn_wc_get_pristine_copy_path() public API that
   would require plumbing to maintain compatibility; the patch handles it by
   spilling the pristine contents into a temporary file whose lifetime is
   attached to the `result_pool`.
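
The "spill to a temporary file" step mentioned in the observations above can
be sketched with plain C stdio. This is only an illustration under stated
assumptions, not the code from the patch: `spill_to_tempfile()` is a
hypothetical helper, a `FILE *` stands in for a Subversion stream, and the
standard `tmpfile()` (which removes the file when it is closed or when the
program exits normally) stands in for a file whose lifetime is tied to a pool.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy everything readable from SRC into an anonymous
   temporary file and return that file rewound to its start, or NULL on
   error.  The caller closes the returned file, which also removes it. */
static FILE *
spill_to_tempfile(FILE *src)
{
  char buf[4096];
  size_t n;
  FILE *tmp;

  assert(src != NULL);

  tmp = tmpfile();
  if (tmp == NULL)
    return NULL;

  /* Copy in fixed-size chunks so that large contents are never held in
     memory as a whole. */
  while ((n = fread(buf, 1, sizeof(buf), src)) > 0)
    {
      if (fwrite(buf, 1, n, tmp) != n)
        {
          fclose(tmp);
          return NULL;
        }
    }

  if (ferror(src) || fflush(tmp) != 0)
    {
      fclose(tmp);
      return NULL;
    }

  rewind(tmp);
  return tmp;
}
```

Note that `tmpfile()` does not give the file a name, so this sketch would not
be enough for handing a path to an external tool, and it does nothing about
the crash-cleanup concern raised above.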

 (I probably won't be able to continue the work on this patch in the near
 future; posting this in case it might be useful.)

Thanks,
Evgeny Kotkov

Received on 2018-10-22 22:15:21 CEST

This is an archived mail posted to the Subversion Dev mailing list.