Re: issue #1573: fs deltification causes delays

From: C. Michael Pilato <cmpilato_at_collab.net>
Date: 2003-11-04 16:01:25 CET

kfogel@collab.net writes:

> There are various proposed solutions in the issue. But for now, I'd
> like to talk just about solutions we can implement before 1.0 (i.e.,
> before Beta, i.e., before 0.33 :-) ). The two that seem most
> realistic are:
>
> 1. Prevent deltification on files over a certain size, but create
> some sort of out-of-band compression command -- something like
> 'svnadmin deltify/compress/whatever' that a sysadmin or cron job
> can run during non-peak hours to reclaim disk space.
>
> 2. Make svn_fs_merge() spawn a deltification thread (using APR
> threads) and return success immediately. If the thread fails to
> deltify, it's not the end of the world: we simply don't get the
> disk-space savings.

     3. Never do deltification of any sort in the filesystem code, and
        create an out-of-band compression command that can be run as a
        post-commit hook.

> (2) looks like a wonderful solution; the only thing I'm not sure of is
> how to do it inside an Apache module. Does anyone know?
>
> I assume that (1) would involve a repository config option for the
> file size. Note also that we used to have an 'svnadmin deltify'
> command and could easily get it back (see r3920), so (1) may not
> actually be as much work as it looks like. Those who don't want to
> run the cron job would just set the size limit to infinity, and always
> get deltification.

  (3) looks simple, involves no repository configuration option, and
      removes all the deltification overhead from the commit process
      itself. O(1) commits, finally.

You have my vote. Subversion chants the "disk is cheap" mantra all
over the place. If we really believe that, it won't hurt to stop
deltifying in-process and start doing it in the hooks, even adding the
exact command-line for running the 'svnadmin tunefs' (or whatever)
command necessary in the post-commit.tmpl template.

Some kind of post-commit cleanup is necessary anyway, because I have a
strong suspicion that what we save in in-database storage by
deltifying, we lose temporarily in out-of-database storage thanks to
the logfiles generated during the deltification process. Heh... I
just ran the fs-test binary as-is. 44.6 Megs of disk consumed by my
tests/libsvn_fs directory (and the test repos in it) now. So, tweaky
tweaky... turn off deltification in tree.c... compile... re-run
fs-test -- 44.1 Megs. Nice.
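
The comparison above can be made a little more precise by separating logfile bytes from everything else under the test directory. What follows is only a minimal sketch, assuming the repositories use Berkeley DB and that its logfiles follow the usual log.NNNNNNNNNN naming; the function name disk_usage is made up for illustration and is not part of Subversion.

```python
import os
import re

# Assumption: Berkeley DB logfiles are named "log." followed by ten digits.
BDB_LOG = re.compile(r"^log\.\d{10}$")


def disk_usage(root):
    """Return (log_bytes, other_bytes) summed over all files under root."""
    log_bytes = 0
    other_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            size = os.path.getsize(os.path.join(dirpath, name))
            if BDB_LOG.match(name):
                log_bytes += size   # out-of-database storage (logfiles)
            else:
                other_bytes += size  # tables and everything else
    return log_bytes, other_bytes
```

Running this once against tests/libsvn_fs with deltification enabled and once with it disabled would show how much of the difference comes from logfiles rather than from the database tables.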

fs-test doesn't *nearly* represent normal usage, though some of the
tests do cover non-small binary files, and of course there are lots of
tests of really small (Greek) file mods. But, with no post-commit
processing at all, we see that, at least for that dataset, it is
cheaper in terms of disk usage *and* speed (no proof of this, but I
think that's a trustworthy proposition) to *not* deltify *anything*.
Of course, if we had a script to remove logfiles, our deltification
would surely have paid off (at least, space-wise).

I say all this to promote the idea that it isn't too much to ask of a
repos administrator to run some out-of-process deltification routine
-- even per-commit -- because if they are truly concerned about disk
space, they'll already have some out-of-process log-file cleanup
process. And if you have a cronjob/post-commit hook to clean up
logfiles, what's an extra line in that script to deltify?
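
To make the "extra line" concrete, here is a rough sketch of what such a post-commit script might do. It rests on several assumptions: the out-of-band deltification subcommand has not been decided in this thread, so 'svnadmin deltify -r REV REPOS' is a placeholder that can be swapped out through the deltify_cmd parameter; the logfile step relies on 'svnadmin list-unused-dblogs', which prints the paths of Berkeley DB logfiles that are no longer in use; and the helper names build_commands and run_hook are invented for this example.

```python
import os
import subprocess


def build_commands(repos, rev, deltify_cmd=("svnadmin", "deltify")):
    """Return the argv lists the hook would run, in order."""
    return [
        # Placeholder: the deltification command and its -r flag are assumptions.
        list(deltify_cmd) + ["-r", str(rev), repos],
        # Ask svnadmin which Berkeley DB logfiles are no longer needed.
        ["svnadmin", "list-unused-dblogs", repos],
    ]


def run_hook(repos, rev, run=subprocess.run):
    """Deltify the new revision, then delete unused logfiles."""
    deltify, list_logs = build_commands(repos, rev)
    # The commit is already durable; a failed deltify only costs disk space.
    run(deltify, check=False)
    listing = run(list_logs, check=False, capture_output=True, text=True)
    for path in (listing.stdout or "").splitlines():
        if path and os.path.isfile(path):
            os.remove(path)
```

Subversion invokes the post-commit hook with the repository path and the new revision number as arguments, so a hook script built on this sketch would call run_hook(sys.argv[1], sys.argv[2]). Because everything runs after the commit has completed, none of this work is charged to the commit itself.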

Received on Tue Nov 4 16:03:23 2003

This is an archived mail posted to the Subversion Dev mailing list.
