Re: permanent solution for deltification problem (issue #1601)

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2003-12-03 02:54:20 CET

On Tue, 2003-12-02 at 18:05, kfogel@collab.net wrote:
> I do think deltification has to be asynchronous, because the user delay
> is too noticeable. It doesn't matter that deltification is
> technically O(N), when the constant factor is so high.

... and Branko wrote:
> Yes, but the difference in the constant factor between synchronous and
> asynchronous deltification can be enormous, and at 30 seconds vs. 10
> minutes, the "it's always O(n)" argument won't make anyone happier.

I've been committing to the Subversion repository for a good long time
now, and I have noticed nothing like an eleven-fold delay at the end of
a commit. Have either of you guys collected any data?

Regardless, I continue to strongly object to this means of solving the
problem, if there really is a problem. There are some rules which we
simply should not break. A command operating against the filesystem
should not terminate before it is done poking at the filesystem. It
would not be okay for autoconf to write out cache data in the background
after exiting; it would not be okay for slocate's updatedb to exit and
then attempt to compress the database it just built. It violates the
command-line contract. It's not an okay way out.

If mod_dav_svn wants to tell the client the commit has succeeded before
doing deltification, that's fine. It's a server process, and it's
allowed to do processing that isn't part of a client request-response
cycle. svnserve in daemon mode could do the same. No forking
required. (If the vagaries of Apache demand forking to make that
happen, then fine, but the fork belongs in mod_dav_svn then, not in
libsvn_fs.)

> > * In some environments, user processes may be killed after logout.
> > * In some environments, user processes may lose access to some
> > filesystems after logout.
> > * People may write scripts which mount a filesystem, perform svn
> > operations on a repository within the filesystem, and then
> > unmount the filesystem.
>
> These environments seem quite rare.

Since when is it okay for us to cryptically fail in environments which
differ a little bit from the common case? But if you insist, let me
paint a slightly more familiar scenario:

I'm working on my laptop, and I run a commit against a repository on
local disk. Since my laptop is running a bit low on battery, I turn it
off as soon as the commit succeeds. As a result, my commit is only
partially deltified, which means I don't get the space performance
Subversion is supposed to give me. (And maybe my DB needs recovery; not
sure about that.)

> Or am I wrong about how rare these are? The only way to estimate the
> danger here is to know how often these situations occur in real life.
> After all, in some environments NULL != 0, but we don't lie awake at
> night worrying about it.

As you're aware, I think it's dumb that we write non-conforming C code
because we're too lazy to explicitly initialize structure fields. But
this is a red herring anyway. No one can point to a machine which could
conceivably ever run Subversion where NULL is not all-bits-0 at runtime
(abbreviating this to "NULL != 0" is deceptive, since
constant-0-casted-to-pointer-at-compile-time is always always always the
null pointer).
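
A minimal sketch of the distinction, in plain standard C rather than Subversion's own allocation routines. The struct and function names are hypothetical and exist only for illustration. The first constructor relies on all-bits-zero memory reading back as a null pointer, which holds on any plausible target but is not promised by the C standard; the second assigns NULL explicitly, which is conforming everywhere.

```c
#include <stdlib.h>

/* Hypothetical structure, used only to illustrate the point. */
struct node {
    int          count;
    struct node *next;
};

/* Zero-filled allocation: 'next' is null only if the platform represents
   the null pointer as all-bits-zero. True in practice, not guaranteed. */
struct node *node_create_zeroed(void)
{
    return calloc(1, sizeof(struct node));
}

/* Explicit initialization: NULL (a constant 0 converted to a pointer at
   compile time) is always the null pointer, whatever its bit pattern. */
struct node *node_create_explicit(void)
{
    struct node *n = malloc(sizeof(*n));
    if (n != NULL) {
        n->count = 0;
        n->next = NULL;
    }
    return n;
}
```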

> And, Mike just pointed out (verbally) that doing deltification
> per-file during the commit is not a good option. If the commit is
> aborted, then all of those deltifications have to be undone, because
> the things they're deltifying *against* will not exist after the txn
> is removed.

You prepare the deltified representations on a per-file basis and slide
them into place at the end.
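
As a rough sketch of that idea, assuming a toy in-memory store rather than the real libsvn_fs tables: every name below (rep_store, stage_delta, publish_staged, discard_staged) is hypothetical. Deltified forms are staged per file while the commit runs, published in one pass once the commit succeeds, and simply dropped if it aborts, so there is nothing to undo.

```c
#include <stddef.h>

#define MAX_FILES 8

/* Hypothetical stand-in for a representation store. */
struct rep_store {
    const char *live[MAX_FILES];    /* representations readers currently see */
    const char *staged[MAX_FILES];  /* deltified forms prepared during the txn */
};

/* During the commit: prepare one file's deltified form without publishing it.
   Returns 0 on success, -1 if the file index is out of range. */
int stage_delta(struct rep_store *s, size_t file, const char *delta)
{
    if (file >= MAX_FILES)
        return -1;
    s->staged[file] = delta;
    return 0;
}

/* Commit succeeded: slide every staged representation into place. */
void publish_staged(struct rep_store *s)
{
    for (size_t i = 0; i < MAX_FILES; i++) {
        if (s->staged[i] != NULL) {
            s->live[i] = s->staged[i];
            s->staged[i] = NULL;
        }
    }
}

/* Commit aborted: drop the staged work. The live representations were
   never modified, so no deltification has to be reversed. */
void discard_staged(struct rep_store *s)
{
    for (size_t i = 0; i < MAX_FILES; i++)
        s->staged[i] = NULL;
}
```

A real implementation would have to make the publish step part of the same database transaction as the commit itself; the sketch only shows the ordering.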

> > I do not object to an option to make deltification asynchronous, as long
> > as it is not turned on by default. But I don't think it's really a
> > solution; it's more of a workaround.
>
> Well, I have no compelling proof one way or the other; the difference
> between a solution and a workaround is a matter of taste. All I can
> say is that I think this falls into the category of a "bug" with a
> "solution" available.

If we have a performance problem, the solution is to make our
performance better. Hiding the problem by making a chunk of the
operation take place in the background is a cheesy workaround, not a
solution.

Branko wrote:
> However, if we decided to remove unused log files by default, then we
> should also turn on asynchronous deltification by default, because
> that's what the vast majority of environments will want to use. Making
> things configurable, even per access method, is fine, as far as I'm
> concerned. Using different criteria to choose the default behaviour
> is not.

Here are the criteria for choosing defaults: they need to be robust, and
they need to provide acceptable performance. Subversion is not
qualitatively more robust if we keep around DB logfiles (as I argued
previously, if Subversion horks its DB, the same level of wizardry is
required to fix the DB as it is to use the logfiles to roll it back),
but it does not provide acceptable performance if logfiles are kept--DB
size grows to infinity even if you just do repeated read-only
operations. So the default must be not to keep logfiles.

I understand that you both think asynchronous deltification is necessary
for acceptable performance (although I am not convinced), but it is not
robust, so it cannot be the default.

Received on Wed Dec 3 02:55:07 2003

This is an archived mail posted to the Subversion Dev mailing list.