
Re: Streamy FS writes found detrimental.

From: Greg Stein <gstein_at_lyra.org>
Date: 2002-02-26 23:30:41 CET

On Tue, Feb 26, 2002 at 04:27:09PM -0500, Daniel Berlin wrote:
>...
> I thought you were complaining of the performance of streamy fs writes.
> If you are complaining about log file size, that's easy to remedy.
>
> BTW guys, I looked at libxdfs (xdelta's bdb based FS library), and it
> does what i was doing (log file removal in a thread).

I don't believe we want to do it in a thread because of some of the issues
that I raised earlier. The admin needs to have control over it because
this is part of their backup policy. The (very valid) point that the default
should be to toss them is a good one, and we should do that. Admins that want
them for their backups can disable the auto-delete.

Auto-deletion of the log files can occur in one of two ways:

1) synchronously, say, after every commit of a txn
2) async through the post-commit hooks

Theoretically, a zillion 'svn update' requests will also grow the log files,
so a cron job is actually quite ideal. But that increases the difficulty of
installing subversion.

Personally, I like option (2). If we went with option (1), then we would
need some way to configure the particular repos to disable the cleaning.
That implies new directives and/or config files somewhere. I'd prefer to
avoid that hassle.
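
A minimal sketch of what option (2) might look like as a post-commit hook,
written in Python. It assumes the Berkeley DB `db_archive` utility is on the
PATH, that the hook receives the repository path as its first argument, and
that the DB environment lives in the repository's "db" subdirectory. The
function names are hypothetical, not part of Subversion. An admin who wants to
keep the logs for backups would simply not install the hook.

```python
#!/usr/bin/env python3
"""Post-commit hook sketch: delete Berkeley DB log files no longer in use."""
import os
import subprocess
import sys


def list_unused_logs(db_home, run=subprocess.run):
    """Ask db_archive which log files the environment no longer needs."""
    result = run(
        ["db_archive", "-h", db_home],
        capture_output=True, text=True, check=True,
    )
    # db_archive prints one log file name per line, relative to db_home.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def remove_unused_logs(db_home, run=subprocess.run, remove=os.remove):
    """Delete every log file reported as unused; return the removed paths."""
    removed = []
    for name in list_unused_logs(db_home, run):
        path = os.path.join(db_home, name)
        remove(path)
        removed.append(path)
    return removed


def main(argv):
    # Assumption: argv[1] is the repository path, with the DB env in "db".
    remove_unused_logs(os.path.join(argv[1], "db"))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

The `run` and `remove` parameters exist only so the logic can be exercised
without a real repository.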

> Hey guys, are we replacing items when we don't need to? (IE rewriting
> data with the same data) If so, this will greatly increase the log file
> size, since replaces are logged and include both the original, and
> replacement, data.

Yes, we are. Consider what happens during the streamy operation: we're
appending data, and each append rewrites the record's value, over and over.

It might be interesting to look into using duplicate keys for the 'strings'
table. That might allow us to append the new data, without needing to
rewrite the whole darned record.
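
To get a rough feel for the difference, here is a toy model in Python. It is
not Berkeley DB code. It assumes, per the quoted text, that a replace logs both
the original and the replacement data, and it further assumes that adding a
duplicate logs only the new chunk. Per-record log overhead is ignored, and the
function names are made up for the illustration.

```python
def log_bytes_rewrite(chunk_sizes):
    """Log cost when every append rewrites the whole record.

    Assumption: each replace logs the old value plus the new value.
    """
    total = 0
    record = 0  # current size of the stored value
    for size in chunk_sizes:
        old, new = record, record + size
        total += old + new
        record = new
    return total


def log_bytes_duplicates(chunk_sizes):
    """Log cost when each chunk is added as a duplicate under the same key.

    Assumption: only the newly added chunk is logged.
    """
    return sum(chunk_sizes)


chunks = [1024] * 100  # 100 KiB written in 1 KiB pieces
print(log_bytes_rewrite(chunks))     # 10240000
print(log_bytes_duplicates(chunks))  # 102400
```

Under these assumptions the rewrite path grows quadratically with the number
of appends, while the duplicate-key path grows linearly.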

> I get the feeling we rewrite stuff for no reason at times, given the log
> file sizes.

"no reason" is a bit of an extreme statement. Mistakes are made, sure, but
things are always done for some reason or another. ("no reason" is a
relatively inflammatory statement; it puts the original coder on the
defensive for their reasoning behind the code)

But the point is valid: if the log sizes are so large, then it could
certainly point to modifying a record too many times.

We also modify records when we deltify them, when we "bubble up" directory
entry lists during transaction construction, when we convert a rep from
mutable to non-mutable, etc.

I think it might be interesting to investigate the duplicate keys. See:

    http://www.sleepycat.com/docs/ref/am_conf/dup.html

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
Received on Sat Oct 21 14:37:10 2006

This is an archived mail posted to the Subversion Dev mailing list.
