Re: Improving the performance of libsvn_wc for checkouts/updates/switches

From: Philip Martin <philip_at_codematters.co.uk>
Date: 2004-05-22 22:00:14 CEST

Josh Pieper <jjp@pobox.com> writes:

> Philip Martin wrote:
>> >> What I observe with the current code is typically
>> >>
>> >> A dir
>> >> A dir/file
>> >> A dir/subdir
>> >> A dir/subdir/file
>>
>> Interrupt here. Is dir/subdir versioned? If not then I don't think
>> cleanup will find subdir's log file. I suppose one might be able to
>> run cleanup repeatedly.
>
> If the interrupt is hard, i.e. kill -9, the log files will not have
> been moved into the live position yet. That could be a problem as
> there will now be unversioned obstructions lying around, but no data
> should be lost.
>
> If the log file for the inner directory has been made live, but not
> the one for the parent directory, cleanup may have a hard time finding
> it. If this is a big problem, we could run the parent's log file
> before recursing into subdirectories, and would still gain performance
> if there were many text files in a directory.

But how do we run the log file if it is in .svn/tmp/log?

At present each log file contains a set of several operations and, in
general, all the operations must be completed before the wc is
consistent. We cannot simply run arbitrary log files from .svn/tmp/,
since such a log file may contain an incomplete set of operations: we
don't know where the interrupt occurred.
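
To make the pending/live distinction concrete, here is a rough sketch in Python (not the libsvn_wc code). The helper names and the plain-text "operations" are made up for illustration; the only parts taken from this thread are the two paths, .svn/tmp/log and .svn/log, and the rule that the move is what makes a log visible to cleanup.

```python
import os
import tempfile


def write_pending_log(admin_dir, operations):
    """Append operations to <admin_dir>/tmp/log. Cleanup never looks here."""
    tmp_dir = os.path.join(admin_dir, "tmp")
    os.makedirs(tmp_dir, exist_ok=True)
    pending = os.path.join(tmp_dir, "log")
    with open(pending, "a") as f:
        for op in operations:  # hypothetical one-line operations
            f.write(op + "\n")
    return pending


def make_live(admin_dir):
    """Rename tmp/log to log. The rename is the point at which the log is complete."""
    pending = os.path.join(admin_dir, "tmp", "log")
    live = os.path.join(admin_dir, "log")
    os.replace(pending, live)
    return live


def cleanup_can_run(admin_dir):
    """Cleanup only trusts a live log, since a pending one may be incomplete."""
    return os.path.exists(os.path.join(admin_dir, "log"))


def demo():
    """Return what cleanup sees before and after the log is made live."""
    admin_dir = os.path.join(tempfile.mkdtemp(), ".svn")
    write_pending_log(admin_dir, ["op-1", "op-2"])
    before = cleanup_can_run(admin_dir)
    make_live(admin_dir)
    after = cleanup_can_run(admin_dir)
    return before, after
```

An interrupt between write_pending_log and make_live leaves only tmp/log behind, and in this model cleanup has no way to tell whether that file holds a complete set of operations.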

> If both log-files were made live before the interrupt, I believe
> cleanup would run the parent directory's logfile first, then use the
> new state of its entries to recurse and would thus correctly recurse
> into subdir.

Not at present; cleanup explicitly runs children first.
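
A toy model of that ordering, in Python rather than the real cleanup code, may help. Everything here is an assumption made for illustration: the working copy is reduced to a dict of versioned subdirectories per directory, a set of directories with a live log, and a dict saying which subdirectories a directory's log would add to its entries.

```python
def cleanup(directory, entries, live_logs, log_adds):
    """Children-first cleanup over an in-memory model; returns the run order.

    entries:   dict, directory -> list of subdirectories versioned in it
    live_logs: set of directories that currently have a live log
    log_adds:  dict, directory -> subdirectories its log adds to entries
    """
    ran = []
    # Recurse first, using the entries as they are *before* this
    # directory's own log has been run.
    for child in list(entries.get(directory, [])):
        ran.extend(cleanup(child, entries, live_logs, log_adds))
    # Only then run this directory's log, if it has a live one.
    if directory in live_logs:
        live_logs.discard(directory)
        entries.setdefault(directory, []).extend(log_adds.get(directory, []))
        ran.append(directory)
    return ran


# dir/subdir is not yet versioned in dir: only dir's log would add it.
entries = {"dir": []}
live_logs = {"dir", "dir/subdir"}
log_adds = {"dir": ["dir/subdir"]}

first_pass = cleanup("dir", entries, live_logs, log_adds)
second_pass = cleanup("dir", entries, live_logs, log_adds)
```

In this model the first pass runs only dir's log, because subdir is not in dir's entries when the recursion happens. A second pass is needed to reach subdir's log, which is the "run cleanup repeatedly" case.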

>
>> >> A dir/another_file
>> >
>> > Well, it does pass all the tests. :)
>>
>> We don't really have much in the way of cleanup regression tests.
>>
>> I'm not really sure how this "pending" log file will interact with
>> cleanup. At present the log file is written in .svn/tmp/log and moved
>> to .svn/log just before it is run. Moving to .svn/log is what makes
>> the log file "live", and then it is visible to cleanup. Where does
>> your log file accumulate? When does it become live? How does it work
>> when there are multiple log files being accumulated?
>
> The pending logs are kept in .svn/tmp/log the same as they are
> currently. When either
> a) a delete operation occurs
> b) the editor closes the directory or
> c) the cancellation function returns an error
> the log files are moved into the live position in a depth first
> fashion and run one at a time.

That's a step backwards for restartable checkouts. All the files
downloaded will be stored somewhere (in .svn/tmp/ perhaps?) until the
log file is run. If the log file is in .svn/tmp/log it will not get
run by cleanup, so there will be no way to promote the downloaded
files into versioned items. Restarting a checkout will need to
download those files again :-(

Remember that the log file itself is not the performance bottleneck
(every log file operation gets written once); it's the entries file
that is the problem. You are combining log files to solve the entries
file problem, so you must be careful not to break the log file's atomic
guarantees in the process.
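
A back-of-envelope sketch of why combining log files helps the entries file, in Python. It rests on one assumption that is mine and not stated above: the whole entries file is rewritten each time a log file is run. The function name and parameters are hypothetical.

```python
def entries_written(files_in_dir, files_per_log):
    """Total entry records written for one directory.

    Assumes (for illustration only) that the whole entries file is
    rewritten after every log run, so each run costs as many records
    as there are entries so far.
    """
    if files_per_log < 1:
        raise ValueError("files_per_log must be at least 1")
    total = 0
    done = 0
    while done < files_in_dir:
        done = min(done + files_per_log, files_in_dir)
        total += done  # one rewrite of a file that now lists `done` entries
    return total


one_log_per_file = entries_written(100, 1)    # 1 + 2 + ... + 100 = 5050
one_log_per_dir = entries_written(100, 100)   # a single rewrite = 100
```

Under that assumption the cost is quadratic with one log per file and linear with one log per directory, while the number of log operations written is the same either way.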

-- 
Philip Martin
Received on Sat May 22 22:00:32 2004

This is an archived mail posted to the Subversion Dev mailing list.