
Re: [PATCH in progress] Ref-counting for pristine texts

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Tue, 11 Jan 2011 16:32:44 +0000

Branko Čibej wrote:
> On 11.01.2011 16:01, Julian Foad wrote:
> >> I see a different issue here: The close_wcroot() call is normally
> >> handled from pool cleanup for users of the svn_client api. (The
> >> svn_wc_context_t is cached in the client context, which is only closed
> >> on pool cleanup).
> > Thanks for pointing that out. That is not when I would like the
> > pristine cleanup to happen. I would like it to happen after every
> > operation that changes the WC - say after every major call into
> > libsvn_wc, and/or every major call into libsvn_client, or whenever the
> > wc.db work queue is run. Any thoughts on where would be best?
>
> What exactly are you trying to achieve? Is this a disk-space optimization?
> My hunch says that you do not want to do this too often at all because
> it'll turn out to be space-vs-time. Deleting a file isn't cheap even on
> a local filesystem these days. Better to relegate this to an explicit
> "svn cleanup"; or better yet, follow CMike's advice.

From IRC:
[[[
<julianf> brane: got a minute re. pristine cleanup?

<brane> why, sure

<julianf> I want to achieve reasonable cache management at simplest
possible dev cost at the moment.
 With options for enhancing it later.

<brane> (nod)

<julianf> So my thought is "Let's just delete them as soon as we know
there might be some unref'd pristines."

<brane> good job on the automagic refcounting, btw

<julianf> Cheers.
 So I thought "It's got to be called from somewhere. Where? Upon closing
the ... uh ... WC admin-handle object? WC API? WC DB? Main application
pool? Dunno."
 But I really do think calling only from "svn cleanup" is not good
enough.
 So my current position is I want to invite help and suggestions on
where to do this for best effect while keeping the dev cost simple.
 Sure deleting a file isn't the cheapest op, but we only end up needing
to do it when we've just been doing some WC file shuffling anyway, so
it's not proportionally expensive either, is it?

<brane> well, i'm a bit worried about that on two counts: a) bugs -
deleting something too soon would not be nice; i know that's always a
"temporary" state of affairs, but still; b) time - i've seen deletes
taking more than a second on a local XFS

<julianf> In other words, my intent is by cleaning up very often, the
number of files deleted each time will be proportional to the size of
the update/copy/revert/delete/etc. WC operation that has just happened.

<brane> you have a point there
 when does the work queue get flushed? i'm guessing you can't have a
sqlite trigger run a callback written in C ...

<julianf> That might be possible; not sure.
 WQ gets flushed within a libsvn_wc operation - e.g. several times
within an "update", possibly even several times per file.

<brane> sounds like deleting during WC flush could do the right thing
then

<julianf> As for "too soon" - yes, that's critical. That's partly why I
decided to implement ref-counting and then defer deletion to some later
time. Previously, I was assuming the higher-level parts of libsvn_wc
would delete it as soon as the code determined that it was no longer
needed by the current operation - but that gets tricky to analyze
because of passing the checksum around here and there to be used by some
other bit of code.

<julianf> So I'm looking for some place where we can say "By design, at
this place any references that are not stored in the DB are not valid
for looking up in the pristine store."

<julianf> I think WQ flush might be too low level, but not sure.
 brane: Has your speed concern been satisfied now, because it's only
proportional to work already being done?

<julianf> brane: Thanks for the chat.
]]]
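
On the trigger question raised in the log: SQLite does let a trigger
call an application-defined SQL function (registered from C with
sqlite3_create_function()), so a callback on "refcount reached zero" is
possible in principle. Below is a minimal sketch of that mechanism,
written in Python rather than C for brevity. The table layout, the
trigger name and the note_unreferenced() function are hypothetical and
are not taken from wc.db. The callback only records the checksum; it
does not delete anything, which leaves the "when is it safe to delete"
decision to the caller.

```python
import sqlite3


def open_store_with_trigger(pending):
    """Open a toy pristine table whose trigger reports unreferenced rows.

    `pending` is a caller-supplied list that collects the checksums of
    pristines whose refcount has just dropped to zero.
    """
    db = sqlite3.connect(":memory:")
    # Hypothetical schema: a stand-in for a pristine table, not wc.db.
    db.execute(
        "CREATE TABLE pristine ("
        " checksum TEXT PRIMARY KEY,"
        " refcount INTEGER NOT NULL)"
    )

    def note_unreferenced(checksum):
        # Runs inside the UPDATE statement; keep it cheap and side-effect
        # free as far as the database is concerned.
        pending.append(checksum)

    # Register the Python callable as a one-argument SQL function.
    db.create_function("note_unreferenced", 1, note_unreferenced)

    # A TEMP trigger lives only for this connection, like the function.
    db.execute(
        "CREATE TEMP TRIGGER pristine_unreferenced "
        "AFTER UPDATE OF refcount ON pristine "
        "WHEN NEW.refcount = 0 "
        "BEGIN SELECT note_unreferenced(NEW.checksum); END"
    )
    return db
```

The caller would then process the pending list at whatever point is
chosen as safe, rather than deleting from inside the trigger.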

I think the speed concern is a red herring: it would only be a
problem if we tried to delete a large number of unreferenced pristines
at some inappropriate point in the code, such as when exiting a
long-running application.

The files have to be deleted at some time. If we do it little and
often, the time cost is only proportional to the size of the WC
operation being done, and the peak disk usage is low. On a system where
deletion takes one second per file, I would imagine all WC work would be
very slow anyway, but running "svn cleanup" after a long period of WC
work would be intolerably slow.
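
To illustrate the "little and often" argument, here is a minimal
sketch in Python. It is a toy model, not libsvn_wc code: the table
layout and every function name are assumptions made for illustration.
Each simulated WC operation drops some references and is followed
immediately by a cleanup, so the number of files deleted per cleanup
equals the number that operation unreferenced.

```python
import os
import sqlite3
import tempfile


def open_store():
    """Create an in-memory stand-in for a pristine table (hypothetical)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE pristine ("
        " checksum TEXT PRIMARY KEY,"
        " refcount INTEGER NOT NULL)"
    )
    return db


def install_pristine(db, store_dir, checksum, data):
    """Write a pristine text to disk and record it with one reference."""
    with open(os.path.join(store_dir, checksum), "wb") as f:
        f.write(data)
    db.execute("INSERT INTO pristine VALUES (?, 1)", (checksum,))


def drop_reference(db, checksum):
    """Record that one WC node no longer refers to this pristine text."""
    db.execute(
        "UPDATE pristine SET refcount = refcount - 1 WHERE checksum = ?",
        (checksum,),
    )


def cleanup_unreferenced(db, store_dir):
    """Delete every pristine whose refcount is zero; return how many."""
    rows = db.execute(
        "SELECT checksum FROM pristine WHERE refcount = 0"
    ).fetchall()
    for (checksum,) in rows:
        os.remove(os.path.join(store_dir, checksum))
        db.execute("DELETE FROM pristine WHERE checksum = ?", (checksum,))
    return len(rows)


def run_operations(op_sizes):
    """Simulate operations that each unreference op_sizes[i] pristines.

    Returns the number of files deleted by the cleanup that follows each
    operation, and the number of pristine files left on disk at the end.
    """
    db = open_store()
    with tempfile.TemporaryDirectory() as store_dir:
        # One extra pristine stays referenced throughout.
        checksums = ["sum%04d" % i for i in range(sum(op_sizes) + 1)]
        for checksum in checksums:
            install_pristine(db, store_dir, checksum, b"pristine text")
        to_drop = iter(checksums)
        deleted_per_op = []
        for size in op_sizes:
            for _ in range(size):
                drop_reference(db, next(to_drop))
            # Cleanup runs right after the operation, not at exit.
            deleted_per_op.append(cleanup_unreferenced(db, store_dir))
        remaining = len(os.listdir(store_dir))
    db.close()
    return deleted_per_op, remaining
```

Calling run_operations([3, 1, 5]) reports one cleanup count per
operation; deferring the cleanup to a single call at the end would
instead delete all nine files in one go.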

I don't see any advantage, except for ultimate implementation
simplicity, in batching up this particular kind of deletion. We
certainly haven't found a need to adopt that strategy in order to make
the deletion of other WC files (temp files or working files) perform
acceptably. And calling the cache management (currently "deletion")
function at a fine granularity won't get in the way of enhancements for
longer term caching and repository traffic optimizations based on using
the cache.

- Julian
Received on 2011-01-11 17:33:29 CET

This is an archived mail posted to the Subversion Dev mailing list.