
Re: [TSVN] RFC: New cache scheme

From: SteveKing <stefankueng_at_gmail.com>
Date: 2005-01-21 09:44:06 CET

On Thu, 20 Jan 2005 21:21:54 +0000, Will Dean <svn@indcomp.co.uk> wrote:

> Anyway, among all its charm, the current shell extension does have some
> problems:
> 1. In order to prevent the cache becoming stale (and, I think, because of
> historical concern about the amount of memory it might consume), cached
> items have very short lifetimes (a few seconds). In certain cases (large
> directories, slow filesystems), this can cause a pathological cache
> thrashing, where the time taken to build the cache exceeds the lifetime of
> its members. This is disastrous, as the very time you need the cache most
> is when it's slow to build. There is plenty of sticking plaster stuck on
> this particular wound, but it's not very pretty.

That won't happen. The cache is built in one step, and the lifetime
timer is only started _after_ it has been built.
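To illustrate the point, here is a minimal sketch (in Python, with a hypothetical `FolderStatusCache` class and an injectable clock; not TSVN's actual code) of a single-folder cache whose lifetime is measured from the end of the build. Even a build that is slower than the lifetime cannot expire its own entries:

```python
import time
from typing import Callable, Dict, Optional

class FolderStatusCache:
    """Hypothetical one-folder cache; the lifetime counts from the moment
    the build *finishes*, not from when it started."""

    def __init__(self, build: Callable[[str], Dict[str, str]], lifetime: float,
                 clock: Callable[[], float] = time.monotonic):
        self._build = build          # fetches the status of every item in a folder
        self._lifetime = lifetime    # seconds, counted from the end of the build
        self._clock = clock
        self._folder: Optional[str] = None
        self._entries: Dict[str, str] = {}
        self._built_at = 0.0

    def get(self, folder: str) -> Dict[str, str]:
        expired = self._clock() - self._built_at > self._lifetime
        if folder != self._folder or expired:
            self._entries = self._build(folder)   # may take a long time
            self._folder = folder
            self._built_at = self._clock()        # the timer starts only now
        return self._entries
```

With a 2-second lifetime and a 10-second build, a lookup made right after the build is still served from the cache.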

> 2. Unless you're in recursive status mode, the cache only holds the status
> for one folder.
> 3. The SVN libraries are statically linked and big and slow-to-start. The
> shell extension has to include them in order to get item status. Every
> process which starts a file-open dialog (not exactly a lightweight activity
> at the best of times) has to suffer SVN starting-up and loading
> into-process the first time the dialog is opened.

That time is almost irrelevant. You see, an application which uses
such a file-open dialog has to load the whole shell DLL into its
process space anyway, so loading the TSVN DLL doesn't make much of a
difference. On my old laptop (which I used until less than a month
ago!), it took about 5 seconds for a Save As dialog to appear when I
didn't have TSVN installed! With TSVN installed, it took 5 seconds too.
So what do the few extra milliseconds needed to load TSVN matter
compared to the 5 seconds it takes to load the shell?

> 4. Because the shell extension is an in-process COM object (shell
> extensions are supposed to be in-process, this isn't a mistake), there is
> one cache per process. With the current very short cache lifetimes, this
> doesn't really make any difference to anybody, but it could be a
> missed-opportunity in terms of re-use of cached items. (For example, I
> think it's reasonably probable that you'll have Explorer windows and app
> file-open boxes pointing into similar folders.)
> 5. Shell extensions are a pig to debug.

True, but it's doable.

> I have been working on a completely different way of doing things, which
> shows some promise. It goes as follows:
> 1. Create a new application 'TSVNCache', which can run in the background,
> with a simple IPC interface which allows other processes to request the SVN
> status of a path. There's no U/I on this application.
> 2. Rip all the SVN status stuff out of the shell extension and replace it
> with something which asks TSVNCache for the status of a path. The shell
> extension knows nothing about SVN except for the arrangement of a
> svn_wc_status_t structure (which is what it's given by TSVNCache). The
> cache knows nothing about the shell extension or why it wants the status of
> the file, it just returns the status. To take this step to the limit, the
> property-page handler would probably need to come out into a separate DLL,
> because it's always going to need SVN.
> At this point, we've probably slowed things down slightly, because there's
> now an inter-process call (on a named pipe) between the shell extension and
> the cache. However, the cache is now a nice little stand-alone process,
> which one can start and stop at will, play around with and debug
> easily. (If you stop TSVNCache, the shell extension just marks things as
> unversioned, connecting to the cache again when it restarts.)
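A minimal sketch of the shell-extension side of such an interface, assuming Python's `multiprocessing.connection` (which uses a named pipe on Windows when given an address such as `r'\\.\pipe\TSVNCacheSketch'`). The pipe name, the authkey and the plain-string status are hypothetical; the real cache would hand back an svn_wc_status_t. Opening a fresh connection per request means a restarted cache is picked up automatically:

```python
from multiprocessing.connection import Client
from typing import Any

UNVERSIONED = "unversioned"

def query_status(address: Any, path: str, authkey: bytes = b"tsvn-sketch") -> str:
    """Ask a (hypothetical) TSVNCache process for the status of `path`.
    If nothing is listening, report the item as unversioned instead of failing."""
    try:
        with Client(address, authkey=authkey) as conn:
            conn.send(path)          # request: the path we want the status of
            return conn.recv()       # reply: the status computed by the cache
    except (OSError, EOFError):
        # Cache not running (or it died mid-request): degrade gracefully.
        return UNVERSIONED
```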

That sounds good. I remember we had this kind of suggestion on this
list before version 1.0 came out. Some of the problems with it (and the
reasons why I didn't do it) are still valid; others aren't anymore.

Do you intend to make the cache a Windows service? Now that we have
dropped Win98 support, that would be possible. If so, we have to be
_very_ careful about privilege escalation scenarios.

Named pipes are slow. And since the shell asks for the status of each
file individually (once for every overlay!), we have to implement at
least a small cache in the shell extension. Otherwise, the IPC calls
will slow performance down a lot.
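A sketch of such a small shell-side cache, assuming a hypothetical `query` callable that performs the IPC round trip. The 1-second lifetime and the 1000-entry cap are illustrative guesses, not measured values:

```python
import time
from typing import Callable, Dict, Tuple

class ShellSideCache:
    """Tiny per-process cache in front of the IPC call (hypothetical).
    The shell queries the same path once per overlay in quick succession,
    so only the first of those queries should cross the pipe."""

    def __init__(self, query: Callable[[str], str], lifetime: float = 1.0,
                 max_items: int = 1000, clock: Callable[[], float] = time.monotonic):
        self._query = query
        self._lifetime = lifetime
        self._max_items = max_items
        self._clock = clock
        self._items: Dict[str, Tuple[float, str]] = {}   # path -> (fetched_at, status)

    def status(self, path: str) -> str:
        now = self._clock()
        hit = self._items.get(path)
        if hit is not None and now - hit[0] <= self._lifetime:
            return hit[1]                      # fresh enough: no IPC needed
        if len(self._items) >= self._max_items:
            self._items.clear()                # crude, but keeps memory bounded
        result = self._query(path)             # the expensive IPC round trip
        self._items[path] = (now, result)
        return result
```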

> So, the next step is to improve the cache:
> 3. Separate the caching of files and folders, so that you can build a big
> cache without needing to search a huge list of unstructured file names.
> 4. Increase the cache-lifetime (let's say that it's infinite)
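One possible reading of step 3, sketched with a hypothetical `TwoLevelStatusCache`: entries are grouped per folder, so a folder can be looked up, refreshed or dropped as a unit without scanning one flat list of every cached path. Lower-casing the keys is a simplification of Windows' case-insensitive paths:

```python
import ntpath
from typing import Dict, Optional, Tuple

class TwoLevelStatusCache:
    """Hypothetical layout: folder -> {file name -> status}."""

    def __init__(self) -> None:
        self._folders: Dict[str, Dict[str, str]] = {}

    @staticmethod
    def _split(path: str) -> Tuple[str, str]:
        folder, name = ntpath.split(ntpath.normpath(path))
        return folder.lower(), name.lower()   # approximate case-insensitivity

    def put(self, path: str, status: str) -> None:
        folder, name = self._split(path)
        self._folders.setdefault(folder, {})[name] = status

    def get(self, path: str) -> Optional[str]:
        folder, name = self._split(path)
        return self._folders.get(folder, {}).get(name)

    def drop_folder(self, folder: str) -> None:
        # Invalidating a whole folder touches only that folder's entries.
        self._folders.pop(ntpath.normpath(folder).lower(), None)
```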

Let's do the math:
Assuming an average path length of 150 characters and 50,000 files under
version control, and 2 bytes per character (wide strings), the paths alone
take 150 × 50,000 × 2 = 15,000,000 bytes, or about 14.3 MB. The status
struct is at least twice as big, maybe even bigger, since it contains the
filename, UUID and URL strings. That makes a total of about 43 MB.
And if you then consider that there are people with even more files
under version control (the most I've heard of so far is in the range
of 150,000), the memory use will be even higher!

Maybe we can assume that people with that many files under version
control also have enough RAM? I think that's reasonable...
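The estimate can be reproduced with a few lines. The 2 bytes per character (wide strings) is inferred from the 14.3 MB figure, MB here means 2^20 bytes, and `status_factor=2.0` is the "at least twice as big" lower bound:

```python
def cache_memory_mib(files: int, avg_path_chars: int = 150,
                     bytes_per_char: int = 2, status_factor: float = 2.0) -> float:
    """Rough cache size in MiB: the path strings plus status data that is
    `status_factor` times as big as the paths themselves."""
    path_bytes = files * avg_path_chars * bytes_per_char
    total_bytes = path_bytes * (1 + status_factor)
    return total_bytes / 2**20

print(round(cache_memory_mib(50_000, status_factor=0.0), 1))  # paths only
print(round(cache_memory_mib(50_000), 1))                     # paths + status
print(round(cache_memory_mib(150_000), 1))                    # the largest case
```

For the 150,000-file case this gives roughly 129 MiB, and that is still only a lower bound.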

> 5. Keep track of the modification time of files which are cached, and the
> modification time of the .svn\entries file, and use these as hints to
> invalidate the cache. Note that these hints are agnostic about the client
> you use, so you can use the SVN CL and the cache will still be invalidated
> properly.
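A minimal sketch of the invalidation hint in step 5, assuming a hypothetical `fetch_status` callable that does the real (slow) SVN status call and the per-folder `.svn\entries` file. An entry is reused only while both modification times are unchanged, whichever client touched them:

```python
import os
from typing import Callable, Dict, Tuple

def _mtime(path: str) -> int:
    try:
        return os.stat(path).st_mtime_ns
    except FileNotFoundError:
        return -1                      # a missing file has a distinct stamp

class MtimeValidatedCache:
    """Hypothetical cache that trusts an entry only while neither the file
    nor its folder's .svn/entries file has a new modification time."""

    def __init__(self, fetch_status: Callable[[str], str]):
        self._fetch = fetch_status     # the expensive call into the SVN libraries
        self._items: Dict[str, Tuple[int, int, str]] = {}

    @staticmethod
    def _entries_file(path: str) -> str:
        return os.path.join(os.path.dirname(path), ".svn", "entries")

    def status(self, path: str) -> str:
        # Stamps are taken *before* fetching, so a change made during the
        # fetch shows up as a mismatch on the next lookup (conservative).
        stamps = (_mtime(path), _mtime(self._entries_file(path)))
        cached = self._items.get(path)
        if cached is not None and cached[:2] == stamps:
            return cached[2]
        result = self._fetch(path)
        self._items[path] = (*stamps, result)
        return result
```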

Hmmm - keeping even the .svn\entries files in the cache? Memory use
increases even more...

> .... This is about where I've got to at the moment ....
> I don't currently implement recursive folder status, but my idea for this
> is to do something along the following lines:
> 1. Fetch the minimum required status information synchronously, as at the
> moment.
> 2. As a lazy, background task, recurse downwards from each folder which is
> cached, calculating the dominant SVN status for each folder.
> 3. Issue shell-update requests for folders as their recursive status
> becomes known.
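A sketch of steps 2 and 3, with a hypothetical status ranking and a `notify` callback standing in for the shell-update request (in a real extension, something like `SHChangeNotify`). It runs synchronously here; in the proposal it would be a lazy background task, for example on a worker thread. Every folder must appear as a key in `children`, even with an empty list:

```python
from typing import Callable, Dict, List

# Hypothetical ranking: a folder shows the most "important" status found
# anywhere below it. The order is an assumption for illustration.
RANK = {"unversioned": 0, "normal": 1, "added": 2, "deleted": 3,
        "modified": 4, "conflicted": 5}

def dominant_status(folder: str,
                    own_status: Dict[str, str],
                    children: Dict[str, List[str]],
                    notify: Callable[[str, str], None]) -> str:
    """Post-order walk: a folder's recursive status is known only once all of
    its subfolders are done, and only then is `notify` called for it."""
    best = own_status.get(folder, "unversioned")
    for child in children.get(folder, []):
        if child in children:                      # subfolder: recurse first
            status = dominant_status(child, own_status, children, notify)
        else:                                      # file: use its own status
            status = own_status.get(child, "unversioned")
        if RANK[status] > RANK[best]:
            best = status
    notify(folder, best)                           # request an overlay refresh
    return best
```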

Good idea. But that's the one problem I never found a solution for, and
also the main reason I haven't implemented such an approach yet:
If a user wants the overlays to show the recursive status and Explorer
shows a green checkmark, they will assume that nothing below has
changed and that the folder doesn't need a commit. So there is a
window of time in which the overlay isn't 'correct', in the sense that
it doesn't really show the recursive status. How would the user know
whether it shows the recursive status or just the status of the folder
itself? That's very dangerous, because sometimes you see a completely
_wrong_ overlay.

> Because the cache is now so durable, usable recursive status becomes a real
> possibility, which I don't feel it is at the moment (it's more of a
> tantalising peek at how good it could be).
> So, what do people think about all this? I'm particularly interested in
> people's views on the legitimacy of my cache invalidation strategy, but I'd
> welcome any input.
> (Just for interest, I started by trying to implement something based on
> change notifications, which would have meant I could then have the cache
> generate all the shell-update notifications, but I don't think this is very
> scaleable.)

Change notifications with FindFirstChangeNotification are _very_ slow
and would hog the whole system. Using ReadDirectoryChangesW, however,
might be fast enough not to slow the system down much. And now that we
have dropped Win98, we are actually able to use it.

  oo  // \\      "De Chelonian Mobile"
 (_,\/ \_/ \     TortoiseSVN
   \ \_/_\_/>    The coolest Interface to (Sub)Version Control
   /_/   \_\     http://tortoisesvn.tigris.org
Received on Fri Jan 21 09:45:16 2005

This is an archived mail posted to the TortoiseSVN Dev mailing list.