
Re: [msysGit] Re: [Tortoisehg-discuss] Bazzar stratgy regarding shell extension

From: Stefan Küng <tortoisesvn_at_gmail.com>
Date: Mon, 21 Apr 2008 22:43:07 +0200

John Arbash Meinel wrote:

> The whole point of this system is that you are explicitly caching *in
> memory* what could be determined from the filesystem. You are also (if
> you are going to be nice to the OS) probably going to limit the amount
> of information you cache in memory. So that if you browse through 50,000
> directories, it may choose to only cache the last 10,000 directories
> worth of information.

We had similar discussions on the TSVN mailing list before. I'll just
repeat what I said back then:

You're assuming that limiting the cache could actually work. But think
about what you are assuming here: that caching only the most recently
used information is enough.
But how can you know that? How can you know that not all information is
required? You can't know that the last 1000 entries in the cache are not
used anymore. If you e.g. set the limit to 10000 directories, how can
you know that there's not a working copy with 20000 directories? And if
you have such a big one, you will need *all* the information, not just
the last 10000 ones.
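
To make the point concrete, here is a minimal sketch of a bounded,
least-recently-used status cache. It is not the actual TSVN cache code;
the class, the `crawl_status` stand-in and the `wc/dirN` paths are all
hypothetical. It only shows what happens when the explorer repeatedly
asks for the status of a working copy that has more directories than
the limit allows.

```python
from collections import OrderedDict


class LRUStatusCache:
    """Bounded cache mapping a directory path to its status (hypothetical)."""

    def __init__(self, limit):
        self.limit = limit
        self.entries = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, path, compute):
        if path in self.entries:
            # Cache hit: mark the entry as most recently used.
            self.entries.move_to_end(path)
            self.hits += 1
            return self.entries[path]
        # Cache miss: go back to the filesystem (simulated by compute).
        self.misses += 1
        status = compute(path)
        self.entries[path] = status
        if len(self.entries) > self.limit:
            # Over the limit: drop the least recently used entry.
            self.entries.popitem(last=False)
        return status


def crawl_status(path):
    # Stand-in for the expensive working copy crawl (assumption).
    return "normal"


def scan_working_copy(cache, n_dirs, passes):
    """Ask for every directory of one working copy, several times over."""
    for _ in range(passes):
        for i in range(n_dirs):
            cache.get(f"wc/dir{i}", crawl_status)
    return cache.hits, cache.misses


if __name__ == "__main__":
    # Limit of 10000 entries, working copy with 20000 directories.
    print(scan_working_copy(LRUStatusCache(10000), 20000, passes=2))
```

Under these assumptions every request in the second pass is a miss,
because each directory has been evicted again by the time it is asked
for. A limit smaller than the working copy does not give you half the
benefit of the cache; in this access pattern it gives you none of it.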

It's up to the user to decide that (he's the only one who can decide!).
And if he can see that he's got problems with memory because of the
cache process, he can either deactivate the cache or even use another
version control client which doesn't have a cache at all.

> Likely some amount of limits is fine, since most people have a limited
> amount of "active workspace" at any given moment.
>
> However, the "active workspace" could very easily jump between different
> version control systems. For example, I could work on stuff that is
> stored in SVN in the morning, and then switch over to stuff that is
> versioned in BZR in the afternoon.

Most people have all projects in one specific subfolder. And since
Tortoise clients are bound to the explorer, how can you decide which
projects are active and which are not? One view in the explorer of that
project parent dir, and you need *all* the information from *all* projects.

> With multiple processes, the SVN cache won't know that I've switched
> over to using BZR, and thus will retain its cache, even though *I* no
> longer need it, because I've moved on to another location.

The *explorer* does not know that you switched over. So the cache won't
know either.

> At the end of the day, I would end up with 2 processes, each with full
> caches. Instead of a single full cache that had whatever I had been
> working on last.

And the next day, the user has to wait minutes until the cache has
switched over to what the user wants to work on.

> There certainly are other ways around it.
>
> 1) Let the OS decide when you need to hit swap, and page out the cache
> to disk. I would argue that if the TSVN caching process needs to be
> swapped to disk, it should instead just discard the data. After all, it
> is just going to go back to disk to recompute the information, which
> probably isn't a whole lot slower than paging in its cache files.

Discarding the data? Then what's the point of having a cache?

> 2) Have a timeout on all cached information. This, however, would
> require having your process periodically scan through its cache and
> decide what to prune. Probably not hard to write, but does require
> scanning through the cache from time to time. Which may be intensive (or
> not, all depends on implementation).

A timeout is needed anyway. Otherwise you would not be able to catch
modifications which are made by hidden processes (imagine a working copy
on a network share, and a user from another computer modifies a file).
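
A minimal sketch of such a timeout, assuming a simple per-entry age
check on lookup rather than a periodic scan. This is not how the TSVN
cache is actually implemented; the class name, the `max_age_seconds`
parameter and the injectable `clock` are hypothetical and only there to
illustrate the idea.

```python
import time


class TimedStatusCache:
    """Status cache whose entries expire after max_age_seconds (hypothetical)."""

    def __init__(self, max_age_seconds, clock=time.monotonic):
        self.max_age = max_age_seconds
        self.clock = clock      # injectable so the timeout can be tested
        self.entries = {}       # path -> (status, time it was fetched)

    def get(self, path, compute):
        now = self.clock()
        cached = self.entries.get(path)
        if cached is not None and now - cached[1] < self.max_age:
            # Entry is still fresh enough: answer from memory.
            return cached[0]
        # Missing or too old: ask the filesystem again. This is what
        # picks up changes nobody notified us about, e.g. a file edited
        # from another machine on a network share.
        status = compute(path)
        self.entries[path] = (status, now)
        return status
```

Note that the timeout here only forces a refresh of stale entries when
they are requested again. It does not remove anything from memory, so
it says nothing about which entries could safely be dropped.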

> On the other hand, a simple LRU cache can prune out the nodes which have
> not been accessed in a while when new information is requested. This
> also has the advantage that if I leave my machine on overnight, the last
> accessed stuff from the night before is still in the cache, even though
> it might have exceeded the time threshold. (You would have to set a
> relatively short threshold if you wanted to be friendly to other caching
> processes.)

Since there's no safe way to know which information is needed and which
can be cleared, there can be no LRU at all.

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net

Received on 2008-04-22 07:43:24 CEST

This is an archived mail posted to the TortoiseSVN Dev mailing list.