[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Extending tsvncache to other source control systems

From: Stefan Küng <tortoisesvn_at_gmail.com>
Date: Thu, 01 May 2008 18:49:16 +0200

Mark Hammond wrote:

>> 1) fetching the status for a single file/folder is takes (almost) as
>> long and as many disk accesses as fetching the status for all items
>> inside one folder.
>> 2) that's why TSVNCache only fetches the status for whole folders, even
>> if the shell only asks for the status of one file (which it always
>> does).
>
> I think this is likely to be true for all VCS systems - especially when the
> common case is that all items in a folder end up being queried anyway.

Well, just guessing that this might be the case is not enough. I know
it's the case for Subversion.
So: to the guys who know the other VCS systems: is this true for those too?

>> 3) all folders are completely independent of each other. The cache
>> doesn't know what a "working copy" is, only whether a folder is
>> versioned or not. It does *not* know whether one folder belongs to the
>> same working copy as another.
>
> I'm afraid I don't quite understand what you mean here - but from what I can
> guess, I can't see any major differences between the various VCSs that make
> it a problem for us if it isn't a problem for you.

In Subversion, a working copy can consist of multiple folders from
*different* repositories. And you can even re-arrange the folders from
the same repository. That's called a 'nested' layout. That's also why
every Subversion folder has its own .svn folder in it.

>> 4) the edge cases you mentioned above are mostly Subversion specific
>> (how Subversion fetches the status, how it priorities the status, ...)
>
> Perhaps edge-cases was a bad term - in particular, the following things
> appear to be well implemented and optimized but are not related to SVN at
> all:
>
> * The watching and crawling process, and subtleties related to coalescing
> watchers for common ancestors.
> * Handling the SE_BACKUP_NAME privilege
> * Device removal notifications and their interactions with the watchers and
> crawlers
> * Misc optimizations like "blocking" the watcher threads for 4 seconds after
> a change we know about, etc.
> * Smart shell notify generation.

Yes, those are not SCM specific. But there's no 'clean' interface
between those functions and the svn specific part. So having a common
binary for that would be very difficult and a lot of work to implement.

> Indeed, close examination of the TSVNCache sources shows very few source
> files have much in the way of SVN specific code - CachedDirectory.cpp has
> some, FolderCrawler.cpp, ShellUpdater.cpp, SVNAdminDir.cpp, TSVNPath.cpp etc
> all have fairly minor dependencies, and the rest has (practically) none more
> than opaque passing around of data. FYI, I've included my notes of the
> source files below, in case there is something I have missed.
>
> So while I remain confident the technical challenges aren't that great, the
> pertinent question is whether you (ie, the tsvn developers) see enough long
> term advantage to your project by going down this track to justify the
> costs. The advantage would come from a potentially wider pool of people
> providing improvements, while the majority of the costs will probably be
> borne in the shorter term - but I accept they are significant. A perfectly
> reasonable option (and one that means the least short-term work for all of
> us :) is obviously to create our own fork, and I would understand perfectly
> if that is your guidance.

Having a common binary which provides the base functionality of watching
the filesystem, asking an SCM specific dll for the status data and then
handle the communication with the shell extension could work. But it
would be a *lot* of work, with IMHO little gain.

If however you want to just have a common code library for this, I'd
have to reject that idea. Because even if we could get something fairly
stable, every change made for one SCM because of a bug found could
horribly break the other systems. Because I doubt that any one of us has
more than one or maybe two of those Tortoise clients in use. So testing
a change for all systems won't be done but only for the one the dev is
using and working for. Heck, I even broke the x64 stuff in TSVN more
than once by changing some code - that's because I don't always build
and test the x64 build but only the win32 part when I'm working on TSVN.

> CachedDirectory:
>
> * Code that actually asks SVN for the status of items is here, but fairly
> well isolated.
> * svn externals get some special handling.
> * It has a number of concepts related to "most important" which should be
> easy to abstract.
> * It has state specific code when recursing into folders to get a "combined"
> state, but that appears largely generic. If you ignore this usage of item
> states, the only other states actually used by the rest of the code are
> svn_wc_status_none, svn_wc_status_unversioned, svn_wc_status_normal, and
> svn_wc_status_ignored. In other words, no other code in TSVNCache knows
> there is a status of "added", let alone the more esoteric SVN statuses.

And that class is deeply embedded into the rest of the TSVNCache
project. So separating it would be a challenge.

> CacheInterface.cpp
>
> * Nothing SVN specific
> * .h file defines TSVNCacheResponse, which is defined in terms of simple SVN
>
> Specific structures (but behavior doesn't depend on them)

Not much code in that one. But the struct is very SVN specific, and also
the data passed back to the shell extension. So I doubt that this one
could be shared.

> DirectoryWatcher.cpp
>
> * Generic win32 file watcher. Clever coalescing of parent/child sub-folders
> to optimize use of watch handles.

FYI: that optimization for use of watch handles is only available in
TSVN 1.5 - the 1.4 versions use a different approach.

> * Allows for a path to be "blocked" for a few seconds as a perf tweak
> * Handles device removal notifications.
> * Nothing SVN specific

Not quite: it checks whether a folder is versioned by checking whether
an .svn folder is present or not (not really a reliable check, but it's
good enough). Without that check, the whole optimization for using less
watcher handles won't work.

> FolderCrawler.cpp
>
> * Manages a "queue" of folders to be crawled.
> * Manages worker thread, including optimizations to prevent crawling while
> explorer is still asking for items
> * Sane handling of multiple requests for the same dir and/or children
> * Small SVN dependencies:
> - IsAdminDir() and a few tweaks for ignoring notifications that come due
> to admin files changing
> - isUnversioned(), etc
> - get new status for item, if status != old_status, update the shell -
> but
> the "status" is still opaque.

I won't disagree on this one :)

> ShellCache:
>
> * Helpers for the registry and global options, information about disks, etc
> * Not SVN specific

Well, it's TSVN specific. Unless you also want to share the settings
between the Tortoise clients of course. But I would definitely object to
such an approach - each Tortoise client must use its own settings.

> ShellUpdater.cpp
>
> * Helper thread for calling SHChangeNotify() when necessary.
> * Trivial SVN dependencies: HasAdminDir()/IsAdminDir()

The SVN dependencies might *seem* trivial, but they're essential! The
other version control systems would have to provide the same functions -
otherwise you would end up with repeated notifications which can render
the desktop/explorer completely unusable.

> StatusCacheEntry:
>
> * Holds all the status data of one file or folder.
> * Slightly conflates cache metadata (ie, m_discardAtTime and cache related
> ops) with SVN state - should be simple to split out status.
> * Loads and saves entries from a file and prepares for transfer over a
> "wire", but behavior isn't determined by SVN specific data.

description ok.

> SVNStatusCache:
>
> * A cache of CStatusCacheEntry objects, keyed by a TSVNPath
> * Manages the file watchers, so cached information for paths is
> updated/invalidated as they happen when possible.
> * The only behavior which depends on SVN information is limited to knowing
> if something is "unversioned" - but the behavior for such files is generic

description ok.

> SVNAdminDir:
>
> * Fairly generic - obvious SVN deps are
> - calls svn_wc_is_adm_dir()
> - hard-codes '.svn' as a special directory name
> - optionally hard-codes '_svn' as an alternate special directory name.

Since SVNAdminDir is a very small class, I can't see why you would think
that it's 'fairly generic' - the three SVN deps you mention below that
remark basically make up the whole class.

> TSVNCache.cpp:
>
> * the entry point and "toplevel" code
> * Handling of "device removal" notifications (stops watchers, drops cached
> info, etc)
> * Implementation of the "tray icon" for the cache.
> * Deals almost exclusively with CSVNStatusCache() objects - which is a
> fairly
> generic mapping of file paths to opaque "status" objects. Thus,
> TSVNCache.cpp is largely VCS independent.

description ok.

> SVNHelpers:
>
> * Some utilities specific to SVN that should be easy to hide.

Not sure about the 'easy to hide', but otherwise true.

> TSVNPath:
>
> * Generic file-system path object, not many svn deps. Ripe for a base class
> -
> only svn specific things are:
> const char* CTSVNPath::GetSVNApiPath(apr_pool_t *pool) const
> bool CTSVNPath::IsUrl() const
> apr_array_header_t * CTSVNPathList::MakePathArray (apr_pool_t *pool)
> const

description ok.

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net

Received on 2008-05-01 17:49:43 CEST

This is an archived mail posted to the TortoiseSVN Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.