RE: Extending tsvncache to other source control systems

From: Mark Hammond <mhammond_at_skippinet.com.au>
Date: Fri, 2 May 2008 00:25:58 +1000

Hi Stefan,
  Thanks for the response. I hope I can clarify things.

> > Assuming it does, it strikes me that we need something very very
> > similar to TSVNCache - and TSVNCache already has a stable
> > implementation with many edge cases handled well. Indeed, relatively
> > little of the TSVNCache source code seems related to talking to
> > subversion, but is instead a fairly generic implementation. So I'm
> > wondering if you guys have any interest in, or thoughts about how we
> > could extend TSVNCache to cleanly separate out the VCS specific code
> > so it is more suitable for reuse? It might also prove useful for
> > Mercurial in the future, which finds itself in a similar position.
>
> Since I have no idea on how bazaar handles the status of files and
> folders, I can't really say whether you could use some part of the
> TSVNCache source. It is optimized to handle the Subversion situation of
> fetching the status:

> 1) fetching the status for a single file/folder takes (almost) as
> long and as many disk accesses as fetching the status for all items
> inside one folder.
> 2) that's why TSVNCache only fetches the status for whole folders, even
> if the shell only asks for the status of one file (which it always
> does).

I think this is likely to be true for all version control systems - especially
as, in the common case, all items in a folder end up being queried anyway.
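
To make the folder-at-a-time idea concrete, here is a minimal sketch of a cache that, when asked about one file, fetches and stores the status of every item in that file's folder. It assumes a hypothetical provider callback (`FolderStatusFn`) and a hypothetical `ItemStatus` enum; neither exists in TSVNCache, and the expiry and invalidation the real cache needs are left out.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical VCS-neutral states; a real provider would map its own onto these.
enum class ItemStatus { None, Unversioned, Normal, Ignored };

// Hypothetical provider callback returning the status of every item in one folder.
using FolderStatusFn =
    std::function<std::map<std::string, ItemStatus>(const std::string& folder)>;

class FolderStatusCache {
public:
    explicit FolderStatusCache(FolderStatusFn fetch) : fetch_(std::move(fetch)) {}

    // The shell asks about a single item, but we fetch (and keep) the whole
    // folder, since that costs about the same as fetching one item.
    ItemStatus GetStatus(const std::string& folder, const std::string& name) {
        auto it = folders_.find(folder);
        if (it == folders_.end()) {
            it = folders_.emplace(folder, fetch_(folder)).first;
            ++fetchCount_;
        }
        auto item = it->second.find(name);
        return item == it->second.end() ? ItemStatus::None : item->second;
    }

    // Number of times the provider was actually called.
    int FetchCount() const { return fetchCount_; }

private:
    FolderStatusFn fetch_;
    std::map<std::string, std::map<std::string, ItemStatus>> folders_;
    int fetchCount_ = 0;
};
```

A second query for another file in the same folder is served from memory, and nothing in the class depends on which VCS sits behind the callback.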

> 3) all folders are completely independent of each other. The cache
> doesn't know what a "working copy" is, only whether a folder is
> versioned or not. It does *not* know whether one folder belongs to the
> same working copy as another.

I'm afraid I don't quite understand what you mean here - but from what I can
guess, I can't see any major differences between the various VCSs that make
it a problem for us if it isn't a problem for you.
 
> 4) the edge cases you mentioned above are mostly Subversion specific
> (how Subversion fetches the status, how it prioritizes the status, ...)

Perhaps edge-cases was a bad term - in particular, the following things
appear to be well implemented and optimized but are not related to SVN at
all:

* The watching and crawling process, and subtleties related to coalescing
watchers for common ancestors.
* Handling the SE_BACKUP_NAME privilege
* Device removal notifications and their interactions with the watchers and
crawlers
* Misc optimizations like "blocking" the watcher threads for 4 seconds after
a change we know about, etc.
* Smart shell notify generation.
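
As an illustration of the coalescing idea (not the actual DirectoryWatcher.cpp logic), here is a sketch that keeps a minimal set of watched roots, assuming every watch is recursive: a new folder already under a watched root costs nothing, and a new ancestor replaces the watches beneath it. Paths are assumed to be normalized with '/' separators; `WatchSet` and `IsSameOrDescendant` are hypothetical names.

```cpp
#include <set>
#include <string>

// True if `path` equals `root` or lies beneath it ('/'-separated paths assumed).
bool IsSameOrDescendant(const std::string& root, const std::string& path) {
    if (path.compare(0, root.size(), root) != 0) return false;
    return path.size() == root.size() || path[root.size()] == '/';
}

// Hypothetical minimal set of recursive watch roots.
class WatchSet {
public:
    void Add(const std::string& path) {
        // Already covered by an existing recursive watch: nothing to do.
        for (const auto& root : roots_)
            if (IsSameOrDescendant(root, path)) return;
        // Drop existing watches that the new, higher-level watch will cover.
        for (auto it = roots_.begin(); it != roots_.end();) {
            if (IsSameOrDescendant(path, *it))
                it = roots_.erase(it);
            else
                ++it;
        }
        roots_.insert(path);
    }

    const std::set<std::string>& Roots() const { return roots_; }

private:
    std::set<std::string> roots_;
};
```

Fewer roots means fewer OS watch handles, at the cost of receiving (and filtering) notifications for folders nobody asked about.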

Indeed, close examination of the TSVNCache sources shows that very few source
files contain much SVN specific code - CachedDirectory.cpp has some, and
FolderCrawler.cpp, ShellUpdater.cpp, SVNAdminDir.cpp, TSVNPath.cpp etc all
have fairly minor dependencies; the rest does (practically) nothing more
SVN specific than pass opaque data around. FYI, I've included my notes on the
source files below, in case there is something I have missed.

So while I remain confident the technical challenges aren't that great, the
pertinent question is whether you (ie, the tsvn developers) see enough long
term advantage to your project by going down this track to justify the
costs. The advantage would come from a potentially wider pool of people
providing improvements, while the majority of the costs will probably be
borne in the shorter term - but I accept they are significant. A perfectly
reasonable option (and one that means the least short-term work for all of
us :) is obviously to create our own fork, and I would understand perfectly
if that is your guidance.

As mentioned, below are the notes I've made on the TSVNCache implementation.

Thanks for your help,

Mark

CachedDirectory:

* Code that actually asks SVN for the status of items is here, but fairly
  well isolated.
* svn externals get some special handling.
* It has a number of concepts related to "most important" which should be
  easy to abstract.
* It has state specific code when recursing into folders to get a "combined"
  state, but that appears largely generic. If you ignore this usage of item
  states, the only other states actually used by the rest of the code are
  svn_wc_status_none, svn_wc_status_unversioned, svn_wc_status_normal, and
  svn_wc_status_ignored. In other words, no other code in TSVNCache knows
  there is a status of "added", let alone the more esoteric SVN statuses.
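
A sketch of how the "most important" logic might be abstracted, assuming a hypothetical VCS-neutral `Status` enum whose declaration order encodes priority. The ordering below is illustrative only and is not TSVNCache's actual priority table; each VCS would map its native states (for SVN, the svn_wc_status_* values) onto it.

```cpp
#include <vector>

// Hypothetical states, declared from least to most "important".
enum class Status { None = 0, Unversioned, Ignored, Normal, Modified, Conflicted };

// Returns the more important of two states; enum order encodes priority.
Status MoreImportant(Status a, Status b) {
    return static_cast<int>(a) >= static_cast<int>(b) ? a : b;
}

// Combined state of a folder: its own state merged with each child's state.
Status CombinedStatus(Status own, const std::vector<Status>& children) {
    Status result = own;
    for (Status s : children) result = MoreImportant(result, s);
    return result;
}
```

With this shape, the recursion in CachedDirectory only needs a comparison, not any knowledge of what "added" or "conflicted" mean.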

CacheInterface.cpp

* Nothing SVN specific
* The .h file defines TSVNCacheResponse, which is defined in terms of simple
  SVN specific structures (but behavior doesn't depend on them)

DirectoryWatcher.cpp

* Generic win32 file watcher. Clever coalescing of parent/child sub-folders
  to optimize use of watch handles.
* Allows for a path to be "blocked" for a few seconds as a perf tweak
* Handles device removal notifications.
* Nothing SVN specific
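
The "blocking" tweak can be sketched as a map from path to the time its block expires. This is an illustration under assumptions, not the DirectoryWatcher.cpp code: `BlockedPaths` is a hypothetical name, and the current time is passed in explicitly so the logic is easy to test. Constructing it with `std::chrono::seconds(4)` would match the 4 second window mentioned earlier.

```cpp
#include <chrono>
#include <map>
#include <string>

using Clock = std::chrono::steady_clock;

// Hypothetical: ignores notifications for a path for a short window after a
// change we already know about.
class BlockedPaths {
public:
    explicit BlockedPaths(Clock::duration window) : window_(window) {}

    // Start (or restart) the block window for `path`.
    void Block(const std::string& path, Clock::time_point now) {
        until_[path] = now + window_;
    }

    // True if a notification for `path` arriving at `now` should be ignored.
    bool IsBlocked(const std::string& path, Clock::time_point now) {
        auto it = until_.find(path);
        if (it == until_.end()) return false;
        if (now < it->second) return true;
        until_.erase(it);  // window expired; forget the entry
        return false;
    }

private:
    Clock::duration window_;
    std::map<std::string, Clock::time_point> until_;
};
```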

FolderCrawler.cpp

* Manages a "queue" of folders to be crawled.
* Manages worker thread, including optimizations to prevent crawling while
  explorer is still asking for items
* Sane handling of multiple requests for the same dir and/or children
* Small SVN dependencies:
   - IsAdminDir() and a few tweaks for ignoring notifications that come due
     to admin files changing
   - isUnversioned(), etc
   - get new status for item; if status != old_status, update the shell - but
     the "status" is still opaque.
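
Here is a minimal, single-threaded sketch of a de-duplicating crawl queue that also skips VCS metadata folders. `CrawlQueue` and `IsInAdminDir` are hypothetical; the latter stands in for the IsAdminDir() check and hard-codes `.svn` purely for the example. The real FolderCrawler also runs a worker thread and handles child folders, so it would need a mutex and more policy than shown.

```cpp
#include <cstddef>
#include <deque>
#include <set>
#include <string>

// Hypothetical stand-in for IsAdminDir(): is ".svn" a component of `path`?
bool IsInAdminDir(const std::string& path) {
    return path.find("/.svn/") != std::string::npos ||
           (path.size() >= 5 && path.compare(path.size() - 5, 5, "/.svn") == 0);
}

class CrawlQueue {
public:
    // Queues a folder unless it is already pending or is VCS metadata.
    void Push(const std::string& folder) {
        if (IsInAdminDir(folder) || pending_.count(folder)) return;
        pending_.insert(folder);
        queue_.push_back(folder);
    }

    // Pops the next folder into `out`; returns false when the queue is empty.
    bool Pop(std::string& out) {
        if (queue_.empty()) return false;
        out = queue_.front();
        queue_.pop_front();
        pending_.erase(out);
        return true;
    }

    std::size_t Size() const { return queue_.size(); }

private:
    std::deque<std::string> queue_;  // crawl order
    std::set<std::string> pending_;  // fast duplicate check
};
```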

ShellCache:

* Helpers for the registry and global options, information about disks, etc
* Not SVN specific

ShellUpdater.cpp

* Helper thread for calling SHChangeNotify() when necessary.
* Trivial SVN dependencies: HasAdminDir()/IsAdminDir()

StatusCacheEntry:

* Holds all the status data of one file or folder.
* Slightly conflates cache metadata (ie, m_discardAtTime and cache related
  ops) with SVN state - should be simple to split out status.
* Loads and saves entries from a file and prepares for transfer over a
  "wire", but behavior isn't determined by SVN specific data.

SVNStatusCache:

* A cache of CStatusCacheEntry objects, keyed by a TSVNPath
* Manages the file watchers, so cached information for paths is
  updated/invalidated as changes happen, when possible.
* The only behavior which depends on SVN information is limited to knowing
  if something is "unversioned" - but the behavior for such files is generic
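
A sketch of the split suggested in the StatusCacheEntry notes, combined with watcher-driven invalidation: the cache stores a discard time next to an opaque status type and never inspects the status itself. Everything here is hypothetical: `CacheEntry`, `PathStatusCache`, the millisecond units for the discard time, and the use of `std::string` keys in place of TSVNPath.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical entry: cache bookkeeping plus an opaque VCS status.
template <typename StatusT>
struct CacheEntry {
    StatusT status;              // never inspected by the cache
    std::int64_t discardAtTime;  // role of m_discardAtTime (units assumed: ms)
};

// Hypothetical path -> entry cache, invalidated by file-watcher events.
template <typename StatusT>
class PathStatusCache {
public:
    void Put(const std::string& path, const StatusT& s,
             std::int64_t nowMs, std::int64_t lifetimeMs) {
        entries_[path] = CacheEntry<StatusT>{s, nowMs + lifetimeMs};
    }

    // Returns the cached status, or nullptr if absent or expired.
    const StatusT* Find(const std::string& path, std::int64_t nowMs) const {
        auto it = entries_.find(path);
        if (it == entries_.end() || nowMs >= it->second.discardAtTime) return nullptr;
        return &it->second.status;
    }

    // Watcher reported a change: drop `path` and everything beneath it.
    void Invalidate(const std::string& path) {
        const std::string prefix = path + "/";
        entries_.erase(path);
        // Keys under `prefix` are contiguous in the ordered map.
        auto it = entries_.lower_bound(prefix);
        while (it != entries_.end() && it->first.compare(0, prefix.size(), prefix) == 0)
            it = entries_.erase(it);
    }

    std::size_t Size() const { return entries_.size(); }

private:
    std::map<std::string, CacheEntry<StatusT>> entries_;
};
```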

SVNAdminDir:

* Fairly generic - obvious SVN deps are
  - calls svn_wc_is_adm_dir()
  - hard-codes '.svn' as a special directory name
  - optionally hard-codes '_svn' as an alternate special directory name.
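
A minimal sketch of replacing the hard-coded '.svn'/'_svn' names with a per-VCS list. `AdminDirNames` is a hypothetical name, and the case-insensitive comparison is an assumption (reasonable for Windows file names), not something taken from SVNAdminDir.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: each VCS supplies its own metadata folder names.
class AdminDirNames {
public:
    explicit AdminDirNames(std::vector<std::string> names) : names_(std::move(names)) {}

    // Case-insensitive match against any configured name.
    bool IsAdminDirName(const std::string& name) const {
        return std::any_of(names_.begin(), names_.end(),
                           [&](const std::string& n) { return EqualsNoCase(n, name); });
    }

private:
    static bool EqualsNoCase(const std::string& a, const std::string& b) {
        if (a.size() != b.size()) return false;
        for (std::size_t i = 0; i < a.size(); ++i)
            if (std::tolower(static_cast<unsigned char>(a[i])) !=
                std::tolower(static_cast<unsigned char>(b[i])))
                return false;
        return true;
    }

    std::vector<std::string> names_;
};
```

An SVN build would construct it with {".svn", "_svn"}; a bzr build with {".bzr"}.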

TSVNCache.cpp:

* the entry point and "toplevel" code
* Handling of "device removal" notifications (stops watchers, drops cached
  info, etc)
* Implementation of the "tray icon" for the cache.
* Deals almost exclusively with CSVNStatusCache() objects - which is a fairly
  generic mapping of file paths to opaque "status" objects. Thus,
  TSVNCache.cpp is largely VCS independent.

SVNHelpers:

* Some utilities specific to SVN that should be easy to hide.

TSVNPath:

* Generic file-system path object, not many svn deps. Ripe for a base class -
  only svn specific things are:
     const char* CTSVNPath::GetSVNApiPath(apr_pool_t *pool) const
     bool CTSVNPath::IsUrl() const
     apr_array_header_t * CTSVNPathList::MakePathArray (apr_pool_t *pool) const

-- end --

Received on 2008-05-01 16:26:13 CEST

This is an archived mail posted to the TortoiseSVN Dev mailing list.