
RE: Need fast ways to get Info once WC-NG is introduced

From: Bert Huijben <bert_at_qqmail.nl>
Date: Mon, 2 Aug 2010 12:32:41 +0200

> -----Original Message-----
> From: Stefan Küng [mailto:tortoisesvn_at_gmail.com]
> Sent: zaterdag 31 juli 2010 17:31
> To: Subversion Development
> Subject: Need fast ways to get Info once WC-NG is introduced
>
> Hi,
>
> I think I'd best first describe what I do in TSVN now:
> TSVN has a cache of all working copy statuses which is used by the shell
> extension to show the icon overlays. It would be way too slow to fetch
> the status every time the shell requests the overlays, so that's why we
> have that cache.
>
> The cache itself tries to do as little as possible while still keeping
> the status of each item up to date. It gets notified by the OS whenever
> a file is changed and decides then whether to re-fetch the status with
> the SVN API or not. But even calling the status API in those cases is
> too expensive and leads to way too heavy disk access. So the cache does
> a very quick check first: it reads the file time of the entries and
> props file inside the .svn folder - only if that time has changed it
> calls the svn status API. If it hasn't changed and there was no change
> notification for a file inside that folder, calling the API isn't
> necessary.
>
> To clarify this a little bit, imagine the cache gets a change
> notification for all 'entries' files in a wc because someone did a
> commit or an update.
> The problem is that the cache gets such notifications even if the file
> content hasn't changed, it's enough if a file was opened with write
> access - the notification is sent even if there was no actual write to
> the file.
> So by checking the file dates of the entries/props files the cache
> determines whether a call to the svn API is needed or not for the
> subfolders.
>
> Now, as far as I understand it, with WC-NG and the single db design,
> there are no files in each wc folder anymore which indicate whether
> something affecting the status has changed. There will only be one
> single db file for all folders of a wc.
>
> So my first question is: is there a very quick way to find out whether
> something status related has changed since a specific time for a
> particular wc folder? I haven't found an API so far which I could use
> for this. It doesn't have to be reliable, i.e., all I need to know is
> whether the status *may* have changed; I don't really need to
> know whether it *really* has changed because once I get the 'maybe', I
> will call the status API and then get the definite answer.

I don't think there is a specific per folder check like this, but retrieving
specific data about just one node (instead of its folder) will be *much*
faster than in the old entries store. With the entries files we had to read
the entire file in all cases, but a real database doesn't have that
limitation.

For all metadata except for pristine files we only have to open one file and
sqlite just seeks to the right locations to fetch the data using its
indexes.
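To make the difference from the entries files concrete, here is a minimal
sketch of a keyed single-node lookup in SQLite. The `nodes` table, its
columns, and the helper names are assumptions made up for illustration; this
is not the real WC-NG schema.

```python
import sqlite3

# Hypothetical lookup by primary key; the real wc.db layout differs.
LOOKUP = "SELECT kind FROM nodes WHERE wc_id = ? AND local_relpath = ?"

def make_demo_db():
    """Build an in-memory database with an illustrative node table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE nodes ("
        " wc_id INTEGER NOT NULL,"
        " local_relpath TEXT NOT NULL,"
        " kind TEXT NOT NULL,"
        " PRIMARY KEY (wc_id, local_relpath))"
    )
    rows = [(1, "src/file%d.c" % i, "file") for i in range(1000)]
    rows.append((1, "src", "dir"))
    db.executemany("INSERT INTO nodes VALUES (?, ?, ?)", rows)
    return db

def node_kind(db, wc_id, relpath):
    """Fetch one node's kind without reading its siblings."""
    row = db.execute(LOOKUP, (wc_id, relpath)).fetchone()
    return row[0] if row else None

def lookup_plan(db):
    """Return SQLite's query plan text for the single-node lookup."""
    rows = db.execute("EXPLAIN QUERY PLAN " + LOOKUP, (1, "src")).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(r[-1] for r in rows)
```

The plan text reports a SEARCH on the primary key index rather than a scan of
the whole table, which is the property the paragraph above relies on.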

For AnkhSVN I'm thinking about splitting the status cache into two layers,
instead of doing an 'svn status' per folder like we do in 1.6. (I think
TortoiseSVN might do the same thing, but maybe it calls status with depth
infinity.)

Getting information from the working copy per individual file will be so
much cheaper than before that I will look for metadata changes first (and
cache only a fraction of the details I used to cache) and perform the
pristine file comparison only when I really need to. (I don't know yet
whether I will use svn_(client|wc)_status for this or just call
svn_wc_text_modified_p2() myself.)

I would imagine that TortoiseSVN's folder glyph status could be calculated
much faster with a similar strategy. First check whether there is a metadata
change or conflict somewhere in the tree (keeping track of translated
filesize + filedate, as these will be useful in the next step).
(This would be roughly svn_client_infoX(). It should also inform you of any
property changes; I don't know if it already does that, but the information
is available in our internal APIs now.)
If there is such a status, just set the right glyph (early out; no need to
check any pristine files).

And only if there is no such status, perform the svn_wc_text_modified_p2()
calls where needed.
Your disk cache (via its hook) knows which on-disk files changed since the
last scan, so it can handle this much smarter than the simple algorithm in
svn_(client|wc)_status, which is mostly optimized for running in a cold
cache situation.
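The two-phase strategy described above could be sketched roughly as follows.
The callbacks `metadata_status` and `text_modified` are hypothetical
stand-ins for the cheap metadata query and the expensive pristine comparison
(for example svn_wc_text_modified_p2()); the status strings are likewise
invented for this sketch.

```python
def folder_glyph(paths, metadata_status, text_modified):
    """Return 'conflicted', 'modified' or 'normal' for a folder.

    metadata_status(path) -> 'conflicted', 'modified' or 'normal'
        Cheap, hypothetical callback; reads metadata only.
    text_modified(path) -> bool
        Expensive, hypothetical callback; compares against the pristine file.
    """
    paths = list(paths)

    # Phase 1: metadata only. A conflict wins immediately.
    found = "normal"
    for path in paths:
        status = metadata_status(path)
        if status == "conflicted":
            return "conflicted"
        if status != "normal":
            found = status
    if found != "normal":
        # Early out: the glyph is already known, skip all pristine checks.
        return found

    # Phase 2: only now pay for the file content comparisons.
    for path in paths:
        if text_modified(path):
            return "modified"
    return "normal"
```

A cache that already knows which files changed on disk could pass only those
paths into the second phase instead of the whole list.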

Instead of just one timestamp to compare to, you have more information: the
current on-disk time and the knowledge that a file just changed. You only
have to perform the check if the file was modified in the last run, or if
its time differs from both the stored time and your previous on-disk time.
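As a sketch, that rule comes down to a small predicate. The function and
parameter names are made up here, and reading "different than the stored and
your previous on-disk time" as "different from both" is my interpretation.

```python
def needs_text_check(notified, current_mtime, stored_mtime, previous_mtime):
    """Decide whether the expensive text comparison is required.

    notified       -- True if a change notification arrived since the last scan
    current_mtime  -- timestamp of the file on disk right now
    stored_mtime   -- timestamp recorded in the working copy metadata
    previous_mtime -- timestamp the cache saw on disk at its last scan
    """
    if notified:
        return True
    # No notification: check only if the time matches neither known value.
    return current_mtime != stored_mtime and current_mtime != previous_mtime
```

For example, `needs_text_check(False, 100, 100, 90)` is false because the
on-disk time still matches the stored one, so the pristine comparison can be
skipped.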

I think this would require some redesign of your current cache strategy (it
certainly does for AnkhSVN), but the fact that you can now perform status
updates per file instead of per directory should by itself open room for
performance improvement. (I hope to use this to solve some of the worse
scenarios in AnkhSVN on directories containing a lot of files.)

> Something else I use quite a lot in TSVN and especially the cache is a
> quick check whether a folder is versioned or not, simply by checking
> whether an .svn folder exists or not. Again here I only need to know
> whether it's *maybe* versioned. If there's no .svn folder, I *know* it's
> not versioned but if there is, I call the svn APIs and would get an
> error in return if e.g. the .svn folder is empty or corrupted.
> But with the single db design, there won't be .svn folders anymore
> except for the root of the wc?
> So is there an (almost as) fast way to check whether a folder is
> versioned or not?

I think the fastest way in the current code would be to call
svn_wc_read_kind() on the directory, maybe after first checking that there
is some .svn in at least one of the parent directories.

The effect on single-db would be: open sqlite file (if not cached) and query
two rows by using its primary key, via an index.
(I think that function currently does the same queries twice; but that is on
my TODO list).
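The cheap pre-check for a .svn directory in the folder or one of its parents
could look something like this. It only answers "maybe versioned"; a real
answer still needs the API call. The function name and the `stop_at`
parameter are assumptions for this sketch.

```python
from pathlib import Path

def maybe_versioned(path, stop_at=None):
    """Return True if path or any ancestor contains a .svn directory.

    stop_at optionally names the highest directory to inspect, so the walk
    does not climb all the way to the filesystem root.
    """
    start = Path(path).resolve()
    limit = Path(stop_at).resolve() if stop_at is not None else None
    for candidate in (start, *start.parents):
        if (candidate / ".svn").is_dir():
            return True
        if candidate == limit:
            break
    return False
```

A False result means the folder is definitely not inside a working copy, so
the database never has to be opened for it.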

Did you try compiling Subversion with SVN_WC__SINGLE_DB and SINGLE_DB
defined in wc.h yet? (This enables the experimental single-db mode.)

It should give some impression of what you can expect with single-db. (I
think the current status is about 40 test failures (9 in the upgrade tests),
but it reduces the test suite time by almost 50% compared to multi-db.)

        Bert
Received on 2010-08-02 12:33:25 CEST

This is an archived mail posted to the Subversion Dev mailing list.