
Re: Need fast ways to get Info once WC-NG is introduced

From: Stefan Küng <tortoisesvn_at_gmail.com>
Date: Mon, 02 Aug 2010 21:52:12 +0200

On 02.08.2010 12:32, Bert Huijben wrote:

> I don't think there is a specific per folder check like this, but retrieving
> specific data about just one node (instead of its folder) will be *much*
> faster than in the old entries store. With the entries files we had to read
> the entire file in all cases, but a real database doesn't have that
> limitation.
>
> For all metadata except for pristine files we only have to open one file and
> sqlite just seeks to the right locations to fetch the data using its
> indexes.
>
> For AnkhSVN I'm thinking about splitting the status cache in two layers,
> instead of doing a 'svn status' per folder like we do in 1.6. (I think
> TortoiseSVN might do the same thing, but maybe it calls status with depth
> infinity)

Yes, TSVN does the same: one 'svn st' per folder with depth immediates.

> Getting information from the working copy per individual file will be so
> much cheaper than before, that I will look for metadata changes first (and
> cache only a fraction of the informational details I used to cache before)
> and only when I really need to, I will perform the pristine file comparison.
> (I don't know yet whether I will use svn_(client|wc)_status for this or
> just call svn_wc_text_modified_p2() myself).
>
> I would imagine that TortoiseSVN's folder glyph status would be calculated
> much faster by using a similar strategy: First check if there is a metadata
> change or conflict somewhere in the tree (keeping track of translated
> filesize + filedate as these will be useful in the next step).
> (This would be roughly svn_client_infoX(). This should also inform you of
> any property changes (I don't know if it already does that, but the
> information is there in our internal APIs now))
> If there is such a status: just set the right glyph (early out; no need to
> check any pristine files)

So basically use svn_client_info() instead of svn_client_status(), then
only check the status for files that don't have a defined status yet
from that info. That seems like a good idea - a lot of work to rewrite
the existing code, but it should be worth it.
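
The two-step approach discussed above could be sketched roughly like this. It is only an illustration in Python, not Subversion code: NodeInfo, folder_glyph and the text_modified callback are hypothetical stand-ins for whatever svn_client_info() and svn_wc_text_modified_p2() would actually provide, and the glyph names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class NodeInfo:
    """Hypothetical per-node metadata, as a cheap info query might return it."""
    path: str
    schedule: str = "normal"      # e.g. "normal", "add", "delete"
    props_modified: bool = False
    conflicted: bool = False


def folder_glyph(nodes: Iterable[NodeInfo],
                 text_modified: Callable[[str], bool]) -> str:
    """Return a glyph name for a folder, comparing pristine files only if needed."""
    nodes = list(nodes)

    # Step 1: metadata only. Any hit decides the glyph (early out).
    if any(n.conflicted for n in nodes):
        return "conflicted"
    if any(n.schedule != "normal" or n.props_modified for n in nodes):
        return "modified"

    # Step 2: no metadata change found, so fall back to the expensive
    # text comparison, stopping at the first modified file.
    if any(text_modified(n.path) for n in nodes):
        return "modified"
    return "normal"
```

The point of the ordering is that text_modified is never called when the metadata alone already settles the glyph.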

> And only if there isn't a status perform the svn_wc_text_modified_p2() calls
> where needed.

Would this API get renamed to svn_client_*? Or should I risk calling an
svn_wc_ API? It's still not clear whether the svn_wc_ APIs will get made
private as was discussed before.

> Your disk cache (via its hook) knows which on-disk files changed since the
> last scan, so it can handle this much smarter than the simple algorithm in
> svn_(client|wc)_status, which is mostly optimized for running in a cold
> cache situation.
>
> Instead of just one timestamp to compare to, you have more information: the
> current on-disk time and the information that a file just changed. You only
> have to perform the check if the file was modified in the last run, or when
> its time differs from both the stored time and your previous on-disk
> time.
>
>
> I think this would require some redesign of your current cache strategy (it
> certainly does for AnkhSVN), but the fact that you can now perform status
> updates per file instead of per directory should by itself open room for
> performance improvement. (I hope to solve some of the worse scenarios in
> AnkhSVN on directories containing a lot of files with this)

I'll start with the design soon. This will take quite a while until it
works properly...
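
As a starting point for that design, the per-file rule quoted above might look like the following. This is a sketch of one reading of that rule, not an existing API: the function name and all four parameters are hypothetical, and "differs from both timestamps" is an interpretation of the quoted text.

```python
def needs_text_check(changed_since_last_scan: bool,
                     current_mtime: float,
                     recorded_mtime: float,
                     previous_mtime: float) -> bool:
    """Decide whether the pristine comparison must run for one file.

    changed_since_last_scan: the cache's change hook reported this file.
    current_mtime:  timestamp on disk right now.
    recorded_mtime: timestamp stored in the working copy metadata.
    previous_mtime: timestamp the cache saw on its previous scan.
    """
    if changed_since_last_scan:
        return True
    # If the time matches either known timestamp, the earlier verdict holds.
    return current_mtime != recorded_mtime and current_mtime != previous_mtime
```

A file whose timestamp still matches either the recorded or the previously seen value keeps its cached status, so the expensive comparison is skipped for it.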

>> Something else I use quite a lot in TSVN and especially the cache is a
>> quick check whether a folder is versioned or not, simply by checking
>> whether an .svn folder exists or not. Again here I only need to know
>> whether it's *maybe* versioned. If there's no .svn folder, I *know* it's
>> not versioned but if there is, I call the svn APIs and would get an
>> error in return if e.g. the .svn folder is empty or corrupted.
>> But with the single db design, there won't be .svn folders anymore
>> except for the root of the wc?
>> So is there an (almost as) fast way to check whether a folder is
>> versioned or not?
>
> I think the fastest way in the current code would be to call
> svn_wc_read_kind() on the directory, maybe after first checking that there
> is some .svn in at least one of the parent directories.

I thought about implementing a small cache for that, so that I don't
have to walk up the tree every time to find an .svn dir.
But I thought I read something about such a small cache getting
implemented in the svn library itself so I wanted to ask first - maybe
there's already an API to use that cache. Or maybe I just remember it wrong.

>
> The effect on single-db would be: open sqlite file (if not cached) and query
> two rows by using its primary key, via an index.
> (I think that function currently does the same queries twice; but that is on
> my TODO list).
>
>
> Did you try compiling Subversion with the SVN_WC__SINGLE_DB and SINGLE_DB
> defined in wc.h yet? (This enables the experimental single-db mode)
>
> It should give some impression of what you can expect with single-db. (I
> think the current status is about 40 test failures (9 in the upgrade tests),
> but it reduces the test suite time by almost 50% compared to multi-db)

I don't like to build the TSVN nightlies with such experimental features
yet. Once the features get into trunk without compile switches, I will
of course start using them. But as long as they're not activated, I
think I'll stay away from those. Not just because they might be too
unstable, but mostly because that means the APIs still change a lot and
that's just too much work for me to adjust TSVN every time. There's
enough work to be done in TSVN itself :)

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net
Received on 2010-08-02 21:52:53 CEST

This is an archived mail posted to the Subversion Dev mailing list.
