
RE: status information

From: Bert Huijben <bert_at_qqmail.nl>
Date: Thu, 17 Jun 2010 11:49:05 +0200

> -----Original Message-----
> From: Stefan Küng [mailto:tortoisesvn_at_gmail.com]
> Sent: Wednesday, 16 June 2010 22:14
> To: Bert Huijben; Subversion Development
> Subject: Re: status information
> On 16.06.2010 21:40, Bert Huijben wrote:
> > The plan is to remove even more expensive members from svn_wc_status3_t,
> > to make it just return cheap information. (I think working_size is
> > available, so we can add that specific one; but the scan if the file has
> > been modified on disk will be removed).
> Huh? The scan whether a file was modified gets removed? Then what is the
> status call for if not to get the text and property status?
> If you really want an API which returns less info faster, please create
> a new API for it (or a flag in the svn_client_status() call to omit the
> more expensive checks). But seriously: whether a file is modified *is*
> the status of a file. If you remove that info, it's not a status API
> anymore but a 'svn_client_fastWcInfo()' or something like this.
> I think having svn_client_status() not return that information would
> lead to much confusion about the term 'status'.

The problem here is that svn_client_statusX() still uses svn_wc_status3_t.
The idea is to make this function use a new svn_client_status_t structure
which is different from svn_wc_status3_t in that it contains more 'expensive'
information.

svn_wc_*status() will just give you the cheap data and information on which
expensive data you can retrieve yourself. The problem with the current
svn_wc_status infrastructure is that it always does all the work and in some
cases then even throws away the result. (E.g. when you find a conflict the
text modifications are already calculated, but not used for the status.)

Most simple clients can just use svn_client_status5() and trust the results
to be complete like they were used to, but more advanced clients can use
svn_wc_walk_status() to get higher performance.
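
To make the split above concrete, here is a minimal sketch in Python rather than the real C API. Everything in it is hypothetical: CheapStatus, might_be_modified, eager_status() and lazy_status() are illustration names, not Subversion symbols, and compare_with_base stands in for whatever expensive text comparison a caller would run. It only shows the difference between always doing the comparison and doing it on demand.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheapStatus:
    """Hypothetical cheap per-node data, available without reading file contents."""
    path: str
    conflicted: bool         # assumed to be known from working copy metadata
    might_be_modified: bool  # assumed hint, e.g. recorded size or timestamp differs


def eager_status(node: CheapStatus, compare_with_base: Callable[[str], bool]) -> str:
    """Always run the expensive comparison first, even if its result is discarded."""
    modified = compare_with_base(node.path)
    if node.conflicted:
        return "conflicted"  # the comparison result is thrown away here
    return "modified" if modified else "normal"


def lazy_status(node: CheapStatus, compare_with_base: Callable[[str], bool]) -> str:
    """Use the cheap data first and run the comparison only when it can matter."""
    if node.conflicted:
        return "conflicted"  # no comparison needed to report the conflict
    if node.might_be_modified and compare_with_base(node.path):
        return "modified"
    return "normal"
```

A simple client would keep calling something shaped like eager_status() and get complete answers; a client that cares about speed would take the cheap record and decide for itself when the comparison is worth paying for.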

E.g. if TortoiseSVN or AnkhSVN switched to the wc APIs, they could just show
the conflicted glyph when they see the conflict on a 2 GB file, instead of
also comparing that file against its base version. And if you run status
just to show a glyph for a modification on a subdirectory, it is not
necessary to compare the rest of the files once you have found a
single change. (But you would still want the conflict results.)
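
The two shortcuts above could look roughly like this. It is a sketch under assumptions, not the libsvn_wc interface: directory_glyph(), the (path, conflicted) tuples and compare_with_base are all made-up names, and the choice to let a conflict win over a modification for the glyph is mine.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical cheap data per file: (path, conflicted).
Node = Tuple[str, bool]


def directory_glyph(
    nodes: Iterable[Node],
    compare_with_base: Callable[[str], bool],
) -> Tuple[str, List[str]]:
    """Pick one glyph for a directory while comparing as few files as possible."""
    conflicts: List[str] = []
    modified = False
    for path, conflicted in nodes:
        if conflicted:
            # Conflicts are reported from the cheap data; the file is not compared.
            conflicts.append(path)
        elif not modified:
            # Compare only until the first change is found.
            modified = compare_with_base(path)
        # Once a change is known, the remaining unconflicted files are skipped.
    if conflicts:
        glyph = "conflicted"
    elif modified:
        glyph = "modified"
    else:
        glyph = "normal"
    return glyph, conflicts
```

The walk still visits every node so that no conflict is missed, but the expensive comparison stops at the first modified file.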

> > The plan is to introduce a new svn_client_status_t structure (for
> > svn_client_status) which will have more data than the svn_wc_status3_t
> > structure, but we haven't started on that one.

Update: I started on this now, just to document the plans.

> >
> > The assumption here is that users of the libsvn_wc api want direct access
> > to the working copy without any performance penalties they might not want.
> > (E.g. not scanning the file for changes, or parsing all the conflict
> > details)
> I get that. But I suggest adding a separate API for this and not crippling
> svn_client_status() for it. Status info means getting the info about which
> files are modified. That's what every svn user expects when talking about
> the status. So the API has to honor that.
> If you want a way to get less info faster, please create an API with
> another name for this.

As written above: svn_client_status5() will work just like
svn_client_statusX() in that it does all the checks. It will just pass a
svn_client_status_t instead of a svn_wc_status3_t.

> > On the other side: users of the libsvn_client api want to have access to
> > the directly usable combination of data, which might give less performance.
> Yes, that is something I need in some situations. But I'd like to have this
> information returned in one call if possible. Sure, I can now get the
> info I need in two or three svn API calls. But that means that to get
> the info, the wc is crawled three times instead of just once. And every
> time files are touched which is very expensive on Windows.
> The svn trunk now is a *lot* slower accessing the wc info than 1.6.x,
> and even if that gets better (hopefully soon), forcing me to use three
> API calls instead of one means an even worse slowdown. Even if trunk was
> as fast as 1.6.x, with those API changes it will take three times the
> file accesses and crawls to get the same info as before.

Are we on a single database?

Future performance is only speculation until we get there. For data layout
we are still where we were last September, but Greg says he is going to
switch to in-db properties 'really soon now'.

The on-disk format is not stable and I wouldn't recommend using trunk in the
current state unless you know what you are doing. (E.g. the locking system
is completely broken)

> For example, to get the same info as with a status call before, I now
> have to first get the status, then fetch the svn:needs-lock property on
> every file separately. You can guess that this is a performance hit I
> simply can not accept. That means I lose an important feature in TSVN,
> something I'm absolutely sure many users will get very angry about.

Properties will be much faster when we switch to the in-db properties. The
current code writes all property data to files and the database, but only
reads from the files.

> Also keep in mind that locking is one of the features of SVN that the
> other (distributed) version control systems don't have, so that's one of
> the important reasons users choose svn over those.
> Losing that information (the svn:needs-lock info) or getting a severe
> performance hit fetching it another way is just very bad. SVN should
> make the features that set it apart from the others work well, not
> worse. Or it will lose one of the most important advantages it has.

I can't say what the performance hit is on retrieving in-db properties, but
I assume that it will be much faster than opening a file just to read one
property.


> What I'm trying to say here: please don't cripple existing APIs but
> create new ones with new names, and have APIs that return the same info
> as before without a performance hit.
> Stefan
> --
> ___
> oo // \\ "De Chelonian Mobile"
> (_,\/ \_/ \ TortoiseSVN
> \ \_/_\_/> The coolest Interface to (Sub)Version Control
> /_/ \_\ http://tortoisesvn.net
Received on 2010-06-17 11:49:58 CEST

This is an archived mail posted to the Subversion Dev mailing list.