Re: status information

From: Stefan Küng <tortoisesvn_at_gmail.com>
Date: Thu, 17 Jun 2010 13:10:47 +0200

On Thu, Jun 17, 2010 at 11:49, Bert Huijben <bert_at_qqmail.nl> wrote:
>
>
>> -----Original Message-----
>> From: Stefan Küng [mailto:tortoisesvn_at_gmail.com]
>> Sent: Wednesday, 16 June 2010 22:14
>> To: Bert Huijben; Subversion Development
>> Subject: Re: status information
>>
>> On 16.06.2010 21:40, Bert Huijben wrote:
>>
>> > The plan is to remove even more expensive members from svn_wc_status3_t,
>> > to make it just return cheap information. (I think working_size is
>> > available, so we can add that specific one; but the scan for whether
>> > the file has been modified on disk will be removed.)
>>
>> Huh? The scan for whether a file was modified gets removed? Then what
>> is the status call for, if not to get the text and property status?
>> If you really want an API which returns less info faster, please create
>> a new API for it (or a flag in the svn_client_status() call to omit the
>> more expensive checks). But seriously: whether a file is modified *is*
>> the status of a file. If you remove that info, it's not a status API
>> anymore but a 'svn_client_fastWcInfo()' or something like this.
>>
>> I think having svn_client_status() not return that information would
>> lead to much confusion about the term 'status'.
>
> The problem here is that svn_client_statusX() still uses svn_wc_status3_t.
> The idea is to make this function use a new svn_client_status_t structure
> which is different from svn_wc_status3_t in that it contains more 'expensive
> data'.

Ok, no problem with that.
What I have a problem with is that you said you wanted to remove the
modified info. But the modified info *is* the status. And the 'status'
API must return that status. Otherwise it's not a status API but
something else.
So please, no matter what you name that new structure: it must provide
the status info. If it doesn't, it should be a different API.

> svn_wc_*status() will just give you the cheap data and information on which
> expensive data you can retrieve yourself. The problem with the current
> svn_wc_status infrastructure is that it always does all the work and in some
> cases then even throws away the result. (E.g. when you find a conflict the
> text modifications are already calculated, but not used for the status
> result).
>
> Most simple clients can just use svn_client_status5() and trust the results
> to be complete like they were used to, but more advanced clients can use
> svn_wc_walk_status() to get higher performance.
>
> E.g. if TortoiseSVN or AnkhSVN switched to the wc APIs, they could just show
> the conflicted glyph when they see the conflict on a 2 GB file, instead of
> also comparing that file against its base version. And if you run status
> just to show a glyph for a modification in a subdirectory, it is not
> necessary to compare the rest of the files once you have found a
> single change. (But you would still want the conflict results.)
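Here is a minimal C sketch of the shortcut described above. It assumes a hypothetical in-memory list of entries. `entry_t`, `glyph_for_dir` and `content_differs_from_base` are illustrative names, not Subversion APIs, and the "expensive" comparison is only simulated. The sketch checks the cheap conflict flag first. It skips the content comparison for a conflicted file, and also for every file after the first detected change, while still looking for conflicts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file record; NOT a Subversion type. */
typedef struct {
    const char *path;
    bool conflicted;        /* cheap: already known from the wc metadata */
    bool differs_from_base; /* what an expensive content comparison would find */
} entry_t;

typedef enum { GLYPH_NORMAL, GLYPH_MODIFIED, GLYPH_CONFLICTED } glyph_t;

/* Counts how often the expensive comparison actually ran. */
static int expensive_compares = 0;

/* Stand-in for comparing a working file against its base version. */
static bool content_differs_from_base(const entry_t *e)
{
    expensive_compares++;
    return e->differs_from_base;
}

/* Pick one glyph for a directory in a single pass:
 * - a conflicted entry wins and is never compared against BASE;
 * - after the first modified file no further comparisons are made,
 *   but the remaining entries are still checked for conflicts. */
static glyph_t glyph_for_dir(const entry_t *entries, size_t n)
{
    assert(entries != NULL || n == 0);
    bool modified = false;
    for (size_t i = 0; i < n; i++) {
        if (entries[i].conflicted)
            return GLYPH_CONFLICTED;
        if (!modified && content_differs_from_base(&entries[i]))
            modified = true;
    }
    return modified ? GLYPH_MODIFIED : GLYPH_NORMAL;
}
```

With the entries {unchanged, modified, modified}, only the first two files get compared. With a conflicted 2 GB file first in the list, no comparison runs at all.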

Sure, that's one way of dealing with this.
But it requires every svn client to implement the same code
itself, instead of having it in the svn lib.

Why not just add a mask param to the svn status API which specifies
which info the client requires and have the svn lib fetch all the
required info in *one single* wc crawl? It can't get faster than that!
And all svn clients can be assured that they all get the *same* info
back. If it's not in the svn lib, clients might implement things
differently. For example, if clients have to compare WC/BASE
themselves to find out whether a file is modified, that will only lead
to big trouble.
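The mask idea can be made concrete with a small, self-contained C sketch. Everything in it is hypothetical and only illustrates the proposal: `status_walk`, the `WANT_*` flags, `node_t`, `wc_status_t` and `count_needs_lock` are not existing Subversion APIs. The caller states up front which pieces of information it needs. One crawl then fills in only those pieces, so an extra item such as the svn:needs-lock flag does not cost a second walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request mask; these flags are NOT part of the Subversion API. */
enum {
    WANT_TEXT_STATUS = 1u << 0, /* compare working file against BASE */
    WANT_PROP_STATUS = 1u << 1, /* compare working props against BASE props */
    WANT_NEEDS_LOCK  = 1u << 2  /* report whether svn:needs-lock is set */
};

/* Hypothetical node as the library would see it during a crawl. */
typedef struct {
    const char *path;
    bool text_modified;
    bool props_modified;
    bool has_needs_lock;
} node_t;

/* Hypothetical status record handed to the client callback. */
typedef struct {
    const char *path;
    bool text_modified;  /* meaningful only if WANT_TEXT_STATUS was requested */
    bool props_modified; /* meaningful only if WANT_PROP_STATUS was requested */
    bool needs_lock;     /* meaningful only if WANT_NEEDS_LOCK was requested */
} wc_status_t;

typedef void (*status_cb)(const wc_status_t *st, void *baton);

/* One crawl over the nodes; for each node only the requested
 * (possibly expensive) pieces of information are filled in. */
static void status_walk(const node_t *nodes, size_t n, unsigned mask,
                        status_cb cb, void *baton)
{
    assert(cb != NULL);
    for (size_t i = 0; i < n; i++) {
        wc_status_t st = { nodes[i].path, false, false, false };
        if (mask & WANT_TEXT_STATUS)
            st.text_modified = nodes[i].text_modified;
        if (mask & WANT_PROP_STATUS)
            st.props_modified = nodes[i].props_modified;
        if (mask & WANT_NEEDS_LOCK)
            st.needs_lock = nodes[i].has_needs_lock;
        cb(&st, baton);
    }
}

/* Example callback: count items that would get a "needs lock" overlay. */
static void count_needs_lock(const wc_status_t *st, void *baton)
{
    if (st->needs_lock)
        ++*(int *)baton;
}
```

A client that wants both overlays passes `WANT_TEXT_STATUS | WANT_NEEDS_LOCK` and gets everything from one walk. A client that only needs text status leaves the other bits out and pays nothing for them.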

And thanks to the status callback, I can already 'stop' a status crawl
any time I like, even with 1.6.x clients. That's nothing new.
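The callback-based early exit mentioned above can be sketched as follows. The sketch assumes a hypothetical callback type that returns `false` to end the crawl. `crawl`, `visit_cb` and `stop_at_first_change` are illustrative names only, and the sketch does not model the exact stopping mechanism of the real 1.6.x status callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: return true to continue the crawl, false to stop. */
typedef bool (*visit_cb)(const char *path, bool modified, void *baton);

/* Hypothetical crawl that honours the callback's request to stop.
 * Returns the number of items reported before the crawl ended. */
static size_t crawl(const char *const *paths, const bool *modified, size_t n,
                    visit_cb cb, void *baton)
{
    assert(cb != NULL);
    size_t visited = 0;
    for (size_t i = 0; i < n; i++) {
        visited++;
        if (!cb(paths[i], modified[i], baton))
            break; /* the client has seen enough */
    }
    return visited;
}

/* Example client: record the first modified item, then stop the crawl. */
static bool stop_at_first_change(const char *path, bool modified, void *baton)
{
    (void)path;
    if (modified) {
        *(bool *)baton = true;
        return false;
    }
    return true;
}
```

Here the client alone decides when it has enough information. The library does not need to know why the crawl ended.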

>> The svn trunk now is a *lot* slower accessing the wc info than 1.6.x,
>> and even if that gets better (hopefully soon), forcing me to use three
>> API calls instead of one means an even worse slowdown. Even if trunk
>> were as fast as 1.6.x, with those API changes it would take three times
>> the file accesses and crawls to get the same info as before.
>
> Are we on a single database?

No.

> Future performance is only speculation until we get there. For data layout

And that's what I'm worried about most. Nobody knows what the
performance will be when everything's finished. What if the
performance isn't much better than it is now? Even if performance
improves to the point where it's only half as fast as 1.6, I wouldn't
use it.

> we are still where we were last September, but Greg says he is going to
> switch to in-db properties 'really soon now'.

Yes, I've seen the mails.

> The on-disk format is not stable and I wouldn't recommend using trunk in the
> current state unless you know what you are doing. (E.g. the locking system
> is completely broken)

I have to use it. If you really plan to have a one- or two-month grace
period before the 1.7 release (whenever that might be), that's not
enough time for me to catch up with TSVN. Also if I don't start using
it now, I won't have any chance to get some things changed if
necessary.

>> For example, to get the same info as with a status call before, I now
>> have to first get the status, then fetch the svn:needs-lock property on
>> every file separately. You can guess that this is a performance hit I
>> simply cannot accept. That means I lose an important feature in TSVN,
>> something I'm absolutely sure many users will get very angry about.
>
> Properties will be much faster when we switch to the in-db properties. The
> current code writes all property data to files and the database, but only
> reads from the files.

Even if reading props costs nothing, it still requires me to
walk the whole wc again to find all files/folders to read the
properties from.

>> Also keep in mind that locking is one of the features of SVN that the
>> other (distributed) version control systems don't have, so it's one of
>> the important reasons users choose svn over those.
>> Losing that information (the svn:needs-lock info) or getting a severe
>> performance hit fetching it another way is just very bad. SVN should
>> make the features that set it apart from the others work well, not
>> worse. Otherwise it will lose one of the most important advantages it has.
>
> I can't say what the performance hit is on retrieving in-db properties, but
> I assume that it will be much faster than opening a file just to read one
> property.

But still: it requires some work to be done twice. If I'm not
completely mistaken, to find the property status of an item you
already have to read at least some of its properties. So you already
have the db connection open, even to the item's properties, and you're
already reading (at least some of) them.

Stefan

-- 
       ___
  oo  // \\      "De Chelonian Mobile"
 (_,\/ \_/ \     TortoiseSVN
   \ \_/_\_/>    The coolest Interface to (Sub)Version Control
   /_/   \_\     http://tortoisesvn.net

Received on 2010-06-17 13:11:48 CEST

In reply to: Bert Huijben: "RE: status information"