
Re: WC modification detection is reading whole files

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: 2006-02-17 14:19:38 CET

kfogel@collab.net wrote:
> Julian Foad <julianfoad@btopenworld.com> writes:
>
>>>>Shouldn't it notice that the file size and date are not as expected
>>>>and return a "modified" status immediately?
>>>
>>>Do we record the unmodified working file size for translated files in
>>>.svn/entries? If so, then we could detect modified-ness this way.
>>
>>We don't currently store any size information there. However, I'm now
>>thinking it's not so easy. (It would have been done if it were.)
>>Even if you modify the file by changing some keyword value or EOL
>>style, as long as it translates back to the same pristine text we must
>>report it as unchanged. Therefore the size doesn't tell us anything.
>
> Not so fast...
>
> A size check can tell you if something is definitely modified. It's
> only if the size comes back the same as what you recorded before that
> you have to do further investigation. The algorithm is:
>
>   if (timestamp is same)
>     return not_modified;
>   else if (size differs)
>     return modified;
>   else
>     go_into_expensive_further_investigation();

This is indeed the algorithm that we use, and it's fine for determining whether
a file without keyword or EOL translations matches its text base.
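
For readers who want something concrete, here is a minimal Python sketch of
that three-step check for an untranslated file. The names Recorded,
is_modified and contents_differ are invented for illustration and are not
Subversion APIs; where the recorded size would come from is also an
assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Recorded:
    """Hypothetical record of what we saw last time we looked at the file."""
    mtime: float
    size: int


def is_modified(recorded: Recorded,
                current_mtime: float,
                current_size: int,
                contents_differ: Callable[[], bool]) -> bool:
    """Apply the timestamp / size / full-compare sequence, in that order.

    contents_differ stands in for the expensive byte-by-byte comparison
    and is only called when the cheap checks are inconclusive.
    """
    if current_mtime == recorded.mtime:
        return False              # same timestamp: assume unmodified
    if current_size != recorded.size:
        return True               # size changed: definitely modified
    return contents_differ()      # same size, new timestamp: must read it
```

Note that a file whose size changed but whose timestamp did not is reported
as unmodified here, exactly as in the pseudocode.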

It doesn't work for determining whether a translated working file (that is, one
in which keyword expansions or EOL conversions are in effect) still matches its
pristine copy. If such a working file's time stamp and size have both changed
since the last time we looked at it, then certainly it's been changed, but we
don't know whether it will match its text base. It might, so we have to read
it and see.
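
A small Python sketch of why the size is inconclusive for a translated file.
It assumes the only translation in effect is CRLF line endings in the working
copy against LF in the pristine text; real translation also covers keywords
and other EOL styles. Both function names are hypothetical.

```python
def detranslate_eol(data: bytes) -> bytes:
    """Undo the assumed EOL translation: CRLF in the working file -> LF."""
    return data.replace(b"\r\n", b"\n")


def translated_file_modified(working: bytes, pristine: bytes) -> bool:
    """Compare a translated working file against its pristine text.

    The working file has to be read in full and detranslated; its size
    alone cannot tell us whether it still matches the pristine text.
    """
    return detranslate_eol(working) != pristine


pristine = b"a\nb\n"
working = b"a\r\nb\r\n"    # same text, different EOL style and size
print(len(working) != len(pristine))                # sizes differ
print(translated_file_modified(working, pristine))  # yet not modified
```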

That's the case I meant by "the size doesn't tell us anything"; I didn't mean
that it tells us nothing in all cases.

David James wrote:
> So if the timestamp is the same, but the size differs, we return "not
> modified"? That seems strange to me.

Yes, but it's quite reasonable in practice. For one thing, it is only in
unusual situations that a file is modified but its time stamp is kept the same.
For another, a file that is modified without its time stamp changing doesn't
necessarily change size either, and we don't intend to detect that case. So
failing to detect the cases where the size has changed is worse only in
magnitude, not in concept.

That said, if we can read both time stamp and size together as Karl suggests,
then we certainly should do better by checking them both before making any
decision.
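
As a rough Python sketch of checking both values before deciding: os.stat
returns the size and the time stamp from a single call, so the size can be
tested first. This applies only to untranslated files, and the names
stat_pair and quick_check are made up for illustration.

```python
import os
from typing import Tuple


def stat_pair(path: str) -> Tuple[int, int]:
    """Read time stamp (ns) and size together from one stat call."""
    st = os.stat(path)
    return st.st_mtime_ns, st.st_size


def quick_check(cur_mtime_ns: int, cur_size: int,
                rec_mtime_ns: int, rec_size: int) -> str:
    """Cheap classification of an untranslated file.

    Returns "modified", "unmodified", or "unknown"; "unknown" means the
    caller still has to compare the contents.
    """
    if cur_size != rec_size:
        return "modified"       # caught even if the time stamp is unchanged
    if cur_mtime_ns == rec_mtime_ns:
        return "unmodified"     # same size and same time stamp
    return "unknown"            # same size, new time stamp: read the file
```

With this ordering, the case David raises (same time stamp, different size)
comes back as "modified" rather than "unmodified".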

- Julian

Received on Fri Feb 17 14:20:16 2006

This is an archived mail posted to the Subversion Dev mailing list.