On Thu 2004-04-22 at 21:12:28 +0100, Philip Martin wrote:
> Ben Collins-Sussman <sussman@collab.net> writes:
[...]
> > But then Karl pointed out that while "on average" our current algorithm
> > bails after reading half the text-base, this is cancelled out by the
> > fact that it's reading two files instead of one. So maybe the
> > byte-for-byte and checksum strategies come out even? :-)
>
> It's important to remember svn:keywords and svn:eol-style when
> discussing working files and text bases. When we do a byte-for-byte
> comparison with those keywords set then first the entire working file
> is read, it gets "detranslated" and a new, temporary, file in
> repository format is written. It's that temporary file that takes
> part in the byte-for-byte comparison, and it's quite possible that the
> write is the main performance hit.
>
> A possible optimisation would be to store a second md5sum for the
> working file, and then do an md5sum comparison by reading the working
> file and thus avoid the detranslate/write altogether.
Just a note: even without a second md5sum, you can avoid at least the
write. In theory, you can do the detranslation and the md5sum in chunks
in memory without ever writing anything to disk: send the detranslated
output into a (non-disk) stream, compute the md5sum over that stream,
and then discard the data (to /dev/null or the like). That's the theory,
anyway. I don't know how difficult it is to implement that in
Subversion with the existing infrastructure.
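To make the idea concrete, here is a rough sketch in Python (not
Subversion code, and not using any Subversion API). It assumes that
"detranslation" is nothing more than CRLF -> LF conversion; real
detranslation would also have to contract keywords. The function names
and the chunk size are made up for illustration.

```python
import hashlib
import io


def detranslate_chunks(stream, chunk_size=8192):
    """Yield the stream's bytes with CRLF converted to LF, chunk by chunk.

    Hypothetical stand-in for detranslation: only line endings are handled.
    """
    pending_cr = False
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        if pending_cr:
            # Re-attach the CR held back from the previous chunk.
            chunk = b"\r" + chunk
            pending_cr = False
        if chunk.endswith(b"\r"):
            # Hold back a trailing CR: its LF may arrive in the next chunk.
            chunk = chunk[:-1]
            pending_cr = True
        yield chunk.replace(b"\r\n", b"\n")
    if pending_cr:
        # A lone CR at end of input is passed through unchanged.
        yield b"\r"


def md5_of_detranslated(stream, chunk_size=8192):
    """Return the hex md5 of the detranslated data without writing a file."""
    digest = hashlib.md5()
    for piece in detranslate_chunks(stream, chunk_size):
        digest.update(piece)  # each piece is dropped once it has been hashed
    return digest.hexdigest()


if __name__ == "__main__":
    working = io.BytesIO(b"first line\r\nsecond line\r\n")
    print(md5_of_detranslated(working, chunk_size=4))
```

The resulting digest could then be compared against a stored text-base
checksum, so the working file is read once and nothing is written.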
> A possible problem is that the user may have edited an expanded
> keyword causing the md5sum comparison to indicate a modification,
> whereas the detranslate would drop that edit and indicate no
> modification.
IMHO, that's a separate issue. The first is how to recognize changes
reliably (and fast); the second is how to force submits even when
there is no actual change (with regard to what would be committed).
To me, changing a generated keyword part feels the same as changing
the timestamp. It indicates that there was some kind of change (and
that a check of the content may be due), but that may have been
effectively a no-op.
Whether we want to support a "no-change" commit and how it would be
triggered (a flag, some keyword change as suggested above, etc.) is a
different question and shouldn't influence how we want to detect real
changes. (If we get it cheaply, fine, but it shouldn't limit us.)
Bye,
Benjamin.
Received on Thu Apr 22 22:47:03 2004