Re: File modification detection?

From: Ph. Marek <philipp.marek_at_bmlv.gv.at>
Date: 2004-04-23 07:49:03 CEST

> > > But then Karl pointed out that while "on average" our current algorithm
> > > bails after reading half the text-base, this is cancelled out by the
> > > fact that it's reading two files instead of one. So maybe the
> > > byte-for-byte and checksum strategies come out even? :-)
> >
> > It's important to remember svn:keywords and svn:eol-style when
> > discussing working files and text bases. When we do a byte-for-byte
> > comparison with those keywords set then first the entire working file
> > is read, it gets "detranslated" and a new, temporary, file in
> > repository format is written. It's that temporary file that takes
> > part in the byte-for-byte comparison, and it's quite possible that the
> > write is the main performance hit.
> >
> > A possible optimisation would be to store a second md5sum for the
> > working file, and then do an md5sum comparison by reading the working
> > file and thus avoid the detranslate/write altogether.
>
> Just a note: Even without a second md5sum, you can avoid at least the
> write. In theory, you can do the detranslation and md5sum in chunks in
> memory without ever writing anything to disk. Like, having the
> detranslation going into a (non-disk) stream and doing the md5sum over
> that stream and then letting that stream go to /dev/null or such. That
> was the theory. Don't know how difficult it is to implement that in
> Subversion with the existing infrastructure.
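
The in-memory idea quoted above could look roughly like the sketch below. It is not Subversion code: `detranslated_md5` is a made-up name, and the only "detranslation" it does is CRLF to LF conversion as a stand-in (keyword contraction is left out). The point is just that the checksum can be fed chunk by chunk without a temporary file.

```python
import hashlib
import io


def detranslated_md5(stream, chunk_size=64 * 1024):
    """MD5 of a byte stream with CRLF normalised to LF, computed in chunks.

    Hypothetical stand-in for real detranslation: only line endings are
    handled, and nothing is ever written to disk.
    """
    digest = hashlib.md5()
    carry = b""  # a trailing CR whose matching LF may start the next chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = carry + chunk
        carry = b""
        if data.endswith(b"\r"):
            # Hold the CR back until we know what follows it.
            data, carry = data[:-1], b"\r"
        digest.update(data.replace(b"\r\n", b"\n"))
    digest.update(carry)  # a lone CR at end of input is kept unchanged
    return digest.hexdigest()


# Usage: compare a CRLF working copy against the stored repository checksum.
working = io.BytesIO(b"line one\r\nline two\r\n")
stored_md5 = hashlib.md5(b"line one\nline two\n").hexdigest()
print(detranslated_md5(working) == stored_md5)
```

The carry byte is there because a CRLF pair can straddle a chunk boundary.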

I'd like to register a wish that is small enough that it probably doesn't
need to go into the issue tracker :-)

Currently svn tests the timestamp, and possibly the size.
IMHO it would be better to test the size first: if the sizes differ, the
files are guaranteed to differ (modulo keyword expansion, which can be
detected by checking whether svn:keywords is set).
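
As a rough sketch of the size-first check (hypothetical names throughout, not how svn stores or checks anything): `base_size` is assumed to be the recorded text-base size, `recorded_mtime_ns` the recorded timestamp, and `translated` says whether svn:keywords is set, in which case the size shortcut is skipped. I would assume the same caveat applies to svn:eol-style.

```python
import os


def quick_check(path, base_size, recorded_mtime_ns, translated=False):
    """Cheap modification test using only stat() data.

    Returns "modified", "unchanged", or "compare" (meaning a full content
    comparison is still needed). All parameters are assumptions made for
    this sketch.
    """
    st = os.stat(path)
    # Size first: without translation, a different size proves a change.
    if not translated and st.st_size != base_size:
        return "modified"
    # Same timestamp as recorded: assume the file was not touched.
    if st.st_mtime_ns == recorded_mtime_ns:
        return "unchanged"
    # Same size (or translated file) but touched: contents must be compared.
    return "compare"
```

Only the "compare" case has to read the file at all.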

Alternatively, md5sums of *parts* of the file could be stored (e.g. one per
2MB of data), which *could* speed up verifying, since a check could stop at
the first part that differs.
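
A minimal sketch of the per-part checksums, assuming a list of stored digests per file (the function names and the storage format are invented for illustration). Verification stops reading at the first part that doesn't match.

```python
import hashlib
import io

PART_SIZE = 2 * 1024 * 1024  # 2MB per stored checksum, as suggested above


def part_digests(stream, part_size=PART_SIZE):
    """Return the list of MD5 hex digests, one per part of the stream."""
    digests = []
    while True:
        block = stream.read(part_size)
        if not block:
            break
        digests.append(hashlib.md5(block).hexdigest())
    return digests


def first_changed_part(stream, stored, part_size=PART_SIZE):
    """Return the index of the first differing part, or None if all match."""
    index = 0
    while True:
        block = stream.read(part_size)
        if not block:
            break
        if index >= len(stored) or hashlib.md5(block).hexdigest() != stored[index]:
            return index  # stop early: the rest of the file is never read
        index += 1
    # Fewer parts than stored means the file was truncated at this index.
    return None if index == len(stored) else index


# Usage with a tiny part size so the example stays small.
original = b"a" * 10 + b"b" * 10 + b"c" * 5
stored = part_digests(io.BytesIO(original), part_size=10)
edited = b"a" * 10 + b"X" + b"b" * 9 + b"c" * 5
print(first_changed_part(io.BytesIO(edited), stored, part_size=10))
```

For an unmodified file this still reads everything, so the gain is only for files that really did change.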

> > A possible problem is that the user may have edited an expanded
> > keyword causing the md5sum comparison to indicate a modification,
> > whereas the detranslate would drop that edit and indicate no
> > modification.
>
> IMHO, that's a separate issue. The first is how to recognize changes
> reliably (and fast), the second is how to force submits even when
> there is no actual change (with regard to what would be committed).
>
> To me, changing a generated keyword part feels the same as changing
> the timestamp. It indicates that there was some kind of change (and
> that a check of the content may be due), but that may have been
> effectively a no-op.
>
> Whether we want to support a "no-change" commit and how it would be
> triggered (a flag, some keyword change as suggested above, etc.) is a
> different question and shouldn't influence how we want to detect real
> changes. (If we get it cheaply, fine, but it shouldn't limit us.)

How about looking at it the other way round?
Should changing the value of an expanded keyword, which doesn't change the
file in the repository, lead to a commit?
I'd say no. I might want to do such a commit for a binary file, which
doesn't have keywords ...

Regards,

Phil

Received on Fri Apr 23 07:49:25 2004

In reply to: Benjamin Pflugmann: "Re: File modification detection?"