
RE: Check SHA vs Content (was: RE: svn commit: r1759233 - /subversion/trunk/subversion/libsvn_wc/questions.c)

From: Markus Schaber <m.schaber_at_codesys.com>
Date: Mon, 5 Sep 2016 13:27:48 +0000

Hi,

From: Bert Huijben [mailto:bert_at_qqmail.nl]
> From: ivan_at_apache.org [mailto:ivan_at_apache.org]
> > Sent: Monday 5 September 2016 13:33
> > To: commits_at_subversion.apache.org
> > Subject: svn commit: r1759233 -
> > /subversion/trunk/subversion/libsvn_wc/questions.c
> >
> > Author: ivan
> > Date: Mon Sep 5 11:32:54 2016
> > New Revision: 1759233
> >
> > URL: http://svn.apache.org/viewvc?rev=1759233&view=rev
> > Log:
> > Use SHA-1 checksum to determine whether files are actually modified in the
> > working copy if timestamps don't match.
> >
> > Before this change we were doing this:
> > 1. Compare file timestamps: if they match, assume that files didn't change.
> > 2. Open pristine file.
> > 3. Read properties from wc.db and find whether translation is required.
> > 4. Compare filesize with pristine filesize for files that do not
> > require translation. Assume that file is modified if the sizes differ.
> > 5. Compare detranslated contents of working file with pristine.
> >
> > Now behavior is the following:
> > 1. Compare file timestamps: if they match, assume that files didn't change.
> > 2. Read properties from wc.db and find whether translation is required.
> > 3. Compare filesize with pristine filesize for files that do not
> > require translation. Assume that file is modified if the sizes differ.
> > 4. Calculate SHA-1 checksum of detranslated contents of working file
> > and compare it with pristine's checksum stored in wc.db.
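A minimal Python sketch contrasting the two strategies in the log above. It assumes the timestamp check has already failed and that the file needs no keyword or EOL translation, so detranslation is skipped. The function names, the 64 KiB chunk size, and passing the pristine size and SHA-1 as plain arguments (standing in for the values recorded in wc.db) are illustrative assumptions, not libsvn_wc's actual API.

```python
import hashlib
import os

# Read granularity for this sketch; an arbitrary choice, not Subversion's.
CHUNK_SIZE = 64 * 1024


def old_is_modified(working_path: str, pristine_path: str) -> bool:
    """Old style (sketch): size check, then compare both files chunk by chunk."""
    if os.path.getsize(working_path) != os.path.getsize(pristine_path):
        return True
    with open(working_path, "rb") as w, open(pristine_path, "rb") as p:
        while True:
            a = w.read(CHUNK_SIZE)
            b = p.read(CHUNK_SIZE)
            if a != b:
                return True  # bail out at the first differing chunk
            if not a:
                return False  # both files exhausted without a difference


def sha1_of_file(path: str) -> str:
    """SHA-1 of a whole file, read incrementally."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def new_is_modified(working_path: str, pristine_size: int, pristine_sha1: str) -> bool:
    """New style (sketch): size check, then hash the whole working file.

    pristine_size and pristine_sha1 stand in for values stored in wc.db,
    so the pristine file itself is never opened.
    """
    if os.path.getsize(working_path) != pristine_size:
        return True
    return sha1_of_file(working_path) != pristine_sha1
```

The early `return True` inside the loop is what lets the old approach stop at the first differing chunk; the new approach never touches the pristine file but always hashes the entire working file.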
>
> We looked at this before, and this change has pros and cons, depending on
> the specific use case.
>
> With the SHA comparison we only have to read the working file, but we always
> have to read all of it.
>
> With the old system we could bail out at the first detected change.
>
> If there is a change somewhere, both systems read on average 100% of the
> filesize in total (the old system reads both files up to the first change,
> on average half of each)... only if there is no actual change except for the
> timestamp is the new system less expensive.
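The cost argument can be made concrete with a rough model. Assume the sizes already match, no translation is needed, read-block granularity is ignored, the file is N bytes long, and the first difference sits at an offset d that is uniformly distributed over the file. The old scheme reads both files up to d; the new scheme reads only the working file, but all of it:

```latex
\begin{aligned}
B_{\text{old}} &\approx 2d, \qquad \mathbb{E}[B_{\text{old}}] = 2 \cdot \frac{N}{2} = N, \qquad B_{\text{old}}^{\text{unchanged}} = 2N,\\
B_{\text{new}} &= N \quad \text{whether or not the file changed.}
\end{aligned}
```

So both read about N bytes on average when something changed, the new scheme halves the I/O when only the timestamp changed, and the old scheme wins when d is small: a change in the first 4 KiB costs it roughly 2d (under 8 KiB) of reads, versus N for the new scheme.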
>
> If the file happens to be a database file or something similar, there is
> quite commonly a change in the first 'block' when there are changes somewhere
> later on (checksum, change counter, etc.). File formats like SQLite were
> explicitly designed for this (and other cheap checks), with a change counter
> at the start.

Maybe we could cache a checksum of the first block (4k) of each file in wc.db, to profit from such changes. Then we would only need to read the whole file if the first-block checksums match.
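A sketch of the first-block idea in Python. It assumes a hypothetical extra wc.db column holding the SHA-1 of the first 4 KiB of each pristine (no such column exists today), and again assumes no translation is needed. `PristineInfo`, `pristine_info` and `is_modified` are illustrative names only. The full-file SHA-1 is computed incrementally, so the first block is read only once.

```python
import hashlib
import os
from dataclasses import dataclass

FIRST_BLOCK = 4096      # the "first block (4k)" from the proposal
CHUNK_SIZE = 64 * 1024  # read granularity for the rest of the file (arbitrary)


@dataclass(frozen=True)
class PristineInfo:
    """What a hypothetical wc.db row would hold for one file."""
    size: int
    sha1: str              # checksum of the whole pristine
    first_block_sha1: str  # hypothetical new column: checksum of the first 4 KiB


def pristine_info(data: bytes) -> PristineInfo:
    """Build the (hypothetical) cached record from the pristine contents."""
    return PristineInfo(
        size=len(data),
        sha1=hashlib.sha1(data).hexdigest(),
        first_block_sha1=hashlib.sha1(data[:FIRST_BLOCK]).hexdigest(),
    )


def is_modified(working_path: str, info: PristineInfo) -> bool:
    """Size check, then first-block check, then full SHA-1 only if needed."""
    if os.path.getsize(working_path) != info.size:
        return True
    full = hashlib.sha1()
    with open(working_path, "rb") as f:
        first = f.read(FIRST_BLOCK)
        if hashlib.sha1(first).hexdigest() != info.first_block_sha1:
            return True  # change in the header: only 4 KiB were read
        full.update(first)  # reuse the prefix for the full-file checksum
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            full.update(chunk)
    return full.hexdigest() != info.sha1
```

A change confined to the first block (such as a change counter in a database header) is detected after reading 4 KiB; only files whose first block matches pay for the full read.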

> I don't think we should 'just change behavior' here if we don't have actual
> usage numbers from our users. Perhaps we should make this feature
> configurable... or dependent on file size.
>
> We certainly want the new behavior for non-pristine working copies (on the
> IDEA list for years), but I'm not sure we always want this behavior as the
> only option.
>
> This mail is partially just to discuss this topic on the list, and to make
> sure everybody knows what happened here and why.
>
>
>
> Bert
>
> (Note that it is Labor Day in the USA today... so I don't expect many
> responses until later this week.)

Best regards

Markus Schaber

CODESYS® a trademark of 3S-Smart Software Solutions GmbH

Inspiring Automation Solutions

3S-Smart Software Solutions GmbH
Dipl.-Inf. Markus Schaber | Product Development Core Technology
Memminger Str. 151 | 87439 Kempten | Germany
Tel. +49-831-54031-979 | Fax +49-831-54031-50

E-Mail: m.schaber@codesys.com | Web: http://www.codesys.com | CODESYS store: http://store.codesys.com
CODESYS forum: http://forum.codesys.com

Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade register: Kempten HRB 6186 | Tax ID No.: DE 167014915

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received
this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorised copying, disclosure
or distribution of the material in this e-mail is strictly forbidden.
Received on 2016-09-05 15:28:07 CEST

This is an archived mail posted to the Subversion Dev mailing list.
