
Re: bug in svn diff and related?

From: Josef Wolf <jw_at_raven.inka.de>
Date: 2005-03-16 22:06:42 CET

On Tue, Mar 15, 2005 at 09:46:49PM -0600, Ben Collins-Sussman wrote:

[ ... ]
> >>The algorithm is extremely similar to what CVS does:
> >>
> >>   stat working and textbase files.
> >>   if (mtimes of working and textbase are equal):
> >>       return NOT_CHANGED;
> >>   else if (filesizes of working and textbase are unequal):
> >>       return CHANGED;
> >>   else
> >>       compare the files byte-by-byte.  /* very slow */
> >>
> >Ben,
> >
> >I'm curious why you don't use this alternative heuristic as less
> >likely to return a false answer and no slower (I'm making the common
> >assumption that getting mtime and size are both stat operations that
> >are fetched in one operation):
> >
> >   if (filesizes of working and textbase are unequal):
> >       return CHANGED;
> >   else if (mtimes of working and textbase are equal):
> >       return NOT_CHANGED;
> >   else
> >       compare the files byte-by-byte.  /* very slow */
>
> Why is this less likely to return a false answer?

Hmmm, my irony-detector must be broken somehow. Assume different file
sizes but equal mtimes: the first algorithm will return NOT_CHANGED while
the second returns CHANGED. Since the sizes differ, the file has in fact
changed, so the first algorithm gives the wrong answer. Given that this is
the only case where the algorithms disagree, the second algorithm _is_
less likely to return a false answer.
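
To make the disagreement concrete, here is a minimal Python sketch of both
orderings. The function names and the Stat tuple are made up for
illustration and are not Subversion's actual code; the byte-by-byte
comparison is stubbed out as a boolean flag.

```python
from collections import namedtuple

# Hypothetical stand-in for the two stat() fields the heuristic looks at.
Stat = namedtuple("Stat", ["mtime", "size"])

def mtime_first(working, textbase, bytes_differ):
    """Ordering from the quoted mail: mtime, then size, then contents."""
    if working.mtime == textbase.mtime:
        return "NOT_CHANGED"
    if working.size != textbase.size:
        return "CHANGED"
    # Stub for the slow byte-by-byte comparison.
    return "CHANGED" if bytes_differ else "NOT_CHANGED"

def size_first(working, textbase, bytes_differ):
    """Proposed alternative: size, then mtime, then contents."""
    if working.size != textbase.size:
        return "CHANGED"
    if working.mtime == textbase.mtime:
        return "NOT_CHANGED"
    # Stub for the slow byte-by-byte comparison.
    return "CHANGED" if bytes_differ else "NOT_CHANGED"

# The disputed case: identical mtimes, different sizes.
working = Stat(mtime=1000, size=120)
textbase = Stat(mtime=1000, size=100)
print(mtime_first(working, textbase, bytes_differ=True))  # NOT_CHANGED
print(size_first(working, textbase, bytes_differ=True))   # CHANGED
```

In every other combination of equal/unequal mtime and size, the two
functions take the same branch and return the same result.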

> And also, I'd argue this is slower. 99% of the time, almost every file
> in the tree will be unchanged, and will have identical working/textbase
> timestamps. Using the current algorithm, it means that 99% of the time
> we get a definitive "answer" to the question on the first comparison.

And this "answer" is less likely to be correct with the first algorithm
than with the second one.

> In your algorithm, we'd end up doing two comparisons nearly all the
> time, instead of one.

Clearly, there is a speed/correctness tradeoff here. It would be very
interesting to see how big the difference is in reality.
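
As a rough back-of-the-envelope illustration (not a measurement), one can
count the cheap field comparisons each ordering performs over a simulated
tree. The sketch below assumes the 99% unchanged figure from the quoted
mail, a made-up tree of 1000 files, and that mtime and size both come from
the same stat call, so the extra cost per file is a single integer
comparison. It says nothing about real-world timings.

```python
def comparisons_mtime_first(same_mtime, same_size):
    # Equal mtimes end the check after one comparison;
    # otherwise the size is compared as well.
    return 1 if same_mtime else 2

def comparisons_size_first(same_mtime, same_size):
    # Unequal sizes end the check after one comparison;
    # otherwise the mtime is compared as well.
    return 1 if not same_size else 2

def total_comparisons(files, counter):
    """Sum the field comparisons over (same_mtime, same_size) pairs."""
    return sum(counter(same_mtime, same_size) for same_mtime, same_size in files)

# Hypothetical tree: 990 untouched files, 10 edited files whose
# mtime and size both differ from the text base.
files = [(True, True)] * 990 + [(False, False)] * 10

print(total_comparisons(files, comparisons_mtime_first))  # 1010
print(total_comparisons(files, comparisons_size_first))   # 1990
```

Under these assumptions the size-first ordering does roughly twice as many
integer comparisons, but no additional stat calls or file reads.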

-- 
No software patents!
-- Josef Wolf -- jw@raven.inka.de --
Received on Wed Mar 16 22:12:57 2005

This is an archived mail posted to the Subversion Users mailing list.