Re: bug in svn diff and related?

From: Peter N. Lundblad <peter_at_famlundblad.se>
Date: 2005-03-16 09:12:27 CET

On Tue, 15 Mar 2005, Travis wrote:

>
> On Mar 15, 2005, at 9:46 PM, Ben Collins-Sussman wrote:
>
> > On Mar 15, 2005, at 3:23 PM, Travis P wrote:
> >
> >> On Mar 15, 2005, at 8:31 AM, Ben Collins-Sussman wrote:
> >>
> >>> The algorithm is extremely similar to what CVS does:
> >>>

First: stat the working file and compare its timestamp to the timestamp
stored in the entries file.

> >>> stat working and textbase files.
> >>> if (mtimes of working and textbase are equal):
> >>>     return NOT_CHANGED;
> >>> else if (filesizes of working and textbase are unequal):
> >>>     return CHANGED;
> >>> else
> >>>     compare the files byte-by-byte. /* very slow */
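A minimal Python sketch of the check in that order, with the entries-file timestamp as the first test. It assumes a hypothetical `Entry` record whose `text_time` field holds the cached timestamp; `Entry`, `is_modified` and `contents_equal` are illustrative names, not Subversion's actual C code or entries format.

```python
import os
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for one file's record in .svn/entries."""
    text_time: int  # working-file mtime (ns) recorded when it last matched the text-base


def contents_equal(path_a: str, path_b: str, chunk_size: int = 64 * 1024) -> bool:
    """Byte-by-byte comparison in chunks: the "very slow" last resort."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a, b = fa.read(chunk_size), fb.read(chunk_size)
            if a != b:
                return False
            if not a:  # both files exhausted at the same point
                return True


def is_modified(working: str, textbase: str, entry: Entry) -> bool:
    # Step 1: one stat of the working file, compared with the timestamp cached
    # in the entries file. In the common, unchanged case this settles it
    # without touching the text-base at all.
    st_work = os.stat(working)
    if st_work.st_mtime_ns == entry.text_time:
        return False  # NOT_CHANGED

    # Step 2: only now stat the text-base and compare sizes.
    if st_work.st_size != os.stat(textbase).st_size:
        return True  # CHANGED

    # Step 3: same size but a different timestamp, so compare the contents.
    return not contents_equal(working, textbase)
```

A file whose timestamp changed but whose contents did not (e.g. after `touch`) falls through to the slow content comparison and is still reported as unchanged.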
> ...
> >> if (filesizes of working and textbase are unequal):
> >>     return CHANGED;
> >> else if (mtimes of working and textbase are equal):
> >>     return NOT_CHANGED;
> >> else
> >>     compare the files byte-by-byte. /* very slow */
> >>
> >
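For comparison, a minimal Python sketch of the reordered version, following that pseudocode literally (working-file mtime compared with text-base mtime). The `stat` parameter is a hypothetical hook added only so the number of stat calls can be counted; the function names are illustrative.

```python
import os
from typing import Callable


def contents_equal(path_a: str, path_b: str, chunk_size: int = 64 * 1024) -> bool:
    """Byte-by-byte comparison in chunks: the "very slow" last resort."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a, b = fa.read(chunk_size), fb.read(chunk_size)
            if a != b:
                return False
            if not a:  # both files exhausted at the same point
                return True


def is_modified_size_first(working: str, textbase: str,
                           stat: Callable[[str], os.stat_result] = os.stat) -> bool:
    # The size test comes first, so both files are stat'ed on every call,
    # including the common case where the file turns out to be unchanged.
    st_work, st_base = stat(working), stat(textbase)
    if st_work.st_size != st_base.st_size:
        return True   # CHANGED
    if st_work.st_mtime_ns == st_base.st_mtime_ns:
        return False  # NOT_CHANGED
    return not contents_equal(working, textbase)
```

Passing a counting wrapper as `stat` shows that an unchanged file costs two stat calls in this order, whereas a first check against a timestamp cached in the entries file needs only one.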
...
> > And also, I'd argue this is slower. 99% of the time, almost every
> > file in the tree will be unchanged, and will have identical
> > working/textbase timestamps. Using the current algorithm, it means
> > that 99% of the time we get a definitive "answer" to the question on
> > the first comparison. In your algorithm, we'd end up doing two
> > comparisons nearly all the time, instead of one.
>
...
> True: I am considering the cost of the extra integer comparison and
> branch to be negligible. If you are decreasing the accuracy of the
> heuristic because of that cost, that seems like a micro-optimization
> that isn't worthwhile to me. Do you really think the cost of the
> comparison and branch is not negligible on the order of tens or maybe
> hundreds of thousands of files in a wc?
>
Of course, it is negligible. But the problem is the first step in the
algorithm: in the common case it doesn't need to stat the base file at
all, since the timestamp it compares against is stored in .svn/entries.
Testing the file sizes first would add a stat of the text-base file for
every file in the working copy. If this were a real (and reasonably
common) problem, we could of course store the base file size in the
entries file as well.
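A minimal Python sketch of that option, assuming a hypothetical `base_size` field cached next to the timestamp (the `Entry` class and both field names are illustrative, not Subversion's real entries format). With both values cached, the size test can go first without any stat of the text-base; `stat` is again a hypothetical hook for counting calls.

```python
import os
from dataclasses import dataclass
from typing import Callable


@dataclass
class Entry:
    """Hypothetical .svn/entries record extended with a cached base size."""
    text_time: int  # working-file mtime (ns) recorded when it last matched the text-base
    base_size: int  # hypothetical extra field: size of the text-base in bytes


def contents_equal(path_a: str, path_b: str, chunk_size: int = 64 * 1024) -> bool:
    """Byte-by-byte comparison in chunks: the "very slow" last resort."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a, b = fa.read(chunk_size), fb.read(chunk_size)
            if a != b:
                return False
            if not a:  # both files exhausted at the same point
                return True


def is_modified(working: str, textbase: str, entry: Entry,
                stat: Callable[[str], os.stat_result] = os.stat) -> bool:
    # Only the working file is stat'ed; both cheap tests use cached values,
    # so putting the size test first no longer costs an extra system call.
    st_work = stat(working)
    if st_work.st_size != entry.base_size:
        return True   # CHANGED
    if st_work.st_mtime_ns == entry.text_time:
        return False  # NOT_CHANGED
    return not contents_equal(working, textbase)
```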

Just to clarify,
//Peter

Received on Wed Mar 16 09:11:21 2005

This is an archived mail posted to the Subversion Users mailing list.