> -----Original Message-----
> From: Erik Huelsmann [mailto:firstname.lastname@example.org]
> Sent: Wednesday, 8 November 2006 11:44
> To: Michael Haggerty
> Cc: Peter Lundblad; SVN Dev
> Subject: Re: Caching text-size in the entries file
> On 11/7/06, Michael Haggerty <email@example.com> wrote:
> > Erik Huelsmann wrote:
> > > On 11/6/06, Erik Huelsmann <firstname.lastname@example.org> wrote:
> > >> On 11/6/06, Peter Lundblad <email@example.com> wrote:
> > >> > Erik Huelsmann writes:
> > >> > > Ah, we already have a speed-vs-correctness tradeoff in status: we use
> > >> > > mtimes instead of filecomparison or hash calculation. BTW: I do feel
> > >> > Yeah, but I think we need to be careful when making the heuristic worse.
> > >>
> > >> Right, but what I hope to do is reduce the number of false negatives:
> > >> I assert most files which are modified (but do not have their
> > >> timestamp changed) won't have modified keywords or eols only. Rather:
> > >> I think having modified eols-and-keyword-expansions-only is an edge
> > >> case rather unlikely to happen. Whereas we have seen cases where
> > >> timestamps were kept constant (making the changes undetected).
> > >>
> > >> Currently our algorithm doesn't know any false positives, but it has a
> > >> chance for false negatives. I'd rather see that the other way around:
> > >> false positives are correctable with an 'svn revert' or 'svn cleanup';
> > >> false negatives don't have a cure.
> > What if Erik's new algorithm is used to detect
> > files-that-might-be-modified, then those files are double-checked using
> > the more expensive algorithm? I assume that in most use cases, at most
> > a small percentage of files are changed when 'svn stat' is run.
> > Therefore this should give almost as large a speed win without any
> > downsides.
> Well, that'll give us fewer false negatives, without the extra false
> positives, but it will gain us no speedup: all files which are marked
> 'maybe-changed' need to be detranslated to test eol- and keywords-only
> changes. My point is that these are sufficiently edge case not to
> require full detranslation on status: 'normally' only content changes
> will have occurred.
I don't know, but the way I understand it, only very few files would normally
be detranslated, so you would still gain some speed. Also, aren't there
actually three checks: full text compare (after detranslation), mtimes and
text-size? There ought to be a way to combine these that results in fast and
nearly (or fully?) error-free detection of changed files.
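
To make the combination a bit more concrete, here is a rough Python sketch of
what I have in mind. It is only an illustration: CachedEntry, snapshot,
is_modified and the detranslate callback are names I made up, not Subversion
APIs, and the real client would of course do this in C against the entries
file. The idea is to use size and mtime as cheap filters and to run the
expensive detranslate-and-compare step only on files that fail one of them.

```python
import os
from dataclasses import dataclass
from typing import Callable


@dataclass
class CachedEntry:
    """Hypothetical per-file data cached alongside the working copy."""
    size: int          # working-file size when it was last known unmodified
    mtime_ns: int      # working-file mtime at that same moment
    base_text: bytes   # pristine contents in repository-normal form


def snapshot(path: str, base_text: bytes) -> CachedEntry:
    """Record the cheap stat data for a file known to match base_text."""
    st = os.stat(path)
    return CachedEntry(st.st_size, st.st_mtime_ns, base_text)


def is_modified(
    path: str,
    entry: CachedEntry,
    detranslate: Callable[[bytes], bytes] = lambda data: data,
) -> bool:
    """Cheap checks first; full compare only for 'maybe-changed' files."""
    st = os.stat(path)

    # Size and mtime both match the cache: treat the file as unmodified
    # without reading it. This is the fast path taken by most files.
    if st.st_size == entry.size and st.st_mtime_ns == entry.mtime_ns:
        return False

    # Either check failed, so the file is only 'maybe-changed'. Confirm
    # with the expensive comparison: detranslate, then compare to the base.
    with open(path, "rb") as f:
        return detranslate(f.read()) != entry.base_text
```

With this ordering the full compare removes the false positives (an eol-only
or keyword-only difference in the working file is detranslated away), and the
size check catches edits that kept the timestamp constant. What remains is a
false negative only when an edit preserves both the size and the mtime, which
is why I say "nearly" rather than "fully" error-free.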
Received on Wed Nov 8 01:22:22 2006