RE: Re: Caching text-size in the entries file

From: <SebastianUnger_at_eaton.com>
Date: 2006-11-07 23:52:22 CET

> -----Original Message-----
> From: Erik Huelsmann [mailto:ehuels@gmail.com]
> Sent: Wednesday, 8 November 2006 11:44
> To: Michael Haggerty
> Cc: Peter Lundblad; SVN Dev
> Subject: Re: Caching text-size in the entries file
>
>
> On 11/7/06, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> > Erik Huelsmann wrote:
> > > On 11/6/06, Erik Huelsmann <ehuels@gmail.com> wrote:
> > >> On 11/6/06, Peter Lundblad <plundblad@google.com> wrote:
> > >> > Erik Huelsmann writes:
> > >> > > Ah, we already have a speed-vs-correctness tradeoff in status: we use
> > >> > > mtimes instead of file comparison or hash calculation. BTW: I do feel
> > >> > Yeah, but I think we need to be careful when making the heuristic
> > >> > worse.
> > >>
> > >> Right, but what I hope to do is reduce the number of false negatives:
> > >> I assert that most files which are modified (but do not have their
> > >> timestamp changed) won't have modified keywords or eols only. Rather:
> > >> I think having modified eols-and-keyword-expansions-only is an edge
> > >> case rather unlikely to happen, whereas we have seen cases where
> > >> timestamps were kept constant (leaving the changes undetected).
> > >>
> > >> Currently our algorithm produces no false positives, but it has a
> > >> chance of false negatives. I'd rather see that the other way around:
> > >> false positives are correctable with an 'svn revert' or 'svn cleanup';
> > >> false negatives don't have a cure.
> >
> > What if Erik's new algorithm is used to detect
> > files-that-might-be-modified, and then those files are double-checked
> > using the more expensive algorithm? I assume that in most use cases, at
> > most a small percentage of files are changed when 'svn stat' is run.
> > Therefore this should give almost as large a speed win without any
> > downsides.
>
> Well, that'll give us fewer false negatives without the extra false
> positives, but it will gain us no speedup: all files which are marked
> 'maybe-changed' need to be detranslated to test for eol- and keyword-only
> changes. My point is that these are rare enough edge cases not to
> require full detranslation on status: 'normally' only content changes
> will have occurred.

I don't know, but the way I understand it, only very few files would normally
be detranslated, so you would still gain some speed. Also, aren't there
actually three algorithms: full-text compare (after detranslation), mtimes and
text-size? There ought to be a way to combine these that results in fast and
nearly (or fully?) error-free detection of changed files.
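A minimal sketch of one way the three checks could be tiered, written in Python for brevity. `Entry`, `detranslate` and `is_modified` are hypothetical names, not Subversion's API; the sketch assumes the entries file caches the working file's size and mtime, and its `detranslate` only normalizes line endings (keyword contraction is left out). With `exact=True` a size or mtime mismatch only marks a file as a candidate for the full comparison, roughly as Michael suggests; with `exact=False` a size mismatch is trusted without detranslation, which is closer to Erik's proposal.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Entry:
    """Hypothetical per-file cache (not the real entries-file format)."""
    recorded_size: int     # size of the translated working file at last update
    recorded_mtime: float  # mtime of the working file at last update
    pristine: bytes        # text-base in normal form: LF eols, unexpanded keywords


def detranslate(data: bytes) -> bytes:
    """Hypothetical stand-in for detranslation: only turns CRLF and CR into
    LF. Keyword contraction is deliberately omitted from this sketch."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def is_modified(entry: Entry, size: int, mtime: float,
                read_working: Callable[[], bytes], exact: bool = True) -> bool:
    # Tier 1 (stat only): size and mtime both match the cache, so report
    # "unmodified". An edit that preserves both values is a false negative.
    if size == entry.recorded_size and mtime == entry.recorded_mtime:
        return False
    # Tier 2 (stat only): a size change almost always means a content change.
    # Trusting it skips detranslation, at the cost of a false positive when
    # only eols or keyword expansions changed.
    if size != entry.recorded_size and not exact:
        return True
    # Tier 3 (expensive): read and detranslate the working file, then compare
    # it with the pristine text. Only files that failed tier 1 reach this.
    return detranslate(read_working()) != entry.pristine


# Usage: an eol-only change (LF -> CRLF) changes the size but not the content.
pristine = b"line one\nline two\n"
entry = Entry(recorded_size=len(pristine), recorded_mtime=100.0, pristine=pristine)
crlf = b"line one\r\nline two\r\n"
print(is_modified(entry, len(crlf), 200.0, lambda: crlf))               # False
print(is_modified(entry, len(crlf), 200.0, lambda: crlf, exact=False))  # True (false positive)
```

Neither mode removes the false negative Erik describes: an edit that keeps both the size and the timestamp unchanged is still reported as unmodified by the first tier.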
Received on Wed Nov 8 01:22:22 2006