RE: Re: Re: Caching text-size in the entries file

From: <SebastianUnger_at_eaton.com>
Date: 2006-11-09 22:29:41 CET

> -----Original Message-----
> From: Erik Huelsmann [mailto:ehuels@gmail.com]
> Sent: Friday, 10 November 2006 10:17
> To: Unger, Sebastian
> Cc: dev@subversion.tigris.org
> Subject: Re: Re: Caching text-size in the entries file
>
>
> > > -----Original Message-----
> > > From: Erik Huelsmann [mailto:ehuels@gmail.com]
> > > Sent: Wednesday, 8 November 2006 11:44
> > > To: Michael Haggerty
> > > Cc: Peter Lundblad; SVN Dev
> > > Subject: Re: Caching text-size in the entries file
> > >
> > >
> > > On 11/7/06, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> > > > Erik Huelsmann wrote:
> > > > > On 11/6/06, Erik Huelsmann <ehuels@gmail.com> wrote:
> > > > >> On 11/6/06, Peter Lundblad <plundblad@google.com> wrote:
> > > > >> > Erik Huelsmann writes:
> > > > >> > > Ah, we already have a speed-vs-correctness tradeoff in status: we use
> > > > >> > > mtimes instead of file comparison or hash calculation. BTW: I do feel
> > > > >> > Yeah, but I think we need to be careful when making the heuristic
> > > > >> > worse.
> > > > >>
> > > > >> Right, but what I hope to do is reduce the number of false negatives:
> > > > >> I assert most files which are modified (but do not have their
> > > > >> timestamp changed) won't have modified keywords or eols only. Rather:
> > > > >> I think having modified eols-and-keyword-expansions-only is an edge
> > > > >> case rather unlikely to happen. Whereas we have seen cases where
> > > > >> timestamps were kept constant (making the changes undetected).
> > > > >>
> > > > >> Currently our algorithm doesn't know any false positives, but it has a
> > > > >> chance for false negatives. I'd rather see that the other way around:
> > > > >> false positives are correctable with an 'svn revert' or 'svn cleanup';
> > > > >> false negatives don't have a cure.
> > > >
> > > > What if Erik's new algorithm is used to detect
> > > > files-that-might-be-modified, then those files are double-checked using
> > > > the more expensive algorithm? I assume that in most use cases, at most
> > > > a small percentage of files are changed when 'svn stat' is run.
> > > > Therefore this should give almost as large a speed win without any
> > > > downsides.
> > >
> > > Well, that'll give us fewer false negatives, without the extra false
> > > positives, but it will gain us no speedup: all files which are marked
> > > 'maybe-changed' need to be detranslated to test eol- and keywords-only
> > > changes. My point is that these are sufficiently edge case not to
> > > require full detranslation on status: 'normally' only content changes
> > > will have occurred.
> > I don't know, but the way I understand it, only very few files would
> > normally be de-translated.
>
> Maybe, but maybe not: all files for which the mtime has changed
> currently will be detranslated. In working copies which exist for
> several months, that may be a much larger number than the files which
> actually contain changes.
>
> > Therefore you would gain some speed.
>
> When? What I'm proposing is that we introduce a large number of cases
> where detranslation *isn't* required to call a file modified (which
> currently *only* happens after detranslation).
>
> > Also, aren't there
> > actually three algorithms: Full text compare (after detranslation), mtimes
> > and text-size?
>
> No, there are 2: full file compare and text-size. The point is that
> full file compare (currently the only method) will only be used when
> the file looks changed by its mtime (i.e. changed mtime).
>
> > There ought to be a way to combine these which results in fast and
> > nearly (or fully?) error-free detection of changed files.
>
> Full error-free detection is only possible with full file compare on
> every file in the working copy. Having said that, the current
> algorithm says that if a file has a changed mtime it *might* be
> changed and needs a full compare.
>
> The new algorithm which Peter Lundblad proposes is to *also* require a
> full file compare if the file size changed. Instead of spending less
> time in status, we will now spend more time in status (and err less
> often on the side of false-negatives!).
>
> My proposal is that the full file compare on files which have a
> changed file size only serves to filter out those cases where people
> have edited keyword expansions or changed eols from CRLF to LF or vice
> versa: extreme edge cases not worth the extra cost on the normal use
> case.
>
> Hope that makes my reasoning (and the problem) clearer.
But wouldn't that leave files whose content changes without changing their
size marked as not changed? That would definitely not be good enough in my eyes.
The chance that a file changes without changing its size is much greater than
the chance of its mtime changing without the file changing (how does that
happen anyway?).

Seb
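
To make the three variants in this thread easier to compare, here is a rough
Python sketch of how I read them. It is not Subversion code: Entry,
WorkingFile, detranslate and the three status_* functions are hypothetical
names, detranslation is reduced to CRLF-to-LF conversion, and the third
variant assumes the mtime check stays in place for files whose size is
unchanged, which is how Erik's reply reads to me.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical cached record for one versioned file."""
    mtime: float  # timestamp recorded when the file was last known unmodified
    size: int     # working-file size recorded at the same time
    text: bytes   # stand-in for the pristine text, in normalised (LF) form


@dataclass
class WorkingFile:
    """Hypothetical view of the file as it sits in the working copy."""
    mtime: float
    data: bytes


def detranslate(data: bytes) -> bytes:
    """Stand-in for detranslation: only converts CRLF to LF here."""
    return data.replace(b"\r\n", b"\n")


def full_compare(entry: Entry, wf: WorkingFile) -> bool:
    """The expensive check: True if the detranslated content differs."""
    return detranslate(wf.data) != entry.text


def status_mtime_only(entry: Entry, wf: WorkingFile) -> bool:
    """Current behaviour as described: a changed mtime gates the full compare."""
    if wf.mtime == entry.mtime:
        return False  # assumed unmodified; this is where false negatives come from
    return full_compare(entry, wf)


def status_size_then_compare(entry: Entry, wf: WorkingFile) -> bool:
    """Variant attributed to Peter: a changed size also forces a full compare."""
    if wf.mtime == entry.mtime and len(wf.data) == entry.size:
        return False
    return full_compare(entry, wf)


def status_size_shortcut(entry: Entry, wf: WorkingFile) -> bool:
    """Erik's variant as I read it: a changed size means modified, no compare."""
    if len(wf.data) != entry.size:
        return True   # may be a false positive for eol- or keyword-only edits
    if wf.mtime == entry.mtime:
        return False
    return full_compare(entry, wf)
```

With an entry recorded for b"a\nb\n", a file edited to b"a\nbc\n" with its
mtime kept constant is missed by the first function and caught by the other
two. A file changed only to CRLF line endings is reported as modified by the
third function alone (the false positive). A same-size edit is caught by all
three if the mtime changed, and by none of them if it did not.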
Received on Thu Nov 9 22:30:03 2006
