Re: Caching text-size in the entries file

From: Erik Huelsmann <ehuels_at_gmail.com>
Date: 2006-11-06 16:45:52 CET

On 11/6/06, Peter Lundblad <plundblad@google.com> wrote:
> Erik Huelsmann writes:
> > Ah, we already have a speed-vs-correctness tradeoff in status: we use
> > mtimes instead of file comparison or hash calculation. BTW: I do feel
> Yeah, but I think we need to be careful when making the heuristic worse.

Right, but what I hope to do is reduce the number of false negatives.
I assert that most files which are modified without their timestamp
changing will differ in more than just keywords or eols. In other
words: I think modifications confined to eols and keyword expansions
are an edge case rather unlikely to happen, whereas we have seen cases
where timestamps were kept constant (leaving the changes undetected).

Currently our algorithm doesn't produce any false positives, but it
can produce false negatives. I'd rather see that the other way around:
false positives are correctable with an 'svn revert' or 'svn cleanup';
false negatives don't have a cure.
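
To make the trade-off concrete, here is a minimal sketch in Python. It
is not Subversion's code: CachedEntry, record_entry and both check
functions are hypothetical names, and it assumes the entries file would
cache the working file's size next to its timestamp.

```python
import os
from dataclasses import dataclass


@dataclass
class CachedEntry:
    """Hypothetical stand-in for per-file data an entries file could cache."""
    mtime_ns: int
    size: int


def record_entry(path: str) -> CachedEntry:
    """Remember the working file's timestamp and size (say, at checkout)."""
    st = os.stat(path)
    return CachedEntry(mtime_ns=st.st_mtime_ns, size=st.st_size)


def maybe_modified_mtime_only(path: str, entry: CachedEntry) -> bool:
    """Timestamp-only check: an unchanged mtime is taken to mean 'unmodified'."""
    return os.stat(path).st_mtime_ns != entry.mtime_ns


def maybe_modified_with_size(path: str, entry: CachedEntry) -> bool:
    """Also compare the cached size, so an edit that keeps the mtime but
    changes the length is still flagged. A size change that would be
    normalized away is flagged as well (a false positive)."""
    st = os.stat(path)
    return st.st_size != entry.size or st.st_mtime_ns != entry.mtime_ns
```

With an edit that changes the file's length while the timestamp is put
back to its old value, the first check misses the change (a false
negative) and the second one catches it.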

(I believe this is the semantic part.)

> > that some speed improvements are worth more than others; this one is a
> > factor 18 for translated files. I think that's a lot to gain. (Just
> > yesterday I had a user complaining on IRC about our speed in the
> > presence of keywords...)
> >
> Is this when most (all?) of the files are modified?

Well, that was the testing scenario, but I can't imagine that the
statistic wouldn't hold on a per-file basis :-)

> > > Also, I think this has a compatibility impact. Currently, if you
> > > call the status API and that returns the file as modified, then commit will
> > > include it in the transaction. Users of the library might rely on this
> > > semantic. If you have files that look modified before a commit, will
> > > you still have those files look modified after the commit because they
> > > were not considered modified by commit? Maybe the commit code could
> > > detect and fix this?
> >
> > That's no problem. Of course it could do that (post-commit!). But how
> > about those files that are currently modified but look unmodified
> > because of detranslation? We don't do anything about those, do we?
> >
> Do you mean modification that would be "normalized away" in the repository?

Exactly.

> > But, not all files reported as not-modified are not modified: we even
> > have a test which creates an inconsistent new line in a file (by
> > replacing CRLF with LF). Currently status says the file isn't
> > modified, but if you have software which only tolerates CRLF, it
> > *does* consider it modified.
> >
> I think here is where our views differ. I have always assumed that
> modified as reported by status means modified compared to the base
> revision, not compared to what the text base looked like when it was
> checked out. This means that a file that status says is modified will
> look different if committed, compared to the base revision.

I agree with the basic principle, but have a hard time accepting we'd
have an algorithm which only produces false negatives, when we don't
have a way of dealing with them, while false positives would be
'curable', but we don't generate any of those.
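
The CRLF-to-LF case mentioned above can be sketched as follows. This is
an illustration only, not Subversion's translation code:
to_repository_form is a hypothetical helper that stands in for
detranslation by normalizing every line ending to LF.

```python
def to_repository_form(data: bytes) -> bytes:
    """Normalize every line ending to LF (hypothetical detranslation)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


checked_out = b"one\r\ntwo\r\n"  # working file as written at checkout
edited = b"one\ntwo\r\n"         # one CRLF replaced by LF afterwards

# The edit disappears once both versions are normalized...
same_after_normalizing = (
    to_repository_form(edited) == to_repository_form(checked_out)
)
# ...although the bytes on disk, and their length, did change.
same_on_disk = edited == checked_out
size_changed = len(edited) != len(checked_out)
```

A comparison in repository form reports this file as unmodified, while
a tool that only tolerates CRLF sees a changed file. A size check would
report it as modified, which is a false positive under the "modified
compared to the base revision" reading.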

> > Why I think this change is important is: I think we should report
> > 'Modified' as close as we can to the file actually having been
> > modified (as viewed by the user).
> >
> So, is this a speed improvement or a semantic improvement?

Well, I didn't mean for it to be a semantic change, but I guess we end
up with one, yes. I meant for it to be a speed improvement.

> If we implement
> this change, we need to make sure we are consistent. Either a file
> with mods inside keyword expanded parts is modified or not according
> to status.

Agreed.

> This must not depend on whether the size is the same or
> not. I would prefer that we discuss the semantic change separately
> from the optimization. If we can agree that your proposed new
> semantics are correct, then we can discuss performance improvements.

Right, but what I'm seeing is that what status returns as modified
truly needs to be committed, but what it returns as *unmodified* does
not necessarily mean it wouldn'

> > Aren't we always going to confuse some users in some cases by doing
> > translation anyway?
> >
> Probably:-) But we could at least be consistent.
>
> Regards,
> //Peter
> \
>

Received on Mon Nov 6 16:53:33 2006