Re: Caching text-size in the entries file

From: Erik Huelsmann <ehuels_at_gmail.com>
Date: 2006-11-06 16:48:55 CET

On 11/6/06, Erik Huelsmann <ehuels@gmail.com> wrote:
> On 11/6/06, Peter Lundblad <plundblad@google.com> wrote:
> > Erik Huelsmann writes:
> > > Ah, we already have a speed-vs-correctness tradeoff in status: we use
> > > mtimes instead of file comparison or hash calculation. BTW: I do feel
> > Yeah, but I think we need to be careful when making the heuristic worse.
>
> Right, but what I hope to do is reduce the number of false negatives:
> I assert most files which are modified (but do not have their
> timestamp changed) won't have modified keywords or eols only. Rather:
> I think having modified eols-and-keyword-expansions-only is an edge
> case rather unlikely to happen. Whereas we have seen cases where
> timestamps were kept constant (making the changes undetected).
>
> Currently our algorithm doesn't know any false positives, but it has a
> chance for false negatives. I'd rather see that the other way around:
> false positives are correctable with an 'svn revert' or 'svn cleanup';
> false negatives don't have a cure.
>
> (I believe this is the semantic part.)
>
> > > that some speed improvements are worth more than others; this one is a
> > > factor 18 for translated files. I think that's a lot to gain. (Just
> > > yesterday I had a user complaining on IRC about our speed in the
> > > presence of keywords...)
> > >
> > Is this when most (all?) of the files are modified?
>
> Well, that was the testing scenario, but I can't imagine that the
> statistic wouldn't hold on a per-file basis :-)
>
> > > > Also, I think this has a compatibility impact. Currently, if you
> > > > call the status API and that returns the file as modified, then commit will
> > > > include it in the transaction. Users of the library might rely on this
> > > > semantic. If you have files that look modified before a commit, will
> > > > you still have those files look modified after the commit because they
> > > > were not considered modified by commit? Maybe the commit code could
> > > > detect and fix this?
> > >
> > > That's no problem. Of course it could do that (post-commit!). But how
> > > about those files that are currently modified but look unmodified
> > > because of detranslation? We don't do anything about those, do we?
> > >
> > Do you mean modification that would be "normalized away" in the repository?
>
> Exactly.
>
> > > But, not all files reported as not-modified are not modified: we even
> > > have a test which creates an inconsistent new line in a file (by
> > > replacing CRLF with LF). Currently status says the file isn't
> > > modified, but if you have software which only tolerates CRLF, it
> > > *does* consider it modified.
> > >
> > I think here is where our views differ. I have always assumed that
> > modified as reported by status means modified compared to the base
> > revision, not compared to what the text base looked like when it was
> > checked out. This means that a file that status says is modified will
> > look different if committed, compared to the base revision.
>
> I agree with the basic principle, but have a hard time accepting we'd
> have an algorithm which only knows false negatives, when we don't have
> a way of dealing with them, while false positives would be 'curable',
> but we don't generate any of those.
>
> > > Why I think this change is important is: I think we should report
> > > 'Modified' as close as we can to the file actually having been
> > > modified (as viewed by the user).
> > >
> > So, is this a speed improvement or a semantic improvement?
>
> Well, I didn't mean for it to be a semantic change, but I guess we end
> up with one, yes. I meant for it to be a speed improvement.
>
> > If we implement
> > this change, we need to make sure we are consistent. Either a file
> > with mods inside keyword expanded parts is modified or not according
> > to status.
>
> Agreed.
>
> > This must not depend on whether the size is the same or
> > not. I would prefer that we discuss the semantic change separately
> > from the optimization. If we can agree that your proposed new
> > semantics are correct, then we can discuss performance improvements.
>
> Right, but what I'm seeing is that what status returns as modified
> truly needs to be committed, but what it returns as *unmodified* does
> not necessarily mean it wouldn'

Sorry about that, I wasn't finished.

"not necessarily wouldn't be committed when passed as an argument to
'svn ci -m "" --force <target>'". Isn't status there to help determine
what should and shouldn't (probably) be committed?

> > > Aren't we always going to confuse some users in some cases by doing
> > > translation anyway?
> > >
> > Probably:-) But we could at least be consistent.

To me, it looks like we have a gap (the heuristic vs full file
comparison) between what 'svn ci --force' should do and what 'svn st'
does. How can that be consistent?
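
The gap between the cheap heuristic and a full file comparison can be
illustrated with a minimal sketch. It is written in Python rather than
Subversion's C, and the names record_entry, looks_modified and
is_modified are hypothetical, not Subversion APIs. It assumes the
client caches the working file's mtime and size and only reads the
file when asked for a full comparison; keyword and eol translation are
left out.

```python
import os
import tempfile


def record_entry(path):
    """Return the (mtime_ns, size) pair a client might cache for a working file."""
    st = os.stat(path)
    return st.st_mtime_ns, st.st_size


def looks_modified(path, entry):
    """Cheap heuristic: compare cached mtime and size, never read the contents."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size) != entry


def is_modified(path, pristine):
    """Expensive check: compare the full contents against the pristine bytes."""
    with open(path, "rb") as f:
        return f.read() != pristine


def demo():
    pristine = b"hello world\n"
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "file.txt")
        with open(path, "wb") as f:
            f.write(pristine)
        entry = record_entry(path)

        # Same-size edit, then restore the old timestamp.
        with open(path, "wb") as f:
            f.write(b"hello WORLD\n")
        os.utime(path, ns=(entry[0], entry[0]))
        same_size = (looks_modified(path, entry), is_modified(path, pristine))

        # Different-size edit, timestamp restored again.
        with open(path, "wb") as f:
            f.write(b"hello world!!\n")
        os.utime(path, ns=(entry[0], entry[0]))
        diff_size = (looks_modified(path, entry), is_modified(path, pristine))
    return same_size, diff_size
```

Each pair returned by demo() is (heuristic result, full comparison
result). In the same-size case the heuristic misses the change while
the full comparison catches it, which is the false negative discussed
above; in the different-size case the cached size lets the heuristic
catch the edit even though the timestamp was kept constant.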

bye,

Erik.

Received on Mon Nov 6 16:49:15 2006
