[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: diff-optimizations-tokens branch: I think I'm going to abandon it

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Thu, 2 Dec 2010 21:21:20 +0100

On Thu, Dec 2, 2010 at 6:18 PM, Bill Tutt <bill_at_tutts.org> wrote:
> Note: This email only tangentially relates to svn diff and more about
> reverse token scanning in general:
> As someone who has implemented suffix reverse token scanning before:

Thanks for the input. It's nice to see other people have also
struggled with this :-).

> * It simply isn't possible in DBCS code pages. Stick to byte only here.
> áá SBCS and UTF-16 make reverse token stuff relatively
> straightforward. UTF-8 is a little trickier but still tractable.
> á At least UTF-8 is tractable in a way that DBCS isn't. You always
> know which part of a Unicode code point you are in. (i.e. byte 4 vs.
> byte 3 vs. etc...)

Ok, this further supports the decision to focus on the byte-based
approach. We'll only consider stuff identical if all bytes are
identical. That's the simplest route, and since it's only an
optimization anyway ...

> * I would recommend only supporting a subset of theádiff optionsáfor
> reverse token scanning. i.e. ignore whitespace/ignore eol but skip
> ignore case (if svn has that, I forget...)

svn diff doesn't have an ignore-case option, so that's ok :-).

> áá If tokens include keyword expansion operations then stop once you
> hit one. The possible source of bugs outways the perf gain in my mind
> here.

Haven't thought about keyword expansion yet. But as you suggest: I'm
not going to bother doing special stuff for (expanded) keywords. If we
find a mismatch, we'll stop with the optimized scanning, and fall back
to the default algorithm.

> * Suffix scanning does really require a seekable stream, if it isn't
> seekable then don't perform the reverse scanning.á It is only an
> optimization after all.

Hm, yes, we'll need to be careful about that. I'll start another mail
thread asking for known implementors of the svn_diff_fns_t functions,
to find out whether seeking around like that for suffix would be

> Additional ignore whitespace related comment:
> * IIRC, Perforce had an interesting twist on ignoring whitespace. You
> could ignore just line leading/ending whitespace instead of all
> whitespace differences but pay attention toáany whitespace change
> after the "trim" operation had completed.
> e.g.:
> * "ááá aaa bbbáá " vs "aaa bbb" would compare as equal
> * "ááá aaaá bbbá " vs "aaaá bbb" would compare as equal
> * "ááá aaaá bbbá " vs "aaaábbb" would compare as non-equal due to the
> white space change in the middle of the line

Cool (svn doesn't have that option). But I'm not sure what that would
be useful for (as a user, I can't immediately imagine an important use
case). Anyway, could still be a nice option...


Received on 2010-12-02 21:22:17 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.