Re: diff-optimizations-tokens branch: I think I'm going to abandon it

From: Bill Tutt <bill_at_tutts.org>
Date: Thu, 2 Dec 2010 12:18:20 -0500

Note: This email relates only tangentially to svn diff; it is more about
reverse token scanning in general:

As someone who has implemented suffix reverse token scanning before:

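* It simply isn't possible in DBCS code pages: a trail byte can fall in
the same range as a lead byte or even an ASCII character, so reading
backwards you can't tell where a character starts. Stick to byte-only
comparison there. SBCS and UTF-16 make reverse token scanning relatively
straightforward. UTF-8 is a little trickier but still tractable, in a
way that DBCS isn't: you always know which part of a Unicode code point
a given byte is (a lead byte, whose high bits give the sequence length,
vs. a continuation byte).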

* I would recommend supporting only a subset of the diff options for
reverse token scanning, e.g. ignore whitespace and ignore EOL style, but
skip ignore case (if svn has that, I forget...).
If tokens include keyword expansion operations, stop once you
hit one. In my mind the potential for bugs outweighs the perf gain
here.
* Suffix scanning really does require a seekable stream; if the stream
isn't seekable, don't perform the reverse scanning. It is only an
optimization, after all.
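A minimal Python sketch of the byte-only suffix scan described above, assuming both inputs are binary file-like objects (`io.BytesIO`, or files opened in `"rb"` mode). It skips the optimization entirely when either stream is not seekable and, for UTF-8 input, trims the shared tail so it begins on a code-point boundary. The function names and the default chunk size are illustrative only; this is not svn's actual code.

```python
import io


def common_suffix_len(a, b, chunk: int = 4096) -> int:
    """Length in bytes of the identical tail shared by two binary streams.

    Returns 0 (i.e. skip the optimization) if either stream is not seekable.
    Assumes read(n) returns n bytes, as io.BytesIO and buffered files do.
    """
    if not (a.seekable() and b.seekable()):
        return 0
    end_a = a.seek(0, io.SEEK_END)
    end_b = b.seek(0, io.SEEK_END)
    limit = min(end_a, end_b)
    matched = 0
    while matched < limit:
        n = min(chunk, limit - matched)
        # Read the n bytes just before the tail already known to match.
        a.seek(end_a - matched - n)
        b.seek(end_b - matched - n)
        ca, cb = a.read(n), b.read(n)
        # Compare backwards within the chunk until the first mismatch.
        k = 0
        while k < n and ca[n - 1 - k] == cb[n - 1 - k]:
            k += 1
        matched += k
        if k < n:  # mismatch found inside this chunk
            break
    return matched


def utf8_suffix_len(a, b) -> int:
    """Like common_suffix_len, but the suffix starts on a UTF-8 boundary.

    Continuation bytes always have the form 10xxxxxx, so any at the start
    of the shared tail belong to a character whose lead byte differed.
    """
    n = common_suffix_len(a, b)
    if n == 0:
        return 0
    end_a = a.seek(0, io.SEEK_END)
    a.seek(end_a - n)
    tail = a.read(n)
    skip = 0
    while skip < n and (tail[skip] & 0xC0) == 0x80:
        skip += 1
    return n - skip
```

For example, `"café\n"` and `"cafĩ\n"` end in the same two bytes (`A9 0A`), because é and ĩ share a continuation byte, but only the newline is a whole shared character. The local test `(byte & 0xC0) == 0x80` is what makes UTF-8 tractable; a DBCS code page such as Shift_JIS has no equivalent, since its trail bytes overlap both lead bytes and ASCII.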

Additional comment related to ignoring whitespace:
* IIRC, Perforce had an interesting twist on ignoring whitespace. You
could ignore just the leading/trailing whitespace on a line instead of
all whitespace differences, while still paying attention to any
whitespace change left after the "trim" operation had completed.

e.g.:
* "    aaa bbb   " vs "aaa bbb" would compare as equal
* "    aaa bbb " vs "aaa bbb" would compare as equal
* "    aaa  bbb " vs "aaa bbb" would compare as non-equal due to the
whitespace change in the middle of the line
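The trim-only rule is easy to express as the token comparison in a line-based reverse scan. A minimal Python sketch, assuming lines are already decoded to `str`; `trim_equal`, `common_suffix_lines` and the `KEYWORD` regex are hypothetical names, and the regex is only a rough stand-in for real svn keyword detection. Following the earlier advice, the scan stops at the first line containing a keyword rather than trying to compare through it.

```python
import re

# Hypothetical, rough detector for keywords such as $Id$ or $Rev: 1234 $.
# A real implementation would consult the svn:keywords property instead.
KEYWORD = re.compile(r"\$[A-Za-z]+(?::[^$\n]*)?\$")


def trim_equal(x: str, y: str) -> bool:
    """Ignore leading/trailing whitespace only; any whitespace difference
    left inside the line after trimming still makes the lines differ."""
    return x.strip() == y.strip()


def common_suffix_lines(a: list[str], b: list[str], eq=trim_equal) -> int:
    """Count matching lines, scanning backwards from the end of both files.

    Stops at the first mismatch, or at any line containing a keyword,
    treating that point as the end of the safely skippable suffix.
    """
    n = 0
    while n < min(len(a), len(b)):
        x, y = a[-1 - n], b[-1 - n]
        if KEYWORD.search(x) or KEYWORD.search(y) or not eq(x, y):
            break
        n += 1
    return n
```

Keeping the comparison in a separate `eq` function means an option like ignore case can simply be left out of the reverse scan, as suggested above, without touching the scanning loop.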

Fyi,
Bill
Received on 2010-12-02 18:19:00 CET

In reply to: Julian Foad: "Re: diff-optimizations-tokens branch: I think I'm going to abandon it"