Re: "Save to Clipboard" including BOM marker

From: Stefan Küng <tortoisesvn_at_gmail.com>
Date: Wed, 15 Jul 2015 20:58:16 +0200

On 15.07.2015 19:38, Eric Hirst wrote:
> We’ve been having a problem that I initially attributed to our code
> review tool (Crucible/Fisheye) but which now seems to be a TSVN bug.
> Repro steps are very simple:
> 1.Compare 2 attached .patch files in TortoiseMerge and notice the BOM in
> the one from the clipboard.
> 1.Check in a file similar to IPaintable.cs (attached) into your SVN
> repository. (I think any .cs file will work.)
> 2.Make an edit to one of the first couple lines (see IPaintable –
> revised.cs) and save the file locally
> 3.Open the .cs file in the TSVN commit dialog.
> 4.Right-click to “Create Patch”
> 5.(a) Save as a file (to get the attached IPaintable.cs.patch)
> (b) Choose the “Save to clipboard” option, pasting the result into
> Notepad or similar, then saving to a file (to get the attached
> IPaintable.cs.Clipboard.patch).
> 6.Diff the two patch files using TortoiseMerge and notice that the
> clipboard version includes the beginning of file byte-order marker (BOM).
> ERROR: this BOM breaks 3rd-party tools, including Crucible, which has a
> “paste from clipboard” feature.
> 7.Revert the change to the .cs file and try applying either of the 2
> patches.
> ERROR: the patch built from the clipboard paste does not work in
> TortoiseSVN either. An error box seems to flash up, but I can’t read
> anything it might be trying to tell me.
> This ends up adversely affecting about 25% of our code reviews,
> preventing iterative reviews and breaking Crucible’s ability to show the
> full context of changes. Our actual workflow is a variant of 5b, the
> difference being that we paste directly into Crucible instead of into
> Notepad. My question on
> https://answers.atlassian.com/questions/21627643, posted yesterday when
> I thought this was a Crucible bug, describes that workflow.
> Now that we understand the issue, we can add extra steps to our workflow
> and save patches as files before uploading, rather than simply using the
> TSVN “Save to clipboard” -> Crucible “Paste” approach that we use today.

I can reproduce the problem, but the bug is not in TSVN. I'll try to
explain.

When you copy the patch to the clipboard, it is copied *exactly* as
it would be written to a file. You can verify that yourself with a
clipboard spy utility if you like. The BOM is written as the three bytes
0xEF 0xBB 0xBF.
The problem occurs when you paste that into an editor: the editor usually
assumes the pasted content is encoded in ASCII or UTF8. So when you
paste that content, the editor does not understand the BOM bytes and
converts them according to the code page that's set. When you then
save the patch, the BOM is written incorrectly.
How the BOM gets changed depends on the editor's code page. I
tried three editors with different settings and was able to get the
BOM converted to two different byte sequences.

When the editor is set to ASCII/ANSI, the BOM is converted to a single
0x3F. When the editor is set to UTF8, however, the BOM is converted to
0xC3 0xAF 0xC2 0xBB 0xC2 0xBF (each of the three BOM bytes is re-encoded
as its own two-byte UTF8 character) - which is what your
IPaintable.cs.Clipboard.patch file contains.
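
To make the two byte sequences above easier to check, here is a small
Python sketch. It is only a model of what an editor might do, not what
any particular editor actually does: the helper name `resave` and the
choice of cp1252 as the "ANSI" code page are assumptions for illustration.

```python
# Hypothetical model of an editor mangling a UTF8 BOM on paste and save.
BOM_BYTES = bytes([0xEF, 0xBB, 0xBF])


def resave(raw: bytes, paste_codec: str, save_codec: str) -> bytes:
    """Decode pasted bytes with one codec, then save them with another."""
    text = raw.decode(paste_codec)
    # Characters the target encoding cannot represent become '?' (0x3F).
    return text.encode(save_codec, errors="replace")


# Assumed case 1: the editor reads each BOM byte as a single-byte
# character (cp1252 here) and then saves the text as UTF8.
as_utf8 = resave(BOM_BYTES, "cp1252", "utf-8")

# Assumed case 2: the editor sees the BOM as the one character U+FEFF
# and then saves as ASCII, which has no such character.
as_ascii = resave(BOM_BYTES, "utf-8", "ascii")

print(as_utf8.hex(" "))   # c3 af c2 bb c2 bf
print(as_ascii.hex(" "))  # 3f
```

Comparing the first bytes of a saved patch against these two sequences
is a quick way to tell which kind of conversion happened.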

The real problem is that the BOM in a patch file does not appear at the
start of the content but in the middle, where the diffed line is shown,
and most editors cannot handle that properly. In text files, the BOM
is only expected at the very beginning; anywhere else, the same character
should be treated as a "zero width no-break space" (the role now taken
over by the "Word Joiner" character).
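
A rough way to see why the BOM ends up in the middle of a patch: the
sketch below builds a unified diff with Python's difflib from two
made-up versions of a file whose first line starts with a BOM. The file
content and names are invented, and difflib's output is only an
approximation of the patch format TSVN writes.

```python
import difflib

BOM = "\ufeff"

# Made-up file content; only the first line carries the BOM.
old = [BOM + "using System;\n", "namespace Demo {}\n"]
new = [BOM + "using System.Drawing;\n", "namespace Demo {}\n"]

patch = "".join(
    difflib.unified_diff(old, new, "a/IPaintable.cs", "b/IPaintable.cs")
)

# The BOM sits after the patch headers, once per changed line.
first_bom_at = patch.index(BOM)
bom_count = patch.count(BOM)

# A BOM-aware decoder only strips a BOM at the very start of the data,
# so the ones inside the patch body survive a round trip.
round_trip = patch.encode("utf-8").decode("utf-8-sig")

print(first_bom_at > 0, bom_count, round_trip == patch)
```

So even a tool that handles a leading BOM correctly still has to cope
with BOM characters embedded in the changed lines.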

Another problem with the clipboard itself is that any text copied to it
is automatically converted to Unicode as well (if you copy text as
CF_TEXT, Windows will automatically convert it and fill in
CF_UNICODETEXT as well, and vice versa). If the editor then uses the
CF_UNICODETEXT clipboard format, the BOM is already converted (wrongly)
to 0xEF 0x00 0xBB 0x00 0xBF 0x00 instead of 0xFF 0xFE. Windows will also
fill in the CF_OEMTEXT format, which converts the BOM to the garbage
0x8B 0xAF 0xA8.
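
The byte values above can be reproduced without touching the clipboard.
This sketch only imitates the conversions with Python codecs; it assumes
the ANSI code page is cp1252 and the OEM code page is cp437, which will
differ on systems with other regional settings.

```python
BOM_BYTES = bytes([0xEF, 0xBB, 0xBF])

# Assumption: CF_TEXT is interpreted in the ANSI code page (cp1252 here),
# so the three BOM bytes become three separate characters.
as_ansi_text = BOM_BYTES.decode("cp1252")

# Imitation of the automatic CF_TEXT -> CF_UNICODETEXT conversion.
cf_unicodetext = as_ansi_text.encode("utf-16-le")

# Imitation of the automatic CF_TEXT -> CF_OEMTEXT conversion
# (assumed OEM code page: cp437).
cf_oemtext = as_ansi_text.encode("cp437")

# What a real BOM looks like in little-endian UTF-16.
real_bom_utf16 = "\ufeff".encode("utf-16-le")

print(cf_unicodetext.hex(" "))  # ef 00 bb 00 bf 00
print(cf_oemtext.hex(" "))      # 8b af a8
print(real_bom_utf16.hex(" "))  # ff fe
```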

I can try to set both clipboard formats CF_TEXT *and* CF_UNICODETEXT
myself when copying the patch file to the clipboard - that will make the
CF_UNICODETEXT have the proper BOM bytes. But I'm not sure if that would
fix the issue you're seeing.
If you want, you can try the next nightly build (> r26611) from here:
Either the 1.8.x build or better yet the 'latest' one.


   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest interface to (Sub)version control
    /_/   \_\     http://tortoisesvn.net
Received on 2015-07-15 20:58:22 CEST

This is an archived mail posted to the TortoiseSVN Users mailing list.