
Re: problem: svn status reports same files both "?" and "!" (encoding?, mac os x, svn 1.4.2)

From: B. Smith-Mannschott <benpsm_at_gmail.com>
Date: 2006-12-19 20:44:56 CET

On Dec 19, 2006, at 19:23, Kenneth Porter wrote:

> What are the file names in the shadow copy under .svn/text-base?
> Except for having ".svn-base" appended, the filenames should be
> identical. It sounds like they're not. Try "ls .svn/text-base >
> server-names" and "ls > local-names" and then using your favorite
> editor to compare the text.

They are identical, apart from the ".svn-base" suffix. For example, the Ä
in VJ_Änderungsübersicht.doc is encoded as "A" followed by the two
bytes 0xCC 0x88, which is the UTF-8 encoding of U+0308 "combining
diaeresis". Mac OS X seems to report file names in decomposed form.

And, tada, .svn/entries records the same file name using the bytes
0xC3 0x84 for the Ä, which is the UTF-8 encoding of U+00C4 "latin
capital letter A with diaeresis".
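
For anyone who wants to check those byte values, here is a small Python
sketch. It is not related to svn's own code; it only assumes that the
decomposed form the file system reports corresponds to Unicode NFD, and
the variable names are made up for illustration.

```python
import unicodedata

# Precomposed form, as found in .svn/entries: U+00C4
composed = "\u00c4"

# Decomposed form (assumed here to be plain NFD): "A" followed by U+0308
decomposed = unicodedata.normalize("NFD", composed)

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

print(composed_bytes.hex())    # c384
print(decomposed_bytes.hex())  # 41cc88
```

Both strings render as the same Ä, but their UTF-8 encodings differ in
length and content.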

It looks like svn takes the UTF-8 bytes returned by the system API and
the UTF-8 bytes it has stashed away in its entries file and compares
them in a Unicode-ignorant fashion, i.e. byte for byte, without
considering such niceties as character composition.
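
To illustrate the difference between the two kinds of comparison, here
is a rough Python sketch. It does not reflect how svn is implemented;
same_name is a hypothetical helper, and the on-disk name is simulated by
normalizing to NFD, which is an assumption about what the file system
returns.

```python
import unicodedata


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Name as recorded in .svn/entries (precomposed characters).
from_entries = "VJ_\u00c4nderungs\u00fcbersicht.doc"

# Name as reported by the file system (simulated with NFD, an assumption).
from_disk = unicodedata.normalize("NFD", from_entries)

# Byte-for-byte comparison: the two names look like different files.
bytewise_equal = from_entries.encode("utf-8") == from_disk.encode("utf-8")

# Normalization-aware comparison: the two names are the same file.
normalized_equal = same_name(from_entries, from_disk)

print(bytewise_equal, normalized_equal)
```

With a byte-wise comparison the entry looks missing ("!") and the file
on disk looks unversioned ("?"), which matches the status output
described in the subject line.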

On Dec 19, 2006, at 19:18, Ryan Schmidt wrote:
> It sounds like the composed/decomposed UTF-8 string problem to me.
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=2464

Indeed, it does.

Now what?

// ben

Received on Tue Dec 19 20:45:50 2006

This is an archived mail posted to the Subversion Users mailing list.
