
Unicode, dump files, and file names

From: Kenneth Porter <shiva_at_sewingwitch.com>
Date: 2006-07-29 04:29:29 CEST

I'm helping develop vss2svn, a program that converts a Visual SourceSafe
repository to a Subversion dump file.

<http://www.pumacode.org/projects/vss2svn/>

In my VSS repo I have some source files that include character 0x85
(ellipsis) in the name. A typical filename is "Move to Point...-D.bmp",
where the "..." is the single-byte CP1252 character 0x85.

<http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT>

vss2svn extracts the filename from the CP1252-encoded VSS repository DB and
writes it to a UTF-8-encoded XML file.
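
To make the intended conversion concrete, here is a minimal Python sketch of
that step. It is an illustration only, not vss2svn's actual code; the variable
names are made up, and the sample filename is the one quoted above with the
"..." written as the real 0x85 byte.

```python
# Illustration only: hypothetical names, not taken from vss2svn.
# The filename as stored in the VSS database, with the CP1252 ellipsis byte.
raw_name = b"Move to Point\x85-D.bmp"

# Decode exactly once from CP1252 to a Unicode string.
# CP1252 byte 0x85 maps to U+2026 HORIZONTAL ELLIPSIS.
name = raw_name.decode("cp1252")

# Encode exactly once to UTF-8 for the XML file.
# U+2026 becomes the three bytes E2 80 A6.
utf8_name = name.encode("utf-8")

print(repr(name))
print(utf8_name.hex(" "))
```

The point of the sketch is that the single 0x85 byte should turn into exactly
three bytes (E2 80 A6) in the UTF-8 output and nothing longer.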

What should we put in the dump file for the file name? Should it be
UTF-8-encoded? What does "svnadmin load" expect? (I'm seeing gibberish in
the filename in the resulting WC, and I suspect that double-encoding is
happening somewhere in the conversion process.)
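
For reference, this is what the double-encoding I suspect would look like,
sketched in Python. The helper names are hypothetical, and the check assumes
the second, spurious pass treats the UTF-8 bytes as CP1252; a different
codepage in that pass would produce different garbage.

```python
# Illustration only: hypothetical helpers, assuming the spurious second
# pass misreads UTF-8 bytes as CP1252 and encodes them to UTF-8 again.

def double_encode(utf8_bytes: bytes) -> bytes:
    """Simulate the bug: treat UTF-8 bytes as CP1252, then re-encode."""
    return utf8_bytes.decode("cp1252").encode("utf-8")

def looks_double_encoded(data: bytes) -> bool:
    """Return True if undoing one CP1252 pass yields different, valid UTF-8."""
    try:
        repaired = data.decode("utf-8").encode("cp1252")
        repaired.decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False
    return repaired != data

correct = "Move to Point\u2026-D.bmp".encode("utf-8")
mangled = double_encode(correct)

print(correct.hex(" "))   # ellipsis appears as e2 80 a6
print(mangled.hex(" "))   # ellipsis appears as c3 a2 e2 82 ac c2 a6
print(looks_double_encoded(correct), looks_double_encoded(mangled))
```

If the filename bytes in the dump file or in the working copy match the longer
sequence, the name was encoded twice somewhere between the XML file and the
dump file.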

Received on Sat Jul 29 04:30:02 2006

This is an archived mail posted to the Subversion Dev mailing list.
