
Re: File and folder names corrupted when importing from CVS using cvs2svn

From: Branko Čibej <brane_at_apache.org>
Date: Thu, 18 Jan 2018 18:17:25 +0100

On 18.01.2018 17:51, Bo Berglund wrote:
> On Thu, 18 Jan 2018 17:38:04 +0100, Bo Berglund
> <bo.berglund_at_gmail.com> wrote:
>> I don't know from where this problem originates, either it is a flaw
>> in the cvs2svn script, the configuration of the conversion or in the
>> format of the generated dump files.
>> Otherwise it may be a problem when importing the dump files into the
>> VisualSVN server....
> I made a test by creating a new file in the working copy named as
> follows:
> Testing_Å_Ä_Ö_å_ä_ö.txt
> Then I added it and committed.
> Then I used the VisualSVN repository web browser and found the file
> with the correct name. So it seems like the conversion from CVS to Svn
> is where the screw-up is located...

AFAIR you did not convert your CVS repositories on the same machine that
you used as the CVS server, correct? So ... you may not have used the
same character encoding during conversion as during normal operations.
As a guess I'd say that your (Windows, CVSNT) server uses the Windows
Latin 1 ("Western") encoding, and your (Linux) machine where you did the
conversion uses UTF-8.

If that's the case, it's not surprising that accented characters were
converted improperly.

(FWIW, the hex codes you show are valid UTF-8 but the characters they
encode have no relation to the originals.)
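To make the encoding point concrete, here is a small Python sketch (the language is our choice; the thread itself contains no code). It uses the test file name from the quoted message and assumes the server side used Windows code page 1252 ("Windows Latin 1"). The hex codes from the original report are not quoted in this message, so the particular cp1252/UTF-8 mix-up shown here is one common pattern, not necessarily the one that actually happened.

```python
# Illustrates how one file name turns into different bytes depending on
# which character encoding each tool assumes.
name = "Testing_Å_Ä_Ö_å_ä_ö.txt"

# Windows Latin 1 (cp1252) stores each accented letter as a single byte...
cp1252_bytes = name.encode("cp1252")
# ...while UTF-8 needs two bytes per accented letter.
utf8_bytes = name.encode("utf-8")

print(cp1252_bytes.hex(" "))
print(utf8_bytes.hex(" "))

# A lone cp1252 byte such as 0xC5 ("Å") is not valid UTF-8 in this context.
try:
    cp1252_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Decoding UTF-8 bytes as if they were cp1252 yields "mojibake": different
# characters that still re-encode as perfectly valid UTF-8.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# This particular mix-up is reversible by undoing the wrong decode step.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == name)
```

Note that `garbled.encode("utf-8")` is valid UTF-8 even though the characters are wrong, which is the kind of symptom described above.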

> Still, what do I do now?

Two options:

  * If you don't care about history, just rename all the offending files
    in the repository to their proper names.
  * If you *do* care about history, repeat the conversion, using the
    correct locale settings, then use svnsync to bring the correctly
    converted repositories up to date.
      o Alternatively, edit the original dump files and fix the file
        names there (they have to be encoded in UTF-8) to avoid having
        to repeat the conversion from CVS.
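For the dump-file alternative, the following Python sketch rewrites the paths in `Node-path` and `Node-copyfrom-path` headers while copying everything else byte for byte. It assumes the usual dump layout: a block of `Name: value` header lines ended by a blank line, followed by `Content-length` bytes of properties and text. Header lines are not counted in those lengths, so changing a path does not require recomputing them. `rewrite_dump_paths`, `rewrite_dump_bytes` and `fix_path` are hypothetical names, and you must supply the actual broken-to-correct name mapping yourself. Paths stored inside property values such as `svn:mergeinfo` are not rewritten.

```python
import io
from typing import BinaryIO, Callable

# Header fields whose values are repository paths (hypothetical selection;
# paths inside property values, e.g. svn:mergeinfo, are NOT handled).
PATH_HEADERS = (b"Node-path", b"Node-copyfrom-path")


def rewrite_dump_paths(src: BinaryIO, dst: BinaryIO,
                       fix_path: Callable[[str], str]) -> None:
    """Copy a dump stream, passing every path header value through fix_path.

    Content after each header block is copied verbatim, so only header
    values change and the existing length fields stay valid.
    """
    while True:
        line = src.readline()
        if not line:
            break  # end of stream
        if line == b"\n":
            dst.write(line)  # blank separator lines between records
            continue
        content_length = 0
        # Read the header block up to its terminating blank line.
        while line and line != b"\n":
            name, sep, value = line.rstrip(b"\n").partition(b": ")
            if sep and name in PATH_HEADERS:
                # Assumes paths are valid UTF-8, as the reported ones are.
                new_path = fix_path(value.decode("utf-8"))
                line = name + b": " + new_path.encode("utf-8") + b"\n"
            elif name == b"Content-length":
                content_length = int(value)
            dst.write(line)
            line = src.readline()
        dst.write(line)  # the blank line that ends the headers
        # Copy the property/text content without interpreting it.
        dst.write(src.read(content_length))


def rewrite_dump_bytes(data: bytes, fix_path: Callable[[str], str]) -> bytes:
    """Convenience wrapper for small, in-memory dump streams."""
    out = io.BytesIO()
    rewrite_dump_paths(io.BytesIO(data), out, fix_path)
    return out.getvalue()
```

For example, if the names turned out to be UTF-8 misread as cp1252, `fix_path` could be `lambda p: p.encode("cp1252").decode("utf-8")`. You would run the filter over each dump file (opened in binary mode) before loading it with `svnadmin load`, and test the result on a scratch repository first.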

The second option is going to be extremely tricky.

-- Brane
Received on 2018-01-18 18:17:33 CET

This is an archived mail posted to the Subversion Users mailing list.