
svnadmin dump - Erroneous UTF-8 encoding with binary files

From: Frédéric Hébert <fg.hebert_at_gmail.com>
Date: Sun, 10 May 2009 21:21:41 +0200

Hello,

 Through a double SSH connection (home PC => firewall => dev server),
 I dumped an entire repository on my dev server with "svnadmin
 dump repos > file".

 Back on my home computer, I was surprised to see that my dump
 file contained badly encoded UTF-8 characters like the following
(see the svn:log property):

        Revision-number: 1
        Prop-content-length: 146
        Content-length: 146

        K 7
        svn:log
        V 46
        Création de l'arborescence de base du dépôt
        K 10
        svn:author
        V 5
        fredo
        K 8
        svn:date
        V 27
        2009-04-07T19:59:37.972139Z
        PROPS-END

 
These bad characters appeared both in svn:log properties and in file contents.
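
As a sanity check on the record above, the "V 46" length can be compared with the byte length of the log message. The sketch below is only an illustration under two assumptions: that the intended message is "Création de l'arborescence de base du dépôt" (reconstructed from the garbled text, not read from the repository), and that the corruption is a Latin-1 misreading followed by a second UTF-8 encoding.

```python
# Assumed intended log message (reconstructed; not read from the repository).
message = "Création de l'arborescence de base du dépôt"

# Bytes when the message is encoded once as UTF-8.
clean = message.encode("utf-8")

# Bytes after the assumed corruption: the UTF-8 bytes are misread as
# Latin-1 and then encoded to UTF-8 a second time.
garbled = clean.decode("latin-1").encode("utf-8")

print(len(clean))    # compare with the "V 46" header
print(len(garbled))
```

If both assumptions hold, the length in the header matches the clean encoding rather than the garbled one, which would suggest the length was computed before the text was re-encoded.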

All three computers have UTF-8 locales, and the ssh clients and servers have
SendEnv and AcceptEnv set to LC_* and LANGUAGE.

Back on the dev server I ran some tests, and it seems to me that
the encoding errors are due to the presence of binary files in the dump, e.g.
files with the svn:mime-type property set to application/octet-stream.
For example, my Django project contains plain text in
'trunk/templates' and images in 'trunk/media/images':

    dev-server~:$ svnadmin dump -r 33 /var/svn/enseignements-dev.ehess.fr/ |\
                     svndumpfilter include 'trunk/templates' > \
                     /tmp/svn_enseignements_r33_nobinary.dump

Its output through xxd looks like this ("Année" on the third line
contains the c3a9 sequence, which is the UTF-8 encoding of the French "é"):

    001b450: 6872 6566 3d22 2f7b 7b20 616e 6e65 6575  href="/{{ anneeu
    001b460: 6e69 762e 616e 6e65 6520 7d7d 2f22 2074  niv.annee }}/" t
    001b470: 6974 6c65 3d22 416e 6ec3 a965 2075 6e69  itle="Ann..e uni
    001b480: 7665 7273 6974 6169 7265 207b 7b20 616e  versitaire {{ an
    001b490: 6e65 6575 6e69 7620 7d7d 223e 7b7b 2061  neeuniv }}">{{ a
    001b4a0: 6e6e 6565 756e 6976 207d 7d3c 2f61 3e3c  nneeuniv }}</a><
    001b4b0: 2f6c 693e 0a20 2020 2020 203c 6c69 3e3c  /li>.      <li><

    dev-server~:$ svnadmin dump -r 33 /var/svn/enseignements-dev.ehess.fr/ | 2>&1 \
                     svndumpfilter include 'trunk/templates' include 'trunk/media/images' > \
                     /tmp/svn_enseignements_r33.dump

On the third line, the 'é' letter is now made of four bytes (the two
characters Ã©):

    0000000: 6872 6566 3d22 2f7b 7b20 616e 6e65 6575  href="/{{ anneeu
    0000010: 6e69 762e 616e 6e65 6520 7d7d 2f22 2074  niv.annee }}/" t
    0000020: 6974 6c65 3d22 416e 6ec3 83c2 a965 2075  itle="Ann....e u
    0000030: 6e69 7665 7273 6974 6169 7265 207b 7b20  niversitaire {{ 
    0000040: 616e 6e65 6575 6e69 7620 7d7d 223e 7b7b  anneeuniv }}">{{
    0000050: 2061 6e6e 6565 756e 6976 207d 7d3c 2f61   anneeuniv }}</a
    0000060: 3e3c 2f6c 693e 0a20 2020 2020 203c 6c69  ></li>.      <li
    0000070: 3e43 6f6d 7074 6520 7265 6e64 753c 2f6c  >Compte rendu</l
    0000080: 0a
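
The four-byte sequence c3 83 c2 a9 is what results when the two UTF-8 bytes of "é" are each reinterpreted as a Latin-1 character and then encoded to UTF-8 again. A minimal Python sketch of that round trip, and of reversing it, follows. The helper names are hypothetical, and the sketch only reproduces the byte pattern; it says nothing about where in the pipeline the re-encoding happens.

```python
def double_encode(text: str) -> bytes:
    """Encode to UTF-8, misread the bytes as Latin-1, encode to UTF-8 again."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def undo_double_encode(data: bytes) -> bytes:
    """Reverse one level of the misreading, giving back the original UTF-8 bytes."""
    return data.decode("utf-8").encode("latin-1")


once = "é".encode("utf-8")
twice = double_encode("é")

print(once.hex())                       # c3a9
print(twice.hex())                      # c383c2a9
print(undo_double_encode(twice).hex())  # c3a9
```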

Does anyone have an idea about this?
The only solution I could come up with is to delete the binary files from the
repos...
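
Before removing anything from the repository, it may be worth checking which dump files actually contain the doubled sequences. The sketch below counts byte patterns of the form c3 82 or c3 83, followed by c2 and one more byte, which is what double-encoded characters such as "é" look like under the Latin-1 assumption. The function name is hypothetical, and binary file content can match the pattern by coincidence, so a hit is only a hint.

```python
import re

# A two-byte UTF-8 character whose lead byte is c2 or c3, after being
# misread as Latin-1 and re-encoded: c3 82/83 followed by c2 80..bf.
DOUBLED = re.compile(rb"\xc3[\x82\x83]\xc2[\x80-\xbf]")


def count_doubled(data: bytes) -> int:
    """Count suspected double-encoded characters in a byte string."""
    return len(DOUBLED.findall(data))


clean = 'title="Année universitaire"'.encode("utf-8")
garbled = clean.decode("latin-1").encode("utf-8")

print(count_doubled(clean), count_doubled(garbled))
```

For a dump file, the same function could be applied to the bytes read with open(path, "rb").read().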
 
Many thanks in advance and forgive me if I am totally wrong.

Frédéric

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2175222

Received on 2009-05-10 21:47:10 CEST

This is an archived mail posted to the Subversion Users mailing list.