Hello,
Through a double SSH connection (home PC => firewall => dev server),
I dumped an entire repository on my dev server with "svnadmin
dump repos > file".
Back on my home computer, I was surprised to see that my dump
file contained badly encoded UTF-8 characters, like the following
(see the svn:log property):
Revision-number: 1
Prop-content-length: 146
Content-length: 146
K 7
svn:log
V 46
Création de l'arborescence de base du dépôt
K 10
svn:author
V 5
fredo
K 8
svn:date
V 27
2009-04-07T19:59:37.972139Z
PROPS-END
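The numbers after K and V in this block are byte lengths. That makes them a cheap consistency check. Suppose the file was re-encoded after svnadmin wrote it, for instance somewhere along the SSH/terminal chain. In that case the declared "V 46" would no longer match the bytes actually on disk. Here is a minimal sketch. It assumes the property block has already been extracted as bytes starting at its first K line, and it ignores the D (property deletion) records that can appear in delta dumps. The function names read_field and parse_props are hypothetical.

```python
def read_field(block: bytes, pos: int, tag: bytes) -> tuple[bytes, int]:
    """Read one 'K <n>' or 'V <n>' header line plus its n-byte payload."""
    end = block.index(b"\n", pos)
    kind, length = block[pos:end].split(b" ")
    if kind != tag:
        raise ValueError(f"expected {tag!r}, got {kind!r}")
    start = end + 1
    stop = start + int(length)
    # The payload must be followed by a newline; if the bytes were
    # re-encoded after svnadmin wrote them, the declared length is off.
    if block[stop:stop + 1] != b"\n":
        raise ValueError(f"length mismatch for {tag.decode()} {int(length)}")
    return block[start:stop], stop + 1


def parse_props(block: bytes) -> dict[bytes, bytes]:
    """Parse K/V records until the PROPS-END terminator."""
    props = {}
    pos = 0
    while not block.startswith(b"PROPS-END", pos):
        key, pos = read_field(block, pos, b"K")
        value, pos = read_field(block, pos, b"V")
        props[key] = value
    return props
```

If parse_props accepts the block, the bytes still have the lengths svnadmin declared. In that case any bad characters were already in the repository or were written by svnadmin itself. A ValueError instead points to a re-encoding that happened later.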
These bad characters appeared either in svn:log properties or in file contents.
All three computers have UTF-8 locales, and the ssh clients and servers have
SendEnv and AcceptEnv set to LC_* and LANGUAGE.
Back on the dev server I ran some tests, and it seems to me that the
encoding errors are due to the presence of binary files in the dump, e.g.
files with the svn:mime-type property set to application/octet-stream.
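One way to test this hypothesis is to compare dumps made with and without the binary paths. Each dump can be scanned for the byte pattern that a double-encoded accented letter leaves behind. The sketch below assumes the corruption is "UTF-8 bytes misread as Latin-1, then re-encoded as UTF-8". The name find_double_encoded is hypothetical. It only recognises 2-byte characters such as é or ô, and genuine text containing sequences like "Ã©" would be a false positive.

```python
import re

# A 2-byte UTF-8 character (lead byte 0xC2-0xDF, continuation 0x80-0xBF)
# that is read as Latin-1 and re-encoded as UTF-8 turns into:
#   lead 0xC2-0xDF -> C3 82 .. C3 9F
#   cont 0x80-0xBF -> C2 80 .. C2 BF
DOUBLE_ENCODED = re.compile(rb"\xc3[\x82-\x9f]\xc2[\x80-\xbf]")


def find_double_encoded(data: bytes) -> list[tuple[int, str]]:
    """Return (byte offset, repaired character) for each suspicious sequence."""
    hits = []
    for m in DOUBLE_ENCODED.finditer(data):
        # Undo one layer: decode UTF-8, map back to the original bytes
        # via Latin-1, then decode those bytes as the intended UTF-8.
        original = m.group().decode("utf-8").encode("latin-1").decode("utf-8")
        hits.append((m.start(), original))
    return hits
```

For example, run `find_double_encoded(open(path, "rb").read())` on each dump file. If only the dumps that include binary paths report hits, that would support the hypothesis.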
For example, my Django project contains only plain text in
'trunk/templates' and images in 'trunk/media/images':
dev-server~:$ svnadmin dump -r 33 /var/svn/enseignements-dev.ehess.fr/ |\
svndumpfilter include 'trunk/templates' > \
/tmp/svn_enseignements_r33_nobinary.dump
Its output through xxd looks like this ('Année' on the third line
contains the c3a9 sequence, which is the UTF-8 encoding of the French é):
001b450: 6872 6566 3d22 2f7b 7b20 616e 6e65 6575  href="/{{ anneeu
001b460: 6e69 762e 616e 6e65 6520 7d7d 2f22 2074  niv.annee }}/" t
001b470: 6974 6c65 3d22 416e 6ec3 a965 2075 6e69  itle="Ann..e uni
001b480: 7665 7273 6974 6169 7265 207b 7b20 616e  versitaire {{ an
001b490: 6e65 6575 6e69 7620 7d7d 223e 7b7b 2061  neeuniv }}">{{ a
001b4a0: 6e6e 6565 756e 6976 207d 7d3c 2f61 3e3c  nneeuniv }}</a><
001b4b0: 2f6c 693e 0a20 2020 2020 203c 6c69 3e3c  /li>.      <li><
Now the same revision, this time also including the images:
dev-server~:$ svnadmin dump -r 33 /var/svn/enseignements-dev.ehess.fr/ | 2>&1 \
svndumpfilter include 'trunk/templates' include 'trunk/media/images' > \
/tmp/svn_enseignements_r33.dump
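On the third line, the 'é' letter is now made of four bytes, c3 83 c2 a9,
which is the UTF-8 encoding of the two characters 'Ã' and '©' (each byte
of the original c3 a9 has been encoded a second time):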
0000000: 6872 6566 3d22 2f7b 7b20 616e 6e65 6575  href="/{{ anneeu
0000010: 6e69 762e 616e 6e65 6520 7d7d 2f22 2074  niv.annee }}/" t
0000020: 6974 6c65 3d22 416e 6ec3 83c2 a965 2075  itle="Ann....e u
0000030: 6e69 7665 7273 6974 6169 7265 207b 7b20  niversitaire {{ 
0000040: 616e 6e65 6575 6e69 7620 7d7d 223e 7b7b  anneeuniv }}">{{
0000050: 2061 6e6e 6565 756e 6976 207d 7d3c 2f61   anneeuniv }}</a
0000060: 3e3c 2f6c 693e 0a20 2020 2020 203c 6c69  ></li>.      <li
0000070: 3e43 6f6d 7074 6520 7265 6e64 753c 2f6c  >Compte rendu</l
0000080: 0a
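The damaged bytes can be reproduced from the clean dump. Take the clean 001b470 row, read its bytes as Latin-1 (an assumption; the double_encode helper is hypothetical), and encode the result as UTF-8 again. This yields exactly the 0000020 row of the second dump, followed by the 6e69 that starts row 0000030:

```python
def double_encode(data: bytes) -> bytes:
    """Read UTF-8 bytes as if they were Latin-1, then encode to UTF-8 again."""
    return data.decode("latin-1").encode("utf-8")


# Row 001b470 of the clean dump (trunk/templates only).
clean_row = bytes.fromhex("6974 6c65 3d22 416e 6ec3 a965 2075 6e69")
damaged = double_encode(clean_row)

# Print in the same 2-byte grouping that xxd uses.
print(" ".join(damaged[i:i + 2].hex() for i in range(0, len(damaged), 2)))
# prints: 6974 6c65 3d22 416e 6ec3 83c2 a965 2075 6e69
```

ASCII bytes (below 0x80) pass through this transformation unchanged. That is why the rest of the template is byte-identical in both dumps and only the accented letters grow from two bytes to four.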
Does anyone have an idea about this?
The only solution I could come up with is to delete the binary files from
the repository...
Many thanks in advance, and forgive me if I am totally wrong.
Frédéric
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2175222
Received on 2009-05-10 21:47:10 CEST