
Re: svnsync UTF8 problem

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Tue, 11 Oct 2011 00:18:12 +0200

Srdan Dukic wrote on Tue, Oct 11, 2011 at 10:15:53 +1300:
> I'm not going to open a bug, I just wanted to search to see whether there
> were any existing issues and this is what I found:
>
> I went to the Subversion bug tracker and searched for the phrase
> "svn_utf__is_valid" which turned up three results, none of which seem to be
> related:
> http://subversion.tigris.org/issues/buglist.cgi?long_desc_type=fulltext&long_desc=svn_utf__is_valid
>
> I also did a search for the "svn_repos_validate_prop" function, but this
> didn't return any results:
> http://subversion.tigris.org/issues/buglist.cgi?long_desc_type=fulltext&long_desc=svn_repos_validate_prop
>

The function is called svn_repos__validate_prop() (with a double
underscore) in 1.7 and had a different name (and was file-private)
in 1.6.

> Am I searching correctly? Searching for the error message, returns a large
> number of errors, most of which seem irrelevant:
> http://subversion.tigris.org/issues/buglist.cgi?long_desc_type=fulltext&long_desc=Cannot+accept+%27svn%3Alog%27+property+because+it+is+not+encoded+in+UTF-8+
>

Search the users@ list archives. _Many_ people asked here about their
non-UTF-8 svn:log properties (and one also asked about his svn:author
properties).

> The one issue that I saw that might be related is
> http://subversion.tigris.org/issues/show_bug.cgi?id=3817. However, I think
> that we've established that the source property is in ASCII, which should be
> compatible with UTF-8.
>

Yes. The xxd and proplist dumps confirm that the properties are valid
UTF-8 with LF line endings.

Now, what I'm asking is that you don't file a bug saying "Non-UTF-8
revprops are rejected". That's fine, we know about this, we decided
it's intentional. I do not object (in fact, I'll probably support)
filing bugs for situations where that error is raised even though all
the svn:* properties involved are in UTF-8 and LF linefeeds.
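
For anyone who wants to double-check raw property bytes independently of
Subversion, here is a minimal sketch in C of the two conditions discussed
above. It is an illustration only, not Subversion's svn_utf__is_valid(): the
function names sketch_is_valid_utf8() and sketch_has_lf_only() are made up for
this example, and "LF linefeeds" is assumed here to mean "contains no
carriage-return byte".

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if data[0..len) is well-formed UTF-8, else 0.
   Hypothetical helper for illustration; not part of Subversion. */
int
sketch_is_valid_utf8(const unsigned char *data, size_t len)
{
  size_t i = 0;

  assert(data != NULL || len == 0);
  while (i < len)
    {
      unsigned char c = data[i];
      size_t extra;       /* continuation bytes expected after the lead byte */
      unsigned long cp;   /* code point being decoded */
      unsigned long min;  /* smallest code point legal for this length */
      size_t k;

      if (c < 0x80)
        { extra = 0; cp = c; min = 0; }
      else if ((c & 0xE0) == 0xC0)
        { extra = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { extra = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { extra = 3; cp = c & 0x07; min = 0x10000; }
      else
        return 0;  /* stray continuation byte or invalid lead byte */

      if (extra > len - i - 1)
        return 0;  /* sequence is cut off by the end of the buffer */

      for (k = 1; k <= extra; k++)
        {
          if ((data[i + k] & 0xC0) != 0x80)
            return 0;  /* not a continuation byte */
          cp = (cp << 6) | (data[i + k] & 0x3F);
        }

      if (cp < min)
        return 0;  /* overlong encoding */
      if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;  /* out of range, or a UTF-16 surrogate */

      i += extra + 1;
    }
  return 1;
}

/* Return 1 if the buffer contains no CR byte, else 0.
   Assumption: "LF line endings" means "no '\r' anywhere". */
int
sketch_has_lf_only(const unsigned char *data, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    if (data[i] == '\r')
      return 0;
  return 1;
}
```

Pure ASCII values such as the svn:author, svn:date and svn:log dumps quoted
further down pass both checks, since every byte is below 0x80 and none is a
carriage return.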

> Thank you again for your help, it's great to have someone who's working on
> the Subversion code looking at the problem.
>
> Thank you
> --
> Srdan Dukic
>

Happy to help, and hope I've clarified my position above. More below...

> On 11 October 2011 09:38, Srdan Dukic <srdan.dukic_at_gmail.com> wrote:
>
> > I had no intention of filing a bug until I had done as much debugging as
> > possible, not being a developer myself. Thank you for your advice about
> > where to look for the error in the source code. I can see that the error is
> > thrown in the 'svn_repos_validate_prop' function in the file you mentioned.
> > Specifically, the line is 182:
> >
> >       /* Validate "svn:" properties. */
> > 176   if (svn_prop_is_svn_prop(name) && value != NULL)
> > 177     {
> > 178       /* Validate that translated props (e.g., svn:log) are UTF-8 with
> > 179        * LF line endings. */
> > 180       if (svn_prop_needs_translation(name))
> > 181         {
> > 182           if (svn_utf__is_valid(value->data, value->len) == FALSE)
> > 183             {
> > 184               return svn_error_createf
> > 185                 (SVN_ERR_BAD_PROPERTY_VALUE, NULL,
> > 186                  _("Cannot accept '%s' property because it is not encoded in "
> > 187                    "UTF-8"), name);
> > 188             }
> >
> > So, it seems that the "svn_utf_is_valid" function is the one that is
> > rejecting this value. Would you happen to know where this function is
> > defined?
> >

libsvn_subr/utf.c. For future reference, ctags can answer this question
more efficiently than the mailing list.

That said: don't condense double underscores into a single one; that's
wrong in C and in our case also violates our naming convention.
(A double underscore indicates a non-public API symbol.)

> >
> > Thank you
> > --
> > Srdan Dukic
> >

Looking forward, the error message in your Apache log appears only in
one place in the source (that place is svn_repos__validate_prop() which
I already mentioned), so I'd suggest looking for the property or
revision property that triggers throwing the error there.

(Usually subversion/po/*.po can tell you where a given error message
appears in the source.)

HTH,

Daniel

> > On 11 October 2011 09:08, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
> >
> >> Don't and don't.
> >>
> >> If you want to see what values cross the network, go to the validation
> >> function in libsvn_repos/fs-wrap.c that generates the error message you
> >> get.
> >>
> >> And I _am_ a developer, and I already asked you not to file a bug.
> >> Please don't until you have identified a problem we don't already know
> >> about. Thanks.
> >>
> >> Srdan Dukic wrote on Tue, Oct 11, 2011 at 08:59:38 +1300:
> >> > Thank you for your help. I'll try and turn on debugging on the dav_svn
> >> > module to see what actual values are being passed across the network and
> >> > if that doesn't turn up anything, I guess I'll just ask on the dev mailing
> >> > list or open a bug.
> >> >
> >> > Thanks again
> >> > --
> >> > Srdan Dukic
> >> >
> >> > On 11 October 2011 08:40, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
> >> >
> >> > > Your revprops values are all ASCII and LF linefeeds, so r6107 should
> >> > > get committed to the mirror without issue. The --source-prop-encoding
> >> > > is new in 1.7.
> >> > >
> >> > > So, yes, if you're still seeing the error *while syncing r6107* (i.e.,
> >> > > mirror HEAD is r6106), I'm not really sure what's going on.
> >> > >
> >> > > Srdan Dukic wrote on Tue, Oct 11, 2011 at 08:29:04 +1300:
> >> > > > > Odd. Perhaps some other revision property of that revision contains
> >> > > > > non-UTF-8?
> >> > > > >
> >> > > >
> >> > > > The other revision properties are:
> >> > > >
> >> > > > # svn proplist --revprop -r 6107 http://subversion/project/Flow
> >> > > > Unversioned properties on revision 6107:
> >> > > > svn:log
> >> > > > svn:author
> >> > > > svn:date
> >> > > >
> >> > > >
> >> > > > > > > The actual value of the svn:log property is:
> >> > > > > > >
> >> > > > > > > "When printing a form through the full task list the client's TEF
> >> > > > > > > number has <B> and </B> beside it (for cds)."
> >> > > > > > >
> >> > > > > > > Which doesn't have any characters that should need any UTF8
> >> > > > > > > handling.
> >> > > > > > >
> >> > > > >
> >> > > > > svn propget --revprop -rN --strict svn:log | xxd
> >> > > > >
> >> > > >
> >> > > > When I look at the contents of the properties, they all seem to be in
> >> > > > regular ASCII:
> >> > > >
> >> > > > # svn propget --revprop -r 6107 --strict svn:author http://subversion/project/Flow | xxd
> >> > > > 0000000: 5869 6e67 Xing
> >> > > >
> >> > > > # svn propget --revprop -r 6107 --strict svn:date http://subversion/project/Flow | xxd
> >> > > > 0000000: 3230 3034 2d30 342d 3238 5430 313a 3331 2004-04-28T01:31
> >> > > > 0000010: 3a34 302e 3030 3030 3030 5a :40.000000Z
> >> > > >
> >> > > > # svn propget --revprop -r 6107 --strict svn:log http://subversion/project/Flow | xxd
> >> > > > 0000000: 5768 656e 2070 7269 6e74 696e 6720 6120 When printing a
> >> > > > 0000010: 666f 726d 2074 6872 6f75 6768 2074 6865 form through the
> >> > > > 0000020: 2066 756c 6c20 7461 736b 206c 6973 7420 full task list
> >> > > > 0000030: 7468 6520 636c 6965 6e74 2773 204e 4849 the client's NHI
> >> > > > 0000040: 206e 756d 6265 7220 6861 7320 3c42 3e20 number has <B>
> >> > > > 0000050: 616e 6420 3c2f 423e 2062 6573 6964 6520 and </B> beside
> >> > > > 0000060: 6974 2028 666f 7220 6364 7329 2e it (for cds).
> >> > > >
> >> > > > > > Has anyone else had this problem? If so, how did you solve it?
> >> > > > > > >
> >> > > > >
> >> > > > > svnsync sync --source-prop-encoding
> >> > > > >
> >> > > > >
> >> > > > Is this a valid option to pass to svnsync? When I attempt to run svnsync
> >> > > > with this option, I get the following error:
> >> > > >
> >> > > > svnsync: invalid option: --source-prop-encoding
> >> > > >
> >> > > > The only other thing I can think of is that the original commit may have
> >> > > > been done on a windows machine where the locale was set to some non-english
> >> > > > value, but even then, the properties seem to all be in plain ASCII
> >> > > >
> >> > > > Thank you
> >> > > > --
> >> > > > Srdan Dukic
> >> > >
> >>
> >
Received on 2011-10-11 00:19:05 CEST

This is an archived mail posted to the Subversion Users mailing list.
