Bhuvaneswaran A wrote:
> On Fri, 2009-12-04 at 10:22 +0000, Julian Foad wrote:
> > Branko Čibej wrote:
> > > Bhuvaneswaran A wrote:
> > > > The failure message for few tests contain special characters, ex:
> >
> > What do you mean by "special" characters? Unprintable characters?
> > Non-UTF8 characters? Invalid XML characters? Characters that are XML
> > syntax characters such as "<"?
>
> When I mean special characters, I mean control characters, ex "^H".
> Refer to the attachment in issue 3541 for a sample character.
>
> > > > prop_tests.py. As a result, it creates an invalid xml file and not being
> > > > displayed in Hudson.
> > > >
> > > > This commit fixes this issue, also tracked in issue 3541. With this fix,
> > > > the test results are displayed in Hudson, especially the results
> > > > specific to 1.6.x solaris build.
> > > > http://subversion.tigris.org/issues/show_bug.cgi?id=3541
> > > >
> > > > Index: tools/dev/gen_junit_report.py
> > > > =======================================
> > > > --- tools/dev/gen_junit_report.py (revision 886204)
> > > > +++ tools/dev/gen_junit_report.py (working copy)
> > > > @@ -46,6 +46,16 @@
> > > > data = data.replace(char, encode[char])
> > > > return data
> > > > +def remove_special_characters(data):
> > > > + """remove special characters in test failure reasons"""
> > > > + if not data:
> > > > + return data
> > > > + chars_table = "".join([chr(n) for n in xrange(256)])
> > > > + # remove all special characters upto ascii value 31, except line
> > > > feed (10)
> > > > + # and carriage return (13)
> > > > + chars_to_remove = chars_table[0:9] + chars_table[11:12] +
> > > > chars_table[14:31]
> >
> > Isn't the indexing off by one? Should be [0:10] ... [11:13] ... [14:32].
>
> As per the comment, I wanted to preserve LF (10) and CR (13). [0:9] ...
> [11:12] ... [14:31] works for me.
In array slicing, the range start is inclusive but the end is exclusive,
so [11:12] just gives the single character [11], whereas [11:13] would
give [11] and [12] which is what you want.
> > > Also, wouldn't it be more proper to find out why the tests put control
> > > characters in the failure description than to just blindly throw them away?
> >
> > Or just escape the "special" characters.
>
> Good point. I used to encode using utf-8, but it doesn't seem to
> detect/encode these characters, resulting in unchanged behaviour. I used
> something like:
>
> reason = u'%s'.encode('utf-8') % reason
> reason = unicode(reason, 'utf-8')
Those statements are just converting from UTF-8 to internal Unicode
representation, not doing any escaping. Control characters (and all
characters) are valid in both UTF-8 and internal Unicode
representations, so no escaping is needed for the conversion.
I searched on the web and didn't find a really really simple way to
escape a set of characters. I think something like
for c in chars_to_remove:
data = data.replace(c, '%%%0x' % ord(c))
would do it.
- Julian
Received on 2009-12-04 12:20:09 CET