[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Handle special characters in Junit report

From: Bhuvaneswaran A <bhuvan_at_collab.net>
Date: Fri, 04 Dec 2009 17:49:34 +0530

Please find attached the revised patch. I incorporated following
feedback:
  a) Fix the array slicing part
  b) Escape using ord() instead of removing those characters
  c) Handle "]]>" in CDATA section
  d) Define the ascii table globally (once) and re-use

I also verified this fix by generating the junit files for tests having
special characters and simulating a test that has "]]>" in failure text.
With this patch, it generates valid junit file.

[[
Fix for issue 3541. When generating junit reports, escape special
characters, if any, from test failure messages.

* tools/dev/gen_junit_report.py
  (escape_special_characters): New method to escape special characters.
  (junit_testcase_fail, junit_testcase_xfail): Use
  escape_special_characters() method to escape special characters in
  test failure messages.
]]
On Fri, 2009-12-04 at 11:19 +0000, Julian Foad wrote:
> Bhuvaneswaran A wrote:
> > On Fri, 2009-12-04 at 10:22 +0000, Julian Foad wrote:
> > > Branko Čibej wrote:
> > > > Bhuvaneswaran A wrote:
> > > > > The failure message for few tests contain special characters, ex:
> > >
> > > What do you mean by "special" characters? Unprintable characters?
> > > Non-UTF8 characters? Invalid XML characters? Characters that are XML
> > > syntax characters such as "<"?
> >
> > When I mean special characters, I mean control characters, ex "^H".
> > Refer to the attachment in issue 3541 for a sample character.
> >
> > > > > prop_tests.py. As a result, it creates an invalid xml file and not being
> > > > > displayed in Hudson.
> > > > >
> > > > > This commit fixes this issue, also tracked in issue 3541. With this fix,
> > > > > the test results are displayed in Hudson, especially the results
> > > > > specific to 1.6.x solaris build.
> > > > > http://subversion.tigris.org/issues/show_bug.cgi?id=3541
> > > > >
> > > > > Index: tools/dev/gen_junit_report.py
> > > > > =======================================
> > > > > --- tools/dev/gen_junit_report.py (revision 886204)
> > > > > +++ tools/dev/gen_junit_report.py (working copy)
> > > > > @@ -46,6 +46,16 @@
> > > > > data = data.replace(char, encode[char])
> > > > > return data
> > > > > +def remove_special_characters(data):
> > > > > + """remove special characters in test failure reasons"""
> > > > > + if not data:
> > > > > + return data
> > > > > + chars_table = "".join([chr(n) for n in xrange(256)])
> > > > > + # remove all special characters upto ascii value 31, except line
> > > > > feed (10)
> > > > > + # and carriage return (13)
> > > > > + chars_to_remove = chars_table[0:9] + chars_table[11:12] +
> > > > > chars_table[14:31]
> > >
> > > Isn't the indexing off by one? Should be [0:10] ... [11:13] ... [14:32].
> >
> > As per the comment, I wanted to preserve LF (10) and CR (13). [0:9] ...
> > [11:12] ... [14:31] works for me.
>
> In array slicing, the range start is inclusive but the end is exclusive,
> so [11:12] just gives the single character [11], whereas [11:13] would
> give [11] and [12] which is what you want.
>
> > > > Also, wouldn't it be more proper to find out why the tests put control
> > > > characters in the failure description than to just blindly throw them away?
> > >
> > > Or just escape the "special" characters.
> >
> > Good point. I used to encode using utf-8, but it doesn't seem to
> > detect/encode these characters, resulting in unchanged behaviour. I used
> > something like:
> >
> > reason = u'%s'.encode('utf-8') % reason
> > reason = unicode(reason, 'utf-8')
>
> Those statements are just converting from UTF-8 to internal Unicode
> representation, not doing any escaping. Control characters (and all
> characters) are valid in both UTF-8 and internal Unicode
> representations, so no escaping is needed for the conversion.
>
> I searched on the web and didn't find a really really simple way to
> escape a set of characters. I think something like
>
> for c in chars_to_remove:
> data = data.replace(c, '%%%0x' % ord(c))
>
> would do it.
>
> - Julian
>
>

-- 
Bhuvaneswaran A    
CollabNet Software P Ltd.  |  www.collab.net

Received on 2009-12-04 13:23:06 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.