
Re: About character encoding of the text files

From: C. Michael Pilato <cmpilato_at_collab.net>
Date: Thu, 26 Aug 2010 10:12:12 -0400

On 08/26/2010 09:29 AM, Mark Phippard wrote:
> On Thu, Aug 26, 2010 at 9:27 AM, C. Michael Pilato <cmpilato_at_collab.net> wrote:
>> [And just to wrap this up from the ViewVC side of things]
>> As I saw this thread returning the old endorsement of tossing encoding
>> information into the svn:mime-type property, I went ahead and taught ViewVC
>> to look there for that information. So, Ivan, if you missed my commit to
>> ViewVC yesterday, the trunk and 1.1.x branch tip code will parse
>> svn:mime-type, extract the charset= bit, and pass its value off to Pygments
>> when doing syntax highlighting for the markup and annotate views.
> Will this fix the problem? Isn't there still the problem that the
> page advertises its encoding to the browser as UTF-8? Does ViewVC
> convert from the encoding in the mime-type to UTF-8 before sending the
> content to the browser? Or is that what Pygments is doing for you?

File contents and encoding come into play in the following places in ViewVC:

  - the checkout (or download) view
  - the markup/annotate view
  - the diff view

The checkout view is a direct repository dump of the file contents without
any ViewVC manipulation, and has since 1.1.0 been able to present to the
browser the svn:mime-type property as-is, encoding value and all. (Meaning,
it all works.) My recent changes should have no visible effect on the
result here, though the svn:mime-type property is now parsed and the
Content-type of the response reconstructed -- a less direct route for that data.
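
As an illustration of the parse-and-reconstruct step described above, here is a minimal sketch. It is not ViewVC's actual code: the function names are made up for this example, and it leans on the standard library's email header parser rather than whatever ViewVC does internally.

```python
from email.message import Message


def parse_svn_mime_type(prop_value):
    """Split an svn:mime-type value into (mime_type, charset or None).

    Hypothetical helper for illustration only; both parts come back
    lowercased because that is what the stdlib parser returns.
    """
    msg = Message()
    msg["Content-Type"] = prop_value
    return msg.get_content_type(), msg.get_content_charset()


def content_type_header(prop_value):
    """Rebuild a Content-Type header value from the parsed pieces."""
    mime_type, charset = parse_svn_mime_type(prop_value)
    if charset:
        return "%s; charset=%s" % (mime_type, charset)
    return mime_type


if __name__ == "__main__":
    print(content_type_header("text/plain; charset=ISO-8859-1"))
```

Any parameters other than charset are dropped by this sketch, which may or may not match what a real implementation should do.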

The markup/annotate view optionally employs Pygments, and since 1.1.2 has
been coded to use the 'chardet' optional Python library to guess at file
content encodings for the purpose of conversion to UTF-8. But guessing is
an inexact science, and it's possible that Pygments doesn't like it when you
provide it a mime-type value that has parameters (such as 'charset')
attached. Anyway, my changes now provide Pygments with the user-specified
(via svn:mime-type) encoding directly for that UTF-8 conversion. Of course,
if you aren't using Pygments, then today you still get nothing. (I'd like
to fix this by making the fallback code use 'chardet' directly or something.)

Finally, the diff view has always been at a loss for anything decent in this
space, and that remains the case today. I've been wanting to explore the
use of Pygments/chardet for this view, too, but I lack Round Tuits.

C. Michael Pilato <cmpilato_at_collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Received on 2010-08-26 16:13:08 CEST

This is an archived mail posted to the Subversion Dev mailing list.
