
Re: [RFC/PATCH] commit messages not 8-bit compatible

From: Marcus Comstedt <marcus_at_mc.pp.se>
Date: 2002-05-29 22:01:16 CEST

Branko Čibej <brane@xbc.nu> writes:

> >Hm. Any particular reason? Apart from breaking the "all strings
> > passed to libsvn_* shall be UTF-8"-paradigm,
> I never heard of such a paradigm. Yes, all _paths_ passed to the svn
> libraries should be UTF-8 and canonicalized, but not all strings. I'm
> sorry I didn't notice that before in your patches. AFAIK we never
> discussed canonicalizing anything but paths.

says "all arguments", not just paths.

If there are supposed to be different charset encodings for paths and
non-paths, then things will get very messy. For starters, there are
plenty of transfers inside the libs between paths and non-paths. Just
think of all the error messages including paths. Then we get to more
tricky problems like the svn:ignore property. Two cases:

A) svn:ignore is not stored at the server as UTF-8.

  Now the interpretation of svn:ignore is not fixed. But the
  interpretation of pathnames is fixed. Thus different files will be
  ignored for different users. Bad.

B) svn:ignore is stored at the server as UTF-8, but the string passed
   to svn_client_propset is not UTF-8 (since it's a property value,
   not a path). Then somewhere there must be a recoding heuristic
   based on the property name. Ugly.

Having all strings share the same encoding makes things much cleaner
and simpler.
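
Case A can be illustrated with a minimal sketch. It is not Subversion
code: the helper ignored() is hypothetical, Python's fnmatch stands in
for whatever pattern matching the client really does, and the two
client charsets (Latin-1 and ISO-8859-7) are picked only as examples.
The pattern is stored as raw bytes, while the path names have a fixed
UTF-8 interpretation.

```python
import fnmatch

# Case A: the svn:ignore value is stored as the raw bytes the committer
# typed, here by a Latin-1 user who wrote "café*".
stored_pattern = "café*".encode("latin-1")  # b'caf\xe9*'

# Path names have one fixed interpretation (UTF-8) for everybody.
paths = [b"caf\xc3\xa9.log".decode("utf-8"), "notes.txt"]


def ignored(pattern_bytes, client_charset, paths):
    """Hypothetical helper: decode the stored pattern in the client's
    own charset and return the paths it would cause to be ignored."""
    pattern = pattern_bytes.decode(client_charset)
    return [p for p in paths if fnmatch.fnmatchcase(p, pattern)]


latin1_user = ignored(stored_pattern, "latin-1", paths)
greek_user = ignored(stored_pattern, "iso8859_7", paths)

print(latin1_user)  # the Latin-1 user ignores café.log
print(greek_user)   # the ISO-8859-7 user ignores nothing
```

The same stored bytes give a different ignore set per user, which is
the problem. Storing the value as UTF-8 and recoding at the client
boundary removes the dependence on the reader's charset.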

> >it would mean that two
> >persons, one using a Latin-1 charset and one using a UTF-8 charset,
> >wouldn't be able to properly read each other's log messages even if
> >they are restricting themselves to the common subset of characters.
> >
> >Since there are no properties on log messages, how do you propose that
> >the actual character encoding for a log message be recorded?
> >
> I'd say that problem should be solved by project policy, not by
> Subversion. Just like we don't require a particular repository layout.

Project policy doesn't affect the way my shell renders characters, so
it doesn't really solve the problem at all. And the idea here is not
to enforce more, but less. Since the strings are recoded, the end
user can use any character encoding he likes, rather than being stuck
with a "project policy" dictating it for him.
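
As a rough sketch of what recoding buys (again not Subversion code;
to_repository() and from_repository() are hypothetical names, and the
charsets are just examples): each client converts from its own charset
to UTF-8 on the way in and back again on the way out.

```python
def to_repository(message_bytes, client_charset):
    """Hypothetical: recode a log message from the committer's charset
    to the UTF-8 form that is stored."""
    return message_bytes.decode(client_charset).encode("utf-8")


def from_repository(stored_bytes, client_charset):
    """Hypothetical: recode a stored UTF-8 log message into the
    reader's own charset for display."""
    return stored_bytes.decode("utf-8").encode(client_charset)


# A Latin-1 user commits a message containing a non-ASCII character.
typed = "Fixed na\u00efve parser".encode("latin-1")
stored = to_repository(typed, "latin-1")

# A UTF-8 user and a Latin-1 user each get bytes valid in their charset.
for_utf8_user = from_repository(stored, "utf-8")
for_latin1_user = from_repository(stored, "latin-1")

# Without recoding, the UTF-8 user would receive the raw Latin-1 bytes,
# which are not valid UTF-8.
try:
    typed.decode("utf-8")
    raw_bytes_readable = True
except UnicodeDecodeError:
    raw_bytes_readable = False
```

Neither user has to agree on a charset with the other; only the stored
form is fixed.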

  // Marcus

Received on Sat Jun 1 14:23:48 2002

This is an archived mail posted to the Subversion Dev mailing list.