#define MBE multi-byte encoding
#define SBE single-byte encoding
Stefan Sperling wrote on Tue, May 31, 2011 at 01:07:02 +0200:
> On Tue, May 31, 2011 at 01:41:54AM +0300, Daniel Shahaf wrote:
> > How would you handle a repository that contains the following
> > nodes/fspaths:
> >
> > /foo/bår (in UTF-8)
> > /foo/bår (in latin1)
> >
> > ?
> >
> >
> > How would you handle a repository that contains:
> > /foo/barÉ (in latin1)
> > /foo/barŠ (in latin2)
> >
> > ?
>
> All the ISO-8859 (latin) encodings are single-byte encodings.
> It's not possible to know what the encoding is supposed to be if
> paths in different ISO-8859 encodings entered the repository.
> They all decode to different but valid strings of characters.
>
> In the first iteration of this feature I would simply assume one
> user-specified source encoding and try to convert data that isn't
> UTF-8 from the source encoding to UTF-8.
> In case multiple single-byte encodings are present this means that some
> characters will be wrong but the repository will work again without
> manual intervention. In case multiple multi-byte encodings other than
> UTF-8 are present this approach can fail and might require manual fixing
> (no worse than the current situation).
> This could still be improved upon if necessary.
True, I had overlooked these points.
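To make the single-byte ambiguity concrete, here is a minimal Python sketch (not Subversion code; the path bytes are just an illustrative example). The same byte sequence is rejected by the UTF-8 decoder but decodes cleanly under both latin1 and latin2, to different characters:

```python
# Illustrative only: one raw path as it might be stored in a repository.
raw = b"/foo/bar\xa9"

# Not valid UTF-8: 0xA9 is a continuation byte with no lead byte before it.
try:
    raw.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False

# Both single-byte decodings succeed, with different results.
as_latin1 = raw.decode("iso8859-1")  # ends in U+00A9 COPYRIGHT SIGN
as_latin2 = raw.decode("iso8859-2")  # ends in U+0160 LATIN CAPITAL LETTER S WITH CARON

print(is_utf8, as_latin1, as_latin2)
```

Nothing in the bytes themselves says which of the two decodings was intended.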
One thing that jumps to mind is to have a list of encodings to
try --- i.e.,
svnadmin load --recode-paths-from=MBE1,MBE2,SBE
would attempt to interpret paths as UTF-8, failing that as MBE1, failing
that as MBE2, failing that as SBE.
(I know you use vim, so: compare the 'fencs' option in vim).
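A rough sketch of that fallback order in Python, just to illustrate the idea. The --recode-paths-from option above is only a proposal, and recode_path() is a hypothetical helper, not anything that exists in Subversion:

```python
def recode_path(raw, fallbacks):
    """Decode raw path bytes, trying UTF-8 first and then each fallback in order.

    Returns (decoded_path, encoding_used). Hypothetical helper for illustration.
    """
    for encoding in ["utf-8", *fallbacks]:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    raise ValueError("path is not valid in any of the given encodings")


# Valid UTF-8 is accepted as-is; the fallback list is never consulted.
print(recode_path(b"/foo/b\xc3\xa5r", ["iso8859-1"]))

# Invalid UTF-8 falls through to the single-byte encoding.
print(recode_path(b"/foo/b\xe5r", ["iso8859-1"]))
```

Note that an SBE such as ISO-8859-1 accepts every byte sequence, so it only makes sense as the last entry in the list; anything listed after it would never be tried. A successful decode also does not prove that the guess was right, only that the bytes were valid in that encoding.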
Received on 2011-05-31 01:17:37 CEST