
Re: [PATCH] Issue #2748: non-UTF-8 filenames in the repository

From: Jens Seidel <jensseidel_at_users.sourceforge.net>
Date: Wed, 17 Sep 2008 15:52:54 +0200

On Wed, Sep 17, 2008 at 04:46:38PM +0300, Daniel Shahaf wrote:
> Stefan Sperling wrote on Wed, 17 Sep 2008 at 15:17 +0200:
> > On Wed, Sep 17, 2008 at 03:42:21PM +0300, Daniel Shahaf wrote:
> > > Patch for issue #2748 ("clients can create non UTF-8 filenames in the
> > > repository"). It prevents non-UTF-8 paths from entering the repository,
> > > but it doesn't affect existing files in existing repositories, svnsync,
> > > or 'svnadmin load'.
> > >
> > > I'm running 'make check' now, and I'll commit later today or tomorrow if
> > > I don't hear anything.
> >
> > Drive-by question:
> >
> > I thought we always handle strings in UTF-8 internally.
> > So why not just make svn_path_check_valid check for UTF-8 by default?
> >
>
> My original reason was that the current contract actively promises that
> it *won't* check validity:
>
> * ASSUMPTION: @a path is a valid UTF-8 string. This function does
> * not check UTF-8 validity.

There is *no* way to determine the encoding of a path. As far as I know
most Linux filesystems, such as ext2/3, have no encoding field in their
inodes, nor do they treat paths as UTF-8.

If you create a filename in Latin-1 encoding containing 8-bit characters
(so it may be an invalid UTF-8 sequence), it's still possible that the
same path is later interpreted as UTF-8, because the locale setting
changed (e.g. by another user). This results in an error such as

svn: Valid UTF-8 data
(hex: 65 69 6e 66)
followed by invalid UTF-8 sequence
(hex: fc 67 65 6e)

depending on the current locale.

Jens

Received on 2008-09-17 16:05:55 CEST

This is an archived mail posted to the Subversion Dev mailing list.
