
URI-encoding on 1.7 repository?

From: Garret Wilson <garret_at_globalmentor.com>
Date: Fri, 20 Jan 2012 10:38:28 -0800

What is the canonical way to encode filenames in a Subversion 1.7
repository, both in the API and in the underlying FSFS storage?

Let's say I have the file "a b.txt", which consists of "a" and "b" with
a space in between. How should this be stored on the server? How should
the various APIs give it to me?

Let me explain further. If I commit a file on Windows 7 Professional 64-bit
on an NTFS partition using TortoiseSVN, and then turn around and
read that repository using SVNKit, SVNDirEntry.getRelativePath()
gives me "a b.txt". I don't know whether on the back end these files are
being stored as "a b.txt", or whether they are being stored in canonical URI
form (i.e. "a%20b.txt") and SVNKit is just being "helpful" by decoding them.
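For reference, the two forms in question differ only by percent-encoding. A minimal sketch using Python's standard `urllib.parse` (chosen purely for illustration; this is not how SVNKit or Subversion process paths internally) shows the mapping between the plain name and its URI form:

```python
from urllib.parse import quote, unquote

name = "a b.txt"
# Percent-encode a single path segment; safe="" means even "/" would be escaped.
encoded = quote(name, safe="")
print(encoded)           # a%20b.txt

# Decoding reverses the mapping back to the plain file name.
decoded = unquote(encoded)
print(decoded)           # a b.txt
```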

From my end, I'm actually starting with 100% canonically encoded URIs.
If Subversion is storing these things in decoded form on the
back end, does it compensate for characters not supported by the
underlying file system? When I take my URI and decode it just so I
can save the filename the way Subversion likes, how do I know which
characters to decode (those supported by the underlying file system, as
if I, the client, knew what that is!) and which characters to leave
encoded (those not supported by the underlying file system on the server)?
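To make the decoding step concrete, here is a minimal sketch, again in Python and assuming UTF-8 percent-encoding and a hypothetical `example.com` repository URL, that turns a canonically encoded URI into decoded path segments. It splits on "/" before decoding so that an escaped slash (`%2F`) stays inside its segment; the second example shows that a fully decoded name can contain characters (here "/") that no ordinary file system accepts in a single file name:

```python
from typing import List
from urllib.parse import urlsplit, unquote


def uri_to_segments(uri: str) -> List[str]:
    """Split a URI's path into percent-decoded segments (UTF-8 assumed)."""
    path = urlsplit(uri).path
    # Split first, then decode each segment, so "%2F" does not create a
    # spurious path boundary. Empty segments (leading/trailing "/") are dropped.
    return [unquote(seg, encoding="utf-8") for seg in path.split("/") if seg]


print(uri_to_segments("http://example.com/repo/trunk/a%20b.txt"))
# ['repo', 'trunk', 'a b.txt']
print(uri_to_segments("http://example.com/repo/caf%C3%A9%2Fmenu.txt"))
# ['repo', 'café/menu.txt']
```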

Maybe someone can set me straight here. I'm hoping that Subversion
stores everything as correctly UTF-8-encoded, escaped URIs, both in the
back end and in its APIs, and that the real culprit here is SVNKit for
being "helpful" and decoding the strings for me without asking. I
suppose the other option that would work almost as well is if everything
on the back end were stored in decoded form, but some tricks were pulled
so that /all/ characters are supported, regardless of the underlying
file system. The case I don't want to end up in is where I have to
encode some characters but not others based on some file system
implementation on the server that I don't know about.
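One way to see why escaping only *some* characters is risky: suppose a hypothetical scheme (not anything Subversion is known to do) escapes only the printable characters Windows disallows in file names, but leaves "%" itself alone. Then two different names can map to the same stored string, and the original can no longer be recovered. A short Python sketch of that collision:

```python
# Hypothetical selective-escaping scheme: escape only the printable
# characters Windows disallows in file names, leaving "%" untouched.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def escape_selectively(name: str) -> str:
    """Percent-escape only characters in WINDOWS_FORBIDDEN."""
    return "".join(f"%{ord(c):02X}" if c in WINDOWS_FORBIDDEN else c for c in name)


a = escape_selectively("a:b")    # ':' is escaped -> 'a%3Ab'
b = escape_selectively("a%3Ab")  # nothing to escape -> 'a%3Ab'
print(a == b)  # True: two distinct names collide, so the mapping is not reversible
```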

Thanks for shedding some light on this.

Garret
Received on 2012-01-20 19:39:18 CET

This is an archived mail posted to the Subversion Dev mailing list.