Explain character set restrictions for text and path names
(docs/book/book/ch03.xml): Expand sidebar to discuss character encoding,
and hence restrictions on legal characters, for text and path names.
Index: book/ch03.xml
===================================================================
--- book/ch03.xml (revision 13123)
+++ book/ch03.xml (working copy)
@@ -312,13 +312,55 @@
- Repository Layout
+ What's in a name?
+
+ Subversion tries hard not to limit the type of data you
+ can place under version control. The contents of files and
+ property values are stored and transmitted as binary data, and
+ the tells you how
+ to give Subversion a hint that "textual" operations
+ don't make sense for a particular file. There are a few places,
+ however, where Subversion places restrictions on information it
+ stores.
- If you're wondering what trunk is all
- about in the above URL, it's part of the way we recommend
- you lay out your Subversion repository which we'll talk a lot
- more about in .
+ Subversion handles text internally as UTF-8-encoded
+ Unicode. As a result, certain items that are
+ "inherently textual", such as property names, path
+ names, and log messages, can contain only legal UTF-8
+ characters. This also sets a minimum requirement for use of the
+ svn:mime-type property: if a file's contents
+ aren't compatible with UTF-8, you should mark it as a binary
+ file. Otherwise, Subversion will attempt to merge differences
+ as if the file were UTF-8 text, which is likely to leave garbage
+ in the file.
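The UTF-8 rule above can be checked mechanically. The following Python sketch is illustrative only and is not Subversion's own code. It assumes that decoding with Python's strict UTF-8 codec is a fair stand-in for "legal UTF-8", and the helper names `is_valid_utf8` and `looks_binary` are hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly with Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def looks_binary(contents: bytes) -> bool:
    """Hypothetical heuristic: contents that are not valid UTF-8 are
    candidates for marking as binary via svn:mime-type.

    Passing the check does not prove a file is text; it only means the
    bytes form well-formed UTF-8.
    """
    return not is_valid_utf8(contents)


if __name__ == "__main__":
    # The same file name, encoded two different ways.
    print(is_valid_utf8("résumé.txt".encode("utf-8")))
    print(is_valid_utf8("résumé.txt".encode("latin-1")))
```

For a file that fails such a check, the property can be set by hand, for example `svn propset svn:mime-type application/octet-stream photo.raw` (the file name is hypothetical).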
+ In addition, path names are used as XML attribute values
+ in WebDAV exchanges, as well as in some of Subversion's
+ housekeeping files. This means that path names can contain only
+ legal XML (1.0) characters. Subversion also prohibits
+ TAB, CR, and LF in path names, so that paths aren't broken up
+ in diffs or in the output of commands like
+ or
+ .
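The two path-name rules above (only legal XML 1.0 characters, and no TAB, CR, or LF) can be expressed as a small checker. This is a hypothetical Python sketch, not Subversion's validation routine. It encodes only the two rules stated here, so the real client may well reject more; the sidebar also advises against control characters in path names in general. The `Char` ranges are taken from the XML 1.0 specification, and the names `is_xml10_char` and `path_problems` are made up for illustration.

```python
from __future__ import annotations

# Characters Subversion prohibits in path names, per the text above.
FORBIDDEN_IN_PATHS = {"\t", "\r", "\n"}


def is_xml10_char(cp: int) -> bool:
    """True if code point `cp` matches the XML 1.0 `Char` production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def path_problems(path: str) -> list[str]:
    """Describe each character in `path` that breaks one of the two rules."""
    problems = []
    for i, ch in enumerate(path):
        if ch in FORBIDDEN_IN_PATHS:
            # Legal in XML, but prohibited in path names.
            problems.append(f"position {i}: {ch!r} (TAB/CR/LF) is prohibited")
        elif not is_xml10_char(ord(ch)):
            problems.append(f"position {i}: U+{ord(ch):04X} is not an XML 1.0 character")
    return problems


if __name__ == "__main__":
    for name in ["trunk/café.txt", "notes\tdraft.txt", "bell\x07.txt"]:
        print(repr(name), path_problems(name) or "ok")
```

Note that non-ASCII letters such as "é" pass both rules; it is control characters and TAB, CR, and LF that cause trouble.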
+
+ While it may seem like a lot to remember, in practice
+ these limitations are rarely a problem. As long as your
+ locale settings are compatible with UTF-8 and you don't use
+ control characters in path names, you should have no trouble
+ communicating with Subversion. The command-line client adds an
+ extra bit of help: it automatically escapes characters that
+ aren't legal in URLs, as needed, in the URLs you type, creating
+ "legally correct" versions for internal use.
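The escaping described above is ordinary URL percent-encoding: each character that can't appear literally in a URL is replaced by `%` and two hex digits for each byte of its UTF-8 encoding. The sketch below uses Python's standard `urllib.parse.quote` to show the idea. It is not how the Subversion client implements it, the repository URL is hypothetical, and a real client also has to avoid re-escaping text that is already percent-encoded, which this sketch does not attempt.

```python
from urllib.parse import quote, unquote

# Hypothetical repository root, used only for illustration.
REPO_ROOT = "http://svn.example.com/repos/project"


def escape_repo_path(path: str) -> str:
    """Percent-encode a repository path, keeping "/" as the separator.

    quote() encodes non-ASCII characters as UTF-8 bytes and leaves
    letters, digits, "_.-~" and the characters in `safe` untouched.
    """
    return quote(path, safe="/")


if __name__ == "__main__":
    path = "trunk/my notes/résumé.txt"
    escaped = escape_repo_path(path)
    print(f"{REPO_ROOT}/{escaped}")
    # prints http://svn.example.com/repos/project/trunk/my%20notes/r%C3%A9sum%C3%A9.txt
    # Decoding the escaped form gives back the original path.
    assert unquote(escaped) == path
```

The space becomes `%20` and each "é" becomes the two UTF-8 bytes `%C3%A9`, which is why the UTF-8 rule and the URL escaping fit together.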
+
+ Experienced users of Subversion have also developed a set
+ of "best practice" conventions for laying out paths
+ in the repository. While these aren't strict requirements like
+ the syntax described above, they help to organize frequently
+ performed tasks. The /trunk part of the URL
+ above is one of these conventions; we'll talk a lot more about
+ it and related recommendations in .
+
Although the above example checks out the trunk directory,