[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Filenames with trailing newlines wreak havoc

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Wed, 27 Mar 2013 16:12:38 +0000 (GMT)

Stefan Sperling wrote:

> C. Michael Pilato wrote:
>> On 03/27/2013 11:03 AM, Stefan Sperling wrote:
>> > That's fine. The fix I've committed and proposed for backport applies
>> > the large hammer and blocks any control characters. If there's a case
>> > to be made for relaxing this check I'm happy to do that.
>>
>> Sorry, but I'm -1 on that change.  How did this expand from "trailing
>> newlines" to "any control character"?
>
> An IRC discussion with Ben:
> http://colabti.org/irclogger/irclogger_log/svn-dev?date=2013-03-26#l479
> Who said:
>   "I'm not inclined to spend a bunch of time thinking through all the
>   implications of the control characters that we already don't allow
>   in the client lib."
>
> I agree there. I also didn't want to test all control characters
> to see if FSFS can handle them. It seems safer to reject them outright.
>
>> Look, the FS allows things that the repos layer does not.  For example, it
>> doesn't care about property encoding or formats, where the repository layer
>> does.  *That's by design.*
>
> Sure, but then we need an implementation that conforms to that design.
> We clearly don't have one, else it wouldn't choke on newlines.
> Fixing the implementation is going to be hard and we aren't even
> sure if any user of the API really cares.
>
>> It's perfectly okay for us to tell folks that if
>> they modify their FS with something other than Subversion and end up with
>> data that Subversion doesn't like, that's their own problem.
>
> But (at least) with newlines, they can end up with data that the filesystem
> implementation itself doesn't like. So the API becomes entirely useless
> to them if they want to store filenames which contain newlines.
>
> What about:
>
> - We make the FS layer check for newlines (because we know FSFS can't cope)
>
> - We make the repos layer check for control characters (because the
>   Subversion clients does, too, and we had problems with third-party
>   stuff like git-svn committing stuff which svn clients couldn't deal
>   with in the past)
>
> That's some additional work, but I'm willing to do it.

My thoughts:

The key issue is we need to define 'valid path' for the FS layer, and follow through with documenting it, implementing run-time checks, and testing it.  If the domain of valid FS paths is greater than what Subversion's higher layers use, then
we need unit testing at the FS layer to ensure we're
supporting what we claim.  If however, we decide to use the same
definition as in the layer above, then we don't need to do so much
 unit testing of this layer, because testing through the layer above
should mostly suffice.  So there is a compromise between the theoretical
 independence which allows FS to have a different definition of valid
filenames, and the practical issue that if we actually do make it
different from the repos layer's definition then it's more work to
maintain.

If we now decide that LF isn't supported, I will
ask, what about CR?  That seems the next-most-likely to cause problems. 
 I can almost guarantee that CR will break the repos layer (I'm thinking
 about passing a list of paths to a hook script, on Windows), but it
might be acceptable at the FS layer.  How are we going to decide whether LF is the only control character that we need to forbid?  Partly by
analysis / thought / knowledge, and partly by testing, I hope.  The alternative is we decide that the benefit to the Subversion community of allowing other control characters in the FS layer is too small compared with the risks and effort, and so define that it shall no longer support any control characters.

As for "trailing newline", that's one specific problem that occurred in practice.  A test for that
case is a *regression test*.  But it's plain that a newline anywhere in a
 filename is going to cause problems in FSFS, so the new rule will (at least) forbid
newlines anywhere in a path.  Along with implementing a check for the new behaviour, we should be adding a new *unit
test* to verify the new behaviour, whatever that behaviour shall be.  Since that unit test will clearly
cover the "trailing newline" case, we don't really need the regression
test as well.  We would do well to take more care in distinguishing unit
 tests from regression tests.

Something like the proposal above, to reject LF only at the FS (or FSFS)
 layer, and all control characters in libsvn_repos, sounds good to me. 
We should write unit tests to ensure that the FS layer works
properly with all other control characters.

- Julian
Received on 2013-03-27 17:13:12 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.