On Wed, Jul 12, 2017 at 12:27 PM, Branko Čibej <brane_at_apache.org> wrote:
> On 12.07.2017 12:24, Johan Corveleyn wrote:
>> On Wed, Jul 12, 2017 at 12:09 PM, Branko Čibej <brane_at_apache.org> wrote:
>>> On 11.07.2017 22:50, Stefan Sperling wrote:
>>>> On Tue, Jul 11, 2017 at 09:11:58PM +0200, Branko Čibej wrote:
>>>>> Another issue I have with the proposal is the idea to use file suffixes.
>>>>> That's usually the wrong way to go about things (case in point: Windows
>>>>> does it, with didastrous results). It's much better to determine file
>>>>> format by inspection, such as, e.g., libmagic does. We already have
>>>>> optional support for libmagic in the client (to set svn:mime-type).
>>>> I would not feel comfortable having the server parse arbitrary data with
>>>> libmagic. The libmagic code is not very safe to run on untrusted input.
>>>> I have seen libmagic crash my svn client on several occasions even on
>>>> text files I wrote.
>>>>
>>>> At the client side it's a bit less dangerous because users have already
>>>> told svn to add the files in question to version control, and a libmagic
>>>> exploit running on the client machine can do less harm than a server-side one.
>>>>
>>>> Granted, commits are usually authenticated. If we did this we should at
>>>> least make really sure that no unauthenticated access can trigger this code.
>>>> Ideally, it would be sandboxed somehow if we started using it on the server.
>>> I wasn't really proposing to use libmagic on the server. My point is
>>> that instead of using file name suffixes (which the compression and
>>> deltification code don't know about), we'd do some sort of inspection
>>> instead. Detecting ZIP files, or gzip/bzip2/xz-compressed files, etc.,
>>> is fairly easy just from looking at a few bytes of headers. Same goes
>>> for most image and video formats.
>>>
>>> Of course, one could always concoct a file that would trick such
>>> inspection, but at least that's marginally harder to do than commit a
>>> large text file full of spaces and calling it 'spaceinvaders.jpg'. :)
>>>
>>> Random binary data is harder to detect, but we already deal with that
>>> after the fact by using the plain text if the delta is too large.
>> We could also make the process driven by a "client-side suggestion".
>> Driving it from the client-side also gives us the possibility to
>> eliminate the client-side deltification overhead.
>>
>> I.e. the client has some logic (libmagic, suffix, looking at the first
>> 100 bytes, ...) to determine that it's not worth deltifying and/or
>> compressing. It doesn't do deltification itself, and lets the server
>> know that it probably shouldn't either (or the server sees that the
>> client hasn't deltified, so accepts the content automatically as
>> "non-deltifiable / non-compressable"?).
>>
>> Maybe needs a server-side config setting to make it respect or ignore
>> the client-side suggestion.
>
> That's such an easy way to make a malicious client explode the
> repository size. And ... there's realy no reason to complicate. The
> server's storage layer can cheaply do all the necessary checks without
> having to believe the client, and without adding yet another
> (dangerous!) config knob.
Yes, well in any case allowing this by server-side inspection will
also open up possibilities for blowing up the repository by a
malicious client.
In fact, making it coupled with "client also non-deltifies" forces the
client to also send those huge files over the wire, making it a little
bit more difficult to DoS the server by blowing it up. If the client
can still deltify (only sending a few bytes), but trick the server
into storing those as full-texts, the attack can be more powerful I
guess.
--
Johan
Received on 2017-07-12 12:44:32 CEST