On 12.07.2017 12:24, Johan Corveleyn wrote:
> On Wed, Jul 12, 2017 at 12:09 PM, Branko Čibej <brane_at_apache.org> wrote:
>> On 11.07.2017 22:50, Stefan Sperling wrote:
>>> On Tue, Jul 11, 2017 at 09:11:58PM +0200, Branko Čibej wrote:
>>>> Another issue I have with the proposal is the idea to use file suffixes.
>>>> That's usually the wrong way to go about things (case in point: Windows
>>>> does it, with didastrous results). It's much better to determine file
>>>> format by inspection, such as, e.g., libmagic does. We already have
>>>> optional support for libmagic in the client (to set svn:mime-type).
>>> I would not feel comfortable having the server parse arbitrary data with
>>> libmagic. The libmagic code is not very safe to run on untrusted input.
>>> I have seen libmagic crash my svn client on several occasions even on
>>> text files I wrote.
>>>
>>> At the client side it's a bit less dangerous because users have already
>>> told svn to add the files in question to version control, and a libmagic
>>> exploit running on the client machine can do less harm than a server-side one.
>>>
>>> Granted, commits are usually authenticated. If we did this we should at
>>> least make really sure that no unauthenticated access can trigger this code.
>>> Ideally, it would be sandboxed somehow if we started using it on the server.
>> I wasn't really proposing to use libmagic on the server. My point is
>> that instead of using file name suffixes (which the compression and
>> deltification code don't know about), we'd do some sort of inspection
>> instead. Detecting ZIP files, or gzip/bzip2/xz-compressed files, etc.,
>> is fairly easy just from looking at a few bytes of headers. Same goes
>> for most image and video formats.
>>
>> Of course, one could always concoct a file that would trick such
>> inspection, but at least that's marginally harder to do than commit a
>> large text file full of spaces and calling it 'spaceinvaders.jpg'. :)
>>
>> Random binary data is harder to detect, but we already deal with that
>> after the fact by using the plain text if the delta is too large.
> We could also make the process driven by a "client-side suggestion".
> Driving it from the client-side also gives us the possibility to
> eliminate the client-side deltification overhead.
>
> I.e. the client has some logic (libmagic, suffix, looking at the first
> 100 bytes, ...) to determine that it's not worth deltifying and/or
> compressing. It doesn't do deltification itself, and lets the server
> know that it probably shouldn't either (or the server sees that the
> client hasn't deltified, so accepts the content automatically as
> "non-deltifiable / non-compressable"?).
>
> Maybe needs a server-side config setting to make it respect or ignore
> the client-side suggestion.
That's such an easy way to make a malicious client explode the
repository size. And ... there's realy no reason to complicate. The
server's storage layer can cheaply do all the necessary checks without
having to believe the client, and without adding yet another
(dangerous!) config knob.
-- Brane
Received on 2017-07-12 12:27:53 CEST