
Re: FSFS format7 and compressed XML bundles

From: Vincent Lefevre <vincent-svn_at_vinc17.net>
Date: Wed, 6 Mar 2013 11:41:45 +0100

On 2013-03-05 16:52:30 +0000, Julian Foad wrote:
> Vincent Lefevre wrote:
[about server-side vs client-side]
> > But even if there were no problems with the
> > construction/reconstruction, it would be a bad solution, IMHO.
> > Indeed, for a commit, it is the client that is supposed to expand
> > the data before sending the diff to the server,
>
> What do you mean "the client [...] is supposed to expand the data"? 
> I don't understand why you think the client is "supposed" to do such
> a thing.

Because the diff between two huge compressed files is generally huge
as well (unless some rsync-friendly option was used when compressing,
where available). So if the client does not uncompress the data for the
server, it has to send either a huge diff or the whole compressed file,
even though the diff between the uncompressed data may be small. Hence,
if deconstruction/reconstruction is possible (i.e. the compression has a
canonical form), it is much more efficient to do it on the client side.
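
To make the size argument concrete, here is a minimal Python sketch
(standard library only). The document, the edit and the helper
`changed_region` are hypothetical: the helper just strips the prefix and
suffix shared by two byte strings, a crude proxy for delta size, not
Subversion's svndiff. Editing one paragraph changes a few bytes of the
uncompressed data, but typically most of the zlib stream.

```python
import zlib


def make_doc(value_of_p100: str) -> bytes:
    """Build a hypothetical XML-like document of 2000 paragraphs.

    Only paragraph 100 depends on the argument, so two calls differ in one line.
    """
    lines = [f'<p id="{i}">paragraph number {i}</p>' for i in range(2000)]
    lines[100] = f'<p id="100">{value_of_p100}</p>'
    return "\n".join(lines).encode("utf-8")


def changed_region(old: bytes, new: bytes) -> int:
    """Length of `new` left after stripping the prefix and suffix it shares
    with `old`: a crude stand-in for delta size (not svndiff)."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return len(new) - prefix - suffix


old_doc = make_doc("paragraph number 100")  # original text
new_doc = make_doc("edited paragraph")      # one paragraph edited

plain_delta = changed_region(old_doc, new_doc)
compressed_delta = changed_region(zlib.compress(old_doc, 9),
                                  zlib.compress(new_doc, 9))

print("uncompressed:", plain_delta, "bytes changed")  # 16
print("compressed:  ", compressed_delta, "bytes changed")
```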

> >> That point _is_ specific to a server-side solution.  With a
> >> client-side solution, the user's word processor may not mind if a
> >> versioning operation such as a commit (through a decompressing
> >> plug-in) followed by checkout (through a re-compressing plug-in)
> >> changes the bit pattern of the compressed file, so long as the
> >> uncompressed content that it represents is unchanged.
> >
> > I disagree.
>
> It's not clear what you disagree with.

With the second sentence ("... may not mind ..."), and therefore
with the first one too.

> > The word processor may not mind (in theory, because
> > in practice, one may have bugs that depend on the bit pattern,
> > and it would be bad to expose the user to such kind of bugs and
> > non-deterministic behavior), but for the user this may be important.
> > For instance, a different bit pattern will break a possible signature
> > on the compressed file.
>
> I agree that it *may* be important for the user, but the users have
> control so they can use this client-side scheme in scenarios where
> it works for them and not use it in other scenarios.

But we need a scheme that also works in the case where users care
about the bit pattern of the compressed file.
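
As an illustration of why the bit pattern can matter, a minimal Python
sketch, assuming a zlib stream stands in for the compressed file and a
SHA-256 digest stands in for a signature over it (real documents are
usually zip archives, and real signatures involve more than a digest).
Recompressing the same content with different settings gives a valid
but different stream, so the digest no longer matches.

```python
import hashlib
import zlib

# Hypothetical uncompressed content U, standing in for an XML part of a document.
U = b"<doc>" + b"<p>Some repeated paragraph text.</p>" * 500 + b"</doc>"

# Two valid compressed forms of the same content: the original C, and a
# "reconstruction" made with different settings (here: another compression level).
C_original = zlib.compress(U, 9)
C_rebuilt = zlib.compress(U, 1)

# Both decompress to exactly the same content...
same_content = zlib.decompress(C_original) == zlib.decompress(C_rebuilt) == U
# ...but a digest over the compressed bytes (what a signature covers) differs.
same_digest = hashlib.sha256(C_original).digest() == hashlib.sha256(C_rebuilt).digest()

print(same_content, same_digest)  # True False
```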

Moreover, even when users know that the exact bit pattern of the
compressed file does not matter at some point, this may no longer be
true in the future. For instance, some current word processor may
ignore the dates stored in zip files, but future ones may take them
into account. So you need to determine which data in a zip file are
important, including undocumented data used by some implementations
(the zip format allows extensions). Starting to take such data into
account only once it turns out to be meaningful is too late, because
it would already have been lost in past revisions of the Subversion
repository.
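
The date example can be checked with Python's zipfile module. In this
minimal sketch the member name `content.xml`, its content and the two
dates are made up; it only shows that the archive bytes record each
member's modification time, while a content-only view (what a naive
expand/recompress scheme might keep) no longer has it.

```python
import io
import zipfile

MEMBER = "content.xml"                # hypothetical member name
CONTENT = b"<doc><p>Hello</p></doc>"  # hypothetical member content


def build_zip(date_time: tuple) -> bytes:
    """Create an in-memory zip archive with one deflated member and the given date."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo(MEMBER, date_time=date_time)
        info.compress_type = zipfile.ZIP_DEFLATED
        zf.writestr(info, CONTENT)
    return buf.getvalue()


def member_contents(data: bytes) -> dict:
    """Content-only view of an archive: member name -> uncompressed bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


def member_dates(data: bytes) -> dict:
    """Dates recorded in the archive (zip stores them with 2-second resolution)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {info.filename: info.date_time for info in zf.infolist()}


old = build_zip((2013, 3, 5, 16, 52, 30))
new = build_zip((2013, 3, 6, 11, 41, 44))

print(old == new)                                    # False: the dates are in the bytes
print(member_contents(old) == member_contents(new))  # True: the contents are identical
print(member_dates(old)[MEMBER])                     # (2013, 3, 5, 16, 52, 30)
```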

On 2013-03-05 17:10:02 +0000, Julian Foad wrote:
> I (Julian Foad) wrote:
> > Vincent Lefevre wrote:
> >>  On 2013-03-05 13:30:28 +0000, Julian Foad wrote:
> >>> Vincent Lefevre wrote:
> >>>> On 2013-03-01 14:58:10 +0000, Philip Martin wrote:
> >>>>> A server-side solution is difficult.  Suppose the client has some
> >>>>> uncompressed content U which it compresses to C and sends to the server.
> >>>>> The server can uncompress C to get U but unless the compression scheme
> >>>>> has a canonical compressed form, with no other forms allowed, the server
> >>>>> cannot avoid storing C because there is no guarantee that C can be
> >>>>> reconstructed from U.
> >>>>
> >>>> This is not specific to server side. Even on the client side, the
> >>>> reconstruction may not be always possible, e.g. if the system is
> >>>> upgraded or if NFS is used. And the compression level may need to
> >>>> be detected or provided in some way.
> >>>
> >>> Hi Vincent.  I'm not sure you understood Philip's point.
> >>
> >> What I wrote below should make clearer what I meant. What I'm saying is
> >> that whether this is done entirely on the server side (a bad solution,
> >> IMHO) or on the client side (see below why), the problems are similar.
> >
> > The point Philip made is *not* a problem if done client-side;
>
> Let me take that back.  The point that I interpreted as being the
> most significant impact of what Philip said, namely that the
> Subversion protocols and system design require reproducible content,
> is only a problem when done server-side.  Other impacts of that same
> point, such as you mentioned, are applicable no matter whether
> server-side or client-side.

The Subversion protocols and system design *currently* require
reproducible content, but if new features are added (e.g. because
users do not care about the exact compressed content of some files),
the protocols and requirements could be changed accordingly: the
server could, for instance, treat some canonical uncompressed form
as the reference.
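
Under the current requirement, storing the uncompressed form U instead
of the client's compressed C is only acceptable when C can be
regenerated from U byte for byte. A minimal sketch of such a check,
assuming C is a zlib stream and that only the compression level may
vary (same zlib build, default window and memory settings);
`find_reproducing_level` and `plan_storage` are hypothetical names.
When no level reproduces C, C itself has to be kept, which is the
situation Philip described.

```python
import zlib
from typing import Optional


def find_reproducing_level(compressed: bytes) -> Optional[int]:
    """Return a zlib level whose output equals `compressed` byte for byte, or None.

    Assumes the stream was produced by zlib.compress() with default window and
    memory settings and the same zlib version; any other producer may give None.
    """
    uncompressed = zlib.decompress(compressed)
    for level in range(10):
        if zlib.compress(uncompressed, level) == compressed:
            return level
    return None


def plan_storage(compressed: bytes) -> tuple:
    """Hypothetical policy: keep (U, level) when C is reproducible, otherwise keep C."""
    level = find_reproducing_level(compressed)
    if level is None:
        return ("keep-compressed", compressed)
    return ("keep-uncompressed", zlib.decompress(compressed), level)


U = b"<doc>" + b"<p>text</p>" * 200 + b"</doc>"

reproducible = zlib.compress(U, 6)
# A stream from "another producer": zlib itself with a smaller window (wbits=9),
# which changes the header, so no zlib.compress() level can reproduce it.
co = zlib.compressobj(6, zlib.DEFLATED, 9)
foreign = co.compress(U) + co.flush()

print(plan_storage(reproducible)[0])  # keep-uncompressed
print(plan_storage(foreign)[0])       # keep-compressed
```

Trying every level is cheap for small files but multiplies the
compression work for large ones, and a match says nothing about
streams produced by other deflate implementations.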

[...]
> So my main point is that the server-side expand/compress is a
> non-starter of an idea, because it violates basic Subversion
> requirements, whereas client-side is a viable option for some use
> cases.

I would reject the server-side expand/compress, not because of the
current requirements (which could be changed to more or less match
what happens on the client side), but for performance reasons (see
the first paragraph of this message).

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Received on 2013-03-06 11:42:29 CET

This is an archived mail posted to the Subversion Dev mailing list.