
RE: Re: Proposal: generic svn:encoding mechanism

From: André Pönitz <andre_at_wasy.de>
Date: 2005-08-04 08:09:59 CEST

> As I have never used MacOS or heard of these elusive "resource forks",
> could you elaborate? Besides, the way you're saying is that it should be
> encoded on the versionned filesystem, and decoded when checked out?

See my other mail from today. There should be three stages: a form used
for 'final archiving' (i.e. whatever ends up read-only somewhere), a form
used for creating diffs (is this the 'deltification'?) and a form presented
to the user in his working copy.
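A minimal sketch of the first two stages, purely to illustrate the idea. It
assumes zlib compression as a stand-in for the compact archive form and plain
text lines as the diff form; the function names are hypothetical and none of
this is an existing svn mechanism.

```python
import difflib
import zlib


def to_archive(diff_form: str) -> bytes:
    """Convert the diff-friendly text form into a compact archive form."""
    return zlib.compress(diff_form.encode("utf-8"))


def from_archive(archived: bytes) -> str:
    """Recover the diff-friendly text form from the archive form."""
    return zlib.decompress(archived).decode("utf-8")


def changed_lines(old: str, new: str) -> list:
    """Return only the added/removed lines of a line-based diff."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return [
        line
        for line in diff
        if line[:1] in "+-" and not line.startswith(("+++", "---"))
    ]


# A made-up text image: many identical rows plus one colour definition.
old_text = "\n".join(["row ........"] * 1000 + ["color grey100"])
new_text = old_text.replace("grey100", "grey99")

archived = to_archive(old_text)
delta = changed_lines(from_archive(archived), new_text)
print(len(old_text), len(archived), delta)
```

The archive form stays small, while the diff is still computed on the text
form, where the colour change touches one line. The third stage (the form in
the working copy) would be one more conversion of the same kind.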

So far the assumption is 'If everybody used only plain text everything
would be fine' and while this may be true for plain code it is not
necessarily true for almost everything else.

Let me try to describe a simple case. I want to store a background pixmap
for an application. Let's say it is 200 kB as .png, 5 MB as .xpm and
300 kB as .gif (numbers made up). Let's further assume my app can only
display .gif and I am short of disk space on the server (or backup tape).

Now suppose, I change the color 'grey100' to 'grey99' in the pixmap.

This is a single-line change in the .xpm, so the whole thing could
ideally be stored as 200 kB .png + 0.5 kB diff for the .xpm version.

Using the text form only I get 5000 kB .xpm + 0.5 kB diff + the problem
that said app can't display .xpm. Moreover, that's wasting 25 times
the space that would actually be needed.

Using binary .gif only gives a 300 kB .gif + 300 kB for the 'diff',
still a factor of 3 over the ideal solution.
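The arithmetic behind these factors, as a small sketch using the made-up
numbers from the example (200 kB .png, 5 MB .xpm, 300 kB .gif, 0.5 kB text
diff). It assumes 1 MB = 1000 kB; the names are only for illustration.

```python
# Sizes in kB, taken from the made-up example (assuming 1 MB = 1000 kB).
PNG_KB = 200.0
XPM_KB = 5000.0
GIF_KB = 300.0
TEXT_DIFF_KB = 0.5


def total_kb(base_kb: float, diff_kb: float) -> float:
    """Storage for one base revision plus one changed revision."""
    return base_kb + diff_kb


ideal = total_kb(PNG_KB, TEXT_DIFF_KB)      # .png archive + text diff
text_only = total_kb(XPM_KB, TEXT_DIFF_KB)  # .xpm archive + text diff
gif_only = total_kb(GIF_KB, GIF_KB)         # .gif + a 'diff' as big as the file

for name, kb in [("ideal", ideal), ("text only", text_only), ("gif only", gif_only)]:
    print(f"{name}: {kb} kB, {kb / ideal:.1f}x the ideal")
```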

And now suppose we are talking not about a single pixmap but about thousands,
and files that regularly exceed 100 MB.

And note, btw, that the plain text form is by far less attractive than
the binary (all .gif) version.

> How does this square with deltification? If you store stuff in a binary form
> in the versionned fs, you lose all possibility of saving space.

Only if deltification works on text only.

> What's the point? Don't you mean encoding on the fly as it is checked out,
> to have the encoded file presented to the user? Yes, reading through the
> rest of the mail, looks like you meant that.

And he's obviously not alone with that wish.

> > Application #2: svn:encoding=gzip
> >
> > This would invoke a "gzip" encoder prior to commit, "gzip" decoder
> > on checkout. Result is a gzip-compressed file in the repository,
> > expanded file on disk.
> Again, I fail to see a real gain in doing this. You mean here expanding
> the file to have the benefits of deltification, but compress it on
> checkout automatically? Why? In fact, why are you storing compressed
> files in your repository? Why not just store the uncompressed file and
> be done with it?

See example above. The uncompressed data may be prohibitively large
and/or not suitable for presentation.

> > This mechanism would also provide a hook for other types of extended
> > attribute systems on other platforms in the future.
> To be perfectly honest and, well, blunt, this sounds a lot like trying
> to find justifications for adding some form of resource fork support in
> svn with no other practical use. But I'm open to proof of the contrary :-).

If you don't mind, please comment on my example above and try to explain
why the pure text form (all .xpm) would be preferable over the pure binary
form (all .gif) and over the 'ideal' form (.png <-> .xpm <-> .gif).

> > There are a few questions to resolve with this: Does the mime-type
> > apply to the file on disk or the file in the repo? How do you deal with
> > unsupported encodings? These seem pretty mundane issues to me, with a
> > variety of reasonable resolutions.
> The mime-type issue would probably require some work, because both mime
> types would be useful in this case. As for unsupported encodings, the
> answer depends on another question you seem to have answered, but forgot
> to tell us about: Where is the conversion performed? Client-side or
> server-side?

In the example above .png <-> .xpm could be on the server,
.xpm <-> .gif is probably on the client.

> If on the server side, you have the problem of needing to install many
> other programs on the server, that might not even be available (binhex
> manipulation tools might not be available on the platform the server is
> operating off). If on the client side, you run into the same problems

For sure. But that's a kind of problem people would be happy to solve
in their small world if they could get the benefits.

> that you do with client-side scripting: how can you assume that all
> clients will have the required tools to perform encoding conversion?

By telling the people working on the project about these dependencies.

This situation is not different at all from what is used today.
Take your favourite autotool based project. You need a certain version
of autotools to get started.

Or projects using external libraries.

Or even compilers. Only gcc 2.6.1? Bad luck. Upgrade and retry.

> If they don't, what do you do?

Tell the user and abort.
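A minimal sketch of 'tell the user and abort', assuming the required
converters are ordinary executables looked up on the PATH. The function names
and the tool name foo2bar are hypothetical; this is not an existing svn
feature.

```python
import shutil
import sys


def find_missing_tools(required, which=shutil.which):
    """Return the names of required converter programs that cannot be found."""
    return [tool for tool in required if which(tool) is None]


def check_or_abort(required, which=shutil.which):
    """Stop with a clear message if any required converter is missing."""
    missing = find_missing_tools(required, which)
    if missing:
        sys.exit("error: missing converter(s): " + ", ".join(missing))


if __name__ == "__main__":
    # Hypothetical project policy: these tools must exist on every client.
    print(find_missing_tools(["gzip", "foo2bar"]))
```

The `which` parameter is only there so that the lookup can be replaced, e.g.
by a table of fixed paths set as local policy.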

> Each user therefore has a working copy that is different for the same
> repository version, depending on available third-party programs?


Of course this opens a can of worms. But the problems can be solved
by the _users_ (i.e. svn users and repo administrators).

> Another question you elude: you speak of 'gzip encoder' and 'gzip
> decoder'. How do you identify these programs on the system? Do you
> assume that gzip means /usr/bin/gzip ? Do you provide extra properties
> for specifying the actual binaries? How do you solve diverging path
> issues (one client stores it in /usr/bin/gzip, another in
> /usr/local/bin/gzip, another calls it /opt/bin/gzip, another has
> /home/foo/bin/gnu-zip which is gzip compatible, etc.) ?

/etc/mailcap wouldn't be that bad for starters...

In fact, even a hard-coded path would do. I have no problem making
a local policy that there needs to be a working /usr/local/bin/foo2bar
for people working on project Baz. Not nice, but considering the gains

> Sorry for being blunt. I'm interested in the implications of what you're
> saying, but I'm trying to understand and help you flesh out your
> proposal a little, whilst at the same time throwing in my general
> feeling of "why is this actually useful?" :-).


I just wanted to make sure that it is understood as well that
the original poster is not alone with his problems, and 'text only'
is not the magic bullet it is usually described as on this list.


Received on Thu Aug 4 08:13:40 2005

This is an archived mail posted to the Subversion Users mailing list.
