Re: FSFS format7 and compressed XML bundles

From: Ben Reser <ben_at_reser.org>
Date: Thu, 28 Feb 2013 08:37:51 -0800

On Thu, Feb 28, 2013 at 8:04 AM, Magnus Thor Torfason
<zulutime.net_at_gmail.com> wrote:
> I've been following the discussion about FSFS format7, and had a question:
> Is there any chance that the format would improve storage efficiency for
> documents that are stored as compressed (zipped) bundles of XML files and
> other resource files (Read MS Office Documents, but OpenOffice is similar).
>
> I'm finding that making very small changes in big documents (with embedded
> images) results in rapid growth of the repository, since the binary diff
> algorithm seems to not be able to figure out efficient deltas for this type
> of documents, even though analysis of the contents shows that they are
> almost unchanged.

I don't think it's in the plan at this point. The question I have is:
how would you imagine that SVN should store these files efficiently?
I don't think there is a good solution to this problem at the file
system layer, since the only way I can see to get efficient storage is
to start manipulating the files, e.g. decompressing them for storage
in the repository. Our file system layer should never start
manipulating the content it's storing.
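
To illustrate why compressed bundles delta so poorly, here is a rough
Python sketch. It does not use Subversion's actual delta code; as a
stand-in, it estimates a delta's size by deflating the new version with
the old version as a preset dictionary. The helper name delta_proxy and
the generated XML are assumptions made purely for illustration.

```python
import random
import zlib


def delta_proxy(old: bytes, new: bytes) -> int:
    """Rough stand-in for a binary delta size (NOT Subversion's algorithm):
    deflate `new` with `old` as a preset dictionary, so any content shared
    with `old` costs almost nothing."""
    comp = zlib.compressobj(level=9, zdict=old)
    return len(comp.compress(new) + comp.flush())


# Build a synthetic XML-like document (kept under deflate's 32 KiB window).
rng = random.Random(0)
paragraphs = [b"<p id='%d'>%d</p>\n" % (i, rng.randrange(10**6))
              for i in range(1000)]
old_xml = b"".join(paragraphs)

# Make a few very small edits to the document.
edited = list(paragraphs)
for i in (0, 333, 666):
    edited[i] = b"<p id='%d'>edited text %d</p>\n" % (i, i)
new_xml = b"".join(edited)

# Delta between the uncompressed versions vs. between the compressed ones.
raw_delta = delta_proxy(old_xml, new_xml)
zipped_delta = delta_proxy(zlib.compress(old_xml, 9),
                           zlib.compress(new_xml, 9))

print("delta of uncompressed versions:", raw_delta, "bytes")
print("delta of compressed versions:  ", zipped_delta, "bytes")
```

The uncompressed versions share almost all of their bytes, so the
estimated delta should be tiny. Once each version is deflated, a small
edit shifts and re-encodes the bit stream that follows it, so the two
compressed blobs have little byte-level content in common for a delta
algorithm to reuse.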

The only solution I see to this problem, and frankly I don't think
it's one we're likely to implement, is client-side special handling of
certain mime-types. Similar to how we do end-of-line normalization
based on a property, we could decompress these files for storage in
the repo and then re-compress them on the client side.
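
As a minimal sketch of what such client-side handling could look like
for zip-based bundles, assuming Python's standard zipfile module: this
is not an existing Subversion feature, and the function names
unpack_for_storage and repack_for_working_copy are hypothetical. The
idea is to rewrite the archive with uncompressed ("stored") members
before it goes to the repository, and to deflate it again on checkout.

```python
import io
import zipfile


def rewrite_zip(data: bytes, compression: int) -> bytes:
    """Copy every member of a zip archive into a new archive that uses
    the given compression method for all members."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", compression=compression) as dst:
        for info in src.infolist():
            # Fresh ZipInfo: keep the member name and timestamp only.
            member = zipfile.ZipInfo(info.filename, info.date_time)
            dst.writestr(member, src.read(info), compress_type=compression)
    return out.getvalue()


def unpack_for_storage(data: bytes) -> bytes:
    """Hypothetical commit-side step: store members uncompressed so a
    binary delta can see the unchanged XML and image bytes."""
    return rewrite_zip(data, zipfile.ZIP_STORED)


def repack_for_working_copy(data: bytes) -> bytes:
    """Hypothetical checkout-side step: deflate the members again."""
    return rewrite_zip(data, zipfile.ZIP_DEFLATED)
```

Note that the round trip restores the members' contents, not the
original archive byte for byte: the recompressed file will generally
differ from what the application wrote, and other per-member metadata
is dropped in this sketch. The recompression step is also where the
per-client CPU cost comes from.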

That said, let me explain why I think we'd be unlikely to implement this.

1) This would require special handling of certain file formats,
something I don't think we should get into.
2) We might have the dependencies to decompress some formats, but once
we go down this road we'd likely need to pull in more and more exotic
libraries, or we'd have to tell people no, we won't support this one
format.
3) You'd be saving storage at the expense of using time (read: CPU) on
every client that's working with those files when checking out. So
the end result may be worse than the current problem.

I just don't see this happening unless someone has a very clever idea
that I haven't thought of.
Received on 2013-02-28 17:38:32 CET

This is an archived mail posted to the Subversion Dev mailing list.
