On Thu, Feb 28, 2013 at 8:37 AM, Ben Reser <ben_at_reser.org> wrote:
> I just don't see this happening unless someone has a very clever idea
> that I haven't thought of.
When I spoke with Julian here at ApacheCon, he mentioned that gzip has
an rsyncable option. Looking into it, it turns out that Debian applies
a patch to its gzip that provides this option. It periodically resets
the compressor at points chosen by a rolling checksum of the input, so
unchanged stretches of the file produce identical compressed blocks
that can be shared between revisions of the file. gzip uses the same
DEFLATE algorithm that most zip files use, so the same idea could be
applied to zip-based formats. If we want to deal with something like
this in Subversion, I think we'd do it via some sort of plugin for
specific file types. The plugin would convert them, before committing,
to an encoding that deltifies more efficiently. Unfortunately, we don't
have any sort of plugin infrastructure for this today.
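To make the rsyncable idea concrete, here is a minimal Python sketch under stated assumptions. It picks cut points from a rolling sum of the last `WINDOW` bytes and restarts a raw DEFLATE compressor at each one, so unchanged stretches of input compress to byte-identical pieces. The names `chunk_ends` and `compress_resettable` are illustrative, not gzip's actual code. So are the 64-byte window and the divisibility test. Starting a fresh compressor per chunk is a simplification of whatever reset the real patch performs.

```python
import zlib

WINDOW = 64  # hypothetical rolling-checksum window; tiny so short inputs get many chunks


def chunk_ends(data: bytes, window: int = WINDOW) -> list[int]:
    """Content-defined cut points: end a chunk wherever the sum of the last
    `window` bytes is divisible by `window`. The test only looks at nearby
    bytes, so an edit moves only the cut points close to it."""
    ends = []
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # slide the window forward by one byte
        if i + 1 >= window and rolling % window == 0:
            ends.append(i + 1)
    if not ends or ends[-1] != len(data):
        ends.append(len(data))  # the tail after the last cut is its own chunk
    return ends


def compress_resettable(data: bytes) -> tuple[bytes, list[bytes]]:
    """Compress `data` as one raw DEFLATE stream whose compressor state is
    reset at every cut point. Returns the stream and its per-chunk pieces."""
    pieces = []
    start = 0
    for end in chunk_ends(data):
        # A fresh raw-DEFLATE compressor (wbits=-15) per chunk: no back-references
        # into earlier chunks, so identical chunks always yield identical bytes.
        comp = zlib.compressobj(9, zlib.DEFLATED, -15)
        # Z_FULL_FLUSH ends on a byte boundary without marking the final block,
        # so the pieces can be concatenated into a single valid stream.
        pieces.append(comp.compress(data[start:end]) + comp.flush(zlib.Z_FULL_FLUSH))
        start = end
    # An empty final block terminates the stream.
    tail = zlib.compressobj(9, zlib.DEFLATED, -15).flush(zlib.Z_FINISH)
    return b"".join(pieces) + tail, pieces
```

Suppose two versions of a file differ by a few inserted bytes. Their lists of pieces should then be identical except around the edit. A single zlib.compress call over the whole file offers no such guarantee. The change can alter the Huffman tables and shift the bit alignment of everything after it.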
Even so, there are things that can be done today. I made a couple of
trivial Microsoft Office Word documents, one containing the text "abc"
and one containing "abcdef", and saved each in both .docx and the
Word 2003 flat XML format. The .docx versions produced a delta of
3262 bytes, while the .xml versions produced a delta of just 358
bytes.
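You can reproduce this kind of comparison outside a repository. Compress the new revision with the old one as a zlib preset dictionary, then treat the output size as a stand-in for the delta. The sketch below assumes that proxy, which is not Subversion's delta algorithm. The single `word/document.xml` member in `as_zip` is a simplified, hypothetical stand-in for a real .docx. The sketch will not reproduce the figures above, but it should show the same effect. A small edit to flat XML costs a few back-references. The same edit inside a deflated zip member tends to change most of the compressed bytes.

```python
import io
import zlib
import zipfile


def approx_delta_size(old: bytes, new: bytes) -> int:
    """Rough stand-in for a delta: compress `new` using `old` as a preset
    dictionary, so content shared with `old` costs only back-references.
    This is NOT Subversion's delta algorithm, just a quick proxy; zlib only
    looks at the last 32 KiB of the dictionary, so keep inputs small."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15, zdict=old)
    return len(comp.compress(new) + comp.flush())


def as_zip(xml: bytes) -> bytes:
    """Wrap an XML part in a deflate-compressed zip, roughly as .docx does
    (the single member name is an illustrative assumption)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", xml)
    return buf.getvalue()
```

In this sketch, comparing approx_delta_size(old_xml, new_xml) with approx_delta_size(as_zip(old_xml), as_zip(new_xml)) plays the role of the .xml versus .docx comparison.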
OpenOffice/LibreOffice support flat versions of their formats (e.g.
.fodt) that are not compressed and can therefore also be stored more
efficiently in Subversion. LibreOffice even has a wiki page about this:
https://wiki.documentfoundation.org/Libreoffice_and_subversion
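Some zip-based formats have no flat variant. For those, a conversion step of the kind suggested earlier could rewrite the archive with every member stored uncompressed. That leaves the XML in the file as plain bytes, much like a flat format. The following is a hypothetical Python sketch of such a filter. `rezip_stored` is an illustrative name, not an existing Subversion or LibreOffice tool.

```python
import io
import zipfile


def rezip_stored(src: bytes) -> bytes:
    """Hypothetical pre-commit conversion: rewrite a zip-based document
    (.docx, .odt, ...) with every member STORED (uncompressed), keeping member
    order, names, timestamps and attributes. The XML parts then sit in the
    file as plain bytes, much like in a flat format."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src)) as zin, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_STORED) as zout:
        for info in zin.infolist():  # infolist() preserves archive order
            clone = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            clone.external_attr = info.external_attr
            clone.compress_type = zipfile.ZIP_STORED
            zout.writestr(clone, zin.read(info))
    return out.getvalue()
```

Nothing in Subversion would run such a filter automatically today. It would have to be applied by hand, or by a wrapper script, before svn commit. Whether a given office suite opens a stored-only archive without complaint is an assumption worth checking.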
Received on 2013-02-28 19:58:45 CET