Jack Repenning wrote:
> There might be other tunable parameters as well. For example, based
> solely on experiments like trying to zip an already-zipped file, I
> suspect that deltification of certain file types is both unusually
> expensive and unusually unproductive (zip files, for example).
> Encrypted files are even scarier, since it's an explicit goal of
> crypto that encrypting the exact same file twice must produce a
> completely dissimilar ciphertext. I posit a class of files for which
> deltification is suboptimal (perhaps actually deleterious). Who are
> the deltification gurus on the list? Has this question been considered?
Yes, this question has been considered (at least by me), but up to now
there have been no satisfactory solutions. At first guess, we should
avoid deltifying files that are compressed and/or encrypted (this would
include all sorts of image formats, for example). We should also have
the option of just compressing ("deltify against empty source") files in
formats that are known to behave badly under deltification.
Basically, we should have three storage methods:
* store: just store the new data in the repository
* compress: compress the new data (vdelta or zlib -- whichever is
"better", for some definition of better)
* deltify: what we're doing now.
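Just to sketch what I mean (all names here are invented for illustration; none of this is actual FS code), the decision could hang off the file's MIME type, with already-compressed formats falling through to plain storage:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of the three storage methods described above.
   These identifiers are invented; they are not real Subversion FS names. */
typedef enum {
    STORE,     /* store the fulltext as-is */
    COMPRESS,  /* "deltify against empty source", i.e. plain compression */
    DELTIFY    /* deltify against the predecessor, as we do now */
} storage_method_t;

/* Pick a storage method from a MIME type.  Already-compressed or
   encrypted formats gain nothing from deltification or compression,
   so just store them. */
static storage_method_t
choose_storage_method(const char *mime_type)
{
    static const char *const incompressible[] = {
        "application/zip", "image/jpeg", "image/png",
        "application/pgp-encrypted", NULL
    };
    int i;

    for (i = 0; incompressible[i]; i++)
        if (strcmp(mime_type, incompressible[i]) == 0)
            return STORE;

    if (strncmp(mime_type, "text/", 5) == 0)
        return DELTIFY;

    /* Unknown binary data: compress it, but don't bother deltifying. */
    return COMPRESS;
}
```

A real implementation would of course consult per-repository configuration rather than a hard-coded table.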
What to do about a particular file should be based on its type (MIME or
otherwise), and per-repository configuration. Obviously, this means that
automatic svn:mime-type detection is a must if we want this to be
efficient. Happily, our FS schema wouldn't have to change to support
different storage methods (except for the introduction of new values for
the representation types), and these changes would be completely
transparent to the client.
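To make the per-repository configuration concrete, I imagine something along these lines (purely illustrative; no such file or syntax exists today):

```
# hypothetical per-repository storage config
[storage-methods]
application/zip = store
image/*         = store
text/*          = deltify
*               = compress
```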
Of course, a particular file's storage class might change during its
lifetime, and later on we may want to make it configurable on the
client, or at least propagate the storage info to the client.
Two other issues we should consider:
1) Storage class (as opposed to storage method)
In some cases, users may prefer to store file contents in ordinary files
rather than in the repository. Hierarchic storage management comes to
mind, for example; it's very hard to store your 50-gig video files
off-line if they're ensconced in the repository...
2) On-the-wire representation
If a file is just "store"d in the repository because it doesn't compress
well, it's a good guess that sending deltas over the wire (or piping
through mod_deflate) isn't the most efficient thing to do. The possible
transmission methods are the same as the storage methods, except that
the choice of representation depends not just on the way it's stored in
the repository, but also on the repository-access layer, link speed,
etc. I can imagine an httpd configuration that guesses link speed based
on the client's IP address, for example.
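The wire-side choice might look something like this (again, names and thresholds are invented for illustration, not actual RA-layer code):

```c
#include <assert.h>

/* Hypothetical sketch: pick an on-the-wire representation from how the
   data is stored and an estimate of the link speed. */
typedef enum { WIRE_STORE, WIRE_COMPRESS, WIRE_DELTIFY } wire_method_t;

static wire_method_t
choose_wire_method(int stored_as_fulltext, long link_kbps)
{
    /* If the fulltext was merely "store"d because it doesn't compress
       well, compressing it again on the wire is likely wasted effort. */
    if (stored_as_fulltext)
        return WIRE_STORE;

    /* On a very fast link, CPU rather than bandwidth dominates, so
       skip compression entirely (threshold is a made-up number). */
    if (link_kbps > 100000L)
        return WIRE_STORE;

    return WIRE_DELTIFY;
}
```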
Anyway, I don't think any of the above has to be implemented before 1.0.
Brane Čibej <brane_at_xbc.nu> http://www.xbc.nu/brane/
Received on Wed Nov 5 03:57:56 2003