Malcolm Rowe wrote:
> Something I thought people might be interested in: the proceedings
> of the 2006 Ottawa Linux Symposium includes an interesting, if short,
> paper titled "Towards a Better SCM: Revlog and Mercurial".
> It describes in brief how Mercurial handles revision storage through its
> revlog structure and delta storage. I know that dberlin has been looking
> at using something like the former to reduce the number of files we open
> during FSFS read operations; and as Mercurial uses an algorithm called
> 'bdiff' (based on Python's difflib algorithm) rather than xdelta,
> it might also be interesting to explore what kind of performance/storage
> results we'd get by doing the same.
So, i've explored about a billion different delta algorithms, including
what mercurial does.
They actually break data (including binary data) into lines, bucket it,
and then just match things against the buckets.
In other words, it's not significantly different than xdelta, except
that it uses a window size consisting of "one line" instead of "64 bytes".
For text data, i'm sure it does great. For binary data, i have a hard
time seeing how it could do all that great. They zlib compress the
result, so most of their binary compression probably comes from that.
I've got an improved delta algorithm based on rabin fingerprinting that
is roughly 10% faster than xdelta most of the time, and up to 50-60%
faster on larger (>a couple meg) files. The delta sizes are either the
same, slightly smaller, or slightly larger, depending :)
I also think we should up our default window size, but due to some
silliness in how this is currently implemented, changing the window size
is backwards incompatible!
(This is because the code currently relies on knowing whether there are
more windows waiting to consume by comparing the window length against
the default window size, instead of by seeing if there is an EOF :( )
To unsubscribe, e-mail: email@example.com
For additional commands, e-mail: firstname.lastname@example.org
Received on Wed Jul 26 04:06:31 2006