Re: More diff performance notes

From: Jim Blandy <jimb_at_savonarola.red-bean.com>
Date: 2000-10-09 22:14:47 CEST

Greg Hudson <ghudson@MIT.EDU> writes:

> > Humor me --- why are we comparing ourselves with diff | gzip? Why
> > aren't we comparing ourselves with diff -ae?
>
> We are comparing ourselves to diff -ae | gzip; I'm just not always
> clear that I use those options. Not sure if that was the source of
> any confusion or not.

Sorry, I meant to say "diff -ae | gzip" vs. "diff -ae". It's `gzip'
that's the issue.

> It would be somewhat unfair to compare ourselves to diff -ae without
> gzip because diff doesn't compress its output and we do. And because
> it would be about as easy to "just use diff and gzip" as it would be to
> "just use diff", if we can't get better performance using more
> advanced algorithms.

Well, the original goal, back in the spring, was simply to get solid
performance on arbitrary binary files, without too much code
complexity. It seemed to me that `copy/insert' algorithms were better
suited than `substring replace' algorithms for deltas that don't need
to be human readable. Xdelta was my first choice, but the
implementation seemed unmaintainable; vcdiff was simply the next thing
I found.

But the requirements were, simply:
- try out copy/insert
- solid performance on binary files
- easy to implement
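
As a rough illustration of the copy/insert idea, here is a minimal sketch
in Python. It is not the svndiff, vcdiff, or Xdelta format: the instruction
tuples, the function names `make_delta` and `apply_delta`, and the naive
fixed-window matcher are all assumptions made up for this example. A delta
is a list of instructions that either copy a byte range out of the source
or insert literal bytes carried in the delta itself, so it works on
arbitrary binary data and has no notion of lines.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical instruction format, for illustration only:
#   ("copy", offset, length) -> copy `length` bytes from the source at `offset`
#   ("insert", data)         -> append literal bytes stored in the delta
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


def apply_delta(source: bytes, delta: List[Op]) -> bytes:
    """Rebuild the target by replaying copy/insert instructions."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            if offset < 0 or length < 0 or offset + length > len(source):
                raise ValueError("copy range lies outside the source")
            out += source[offset:offset + length]
        elif op[0] == "insert":
            out += op[1]
        else:
            raise ValueError("unknown instruction: %r" % (op[0],))
    return bytes(out)


def make_delta(source: bytes, target: bytes, block: int = 4) -> List[Op]:
    """Naive greedy encoder (an assumption, not a real-world algorithm).

    Indexes every `block`-byte window of the source, then walks the target:
    where a window matches, emit a copy extended as far as the bytes agree;
    otherwise accumulate literal bytes for an insert.
    """
    index: Dict[bytes, int] = {}
    for i in range(len(source) - block + 1):
        index.setdefault(source[i:i + block], i)  # keep the earliest match

    delta: List[Op] = []
    pending = bytearray()  # literal bytes not yet flushed as an insert
    pos = 0
    while pos < len(target):
        start = index.get(target[pos:pos + block])
        if start is None:
            pending.append(target[pos])
            pos += 1
            continue
        # Extend the match past the initial window while bytes keep agreeing.
        length = block
        while (start + length < len(source)
               and pos + length < len(target)
               and source[start + length] == target[pos + length]):
            length += 1
        if pending:
            delta.append(("insert", bytes(pending)))
            pending.clear()
        delta.append(("copy", start, length))
        pos += length
    if pending:
        delta.append(("insert", bytes(pending)))
    return delta
```

Applying `apply_delta(source, make_delta(source, target))` should give back
`target` for any pair of byte strings. How compact the delta is depends
entirely on the matcher and on how the instructions are encoded, which is
where the real formats differ from this sketch.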

So, as far as I'm concerned, your svndiff stuff is ready to go.

That said, I certainly *don't* want to discourage you from improving
your encoding. If we can match diff + gzip without spending any
cycles on gzip, that's awesome.
Received on Sat Oct 21 14:36:10 2006

This is an archived mail posted to the Subversion Dev mailing list.