Re: More diff performance notes

From: Jim Blandy <jimb_at_savonarola.red-bean.com>
Date: 2000-10-09 22:14:47 CEST

Greg Hudson <ghudson@MIT.EDU> writes:

> > Humor me --- why are we comparing ourselves with diff | gzip? Why
> > aren't we comparing ourselves with diff -ae?
>
> We are comparing ourselves to diff -ae | gzip; I'm just not always
> clear that I use those options. Not sure if that was the source of
> any confusion or not.

Sorry, I meant to say "diff -ae | gzip" vs. "diff -ae". It's `gzip'
that's the issue.

> It would be somewhat unfair to compare ourselves to diff -ae without
> gzip because diff doesn't compress its output and we do. And because
> it would be about as easy to "just use diff and gzip" as it would to
> "just use diff", if we can't get better performance using more
> advanced algorithms.

Well, the original goal, back in the spring, was simply to get solid
performance on arbitrary binary files, without too much code
complexity. It seemed to me that `copy/insert' algorithms were better
suited than `substring replace' algorithms for deltas that don't need
to be human readable. Xdelta was my first choice, but the
implementation seemed unmaintainable; vcdiff was simply the next thing
I found.
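To make the `copy/insert' idea concrete: a delta in this style does not describe which lines to replace. Instead, it rebuilds the target from two kinds of instructions, "copy N bytes from offset O of the source" and "insert these literal bytes". Formats in this family, such as VCDIFF, work roughly this way. The Python below is only a toy sketch under stated assumptions. The tuple instruction encoding, the `make_delta`/`apply_delta` names, and the 4-byte seed size `BLOCK` are all hypothetical and are *not* svndiff's or vcdiff's actual format or matching strategy. It shows why the approach suits binary data: nothing in it depends on line boundaries.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical instruction encoding, for illustration only (not svndiff/vcdiff).
Copy = Tuple[str, int, int]   # ("copy", source_offset, length)
Insert = Tuple[str, bytes]    # ("insert", literal_bytes)
Op = Union[Copy, Insert]

BLOCK = 4  # hypothetical seed length used to look up candidate matches


def make_delta(source: bytes, target: bytes) -> List[Op]:
    """Naive greedy copy/insert encoder.

    Indexes every BLOCK-byte substring of the source (first occurrence wins),
    then scans the target: on a seed hit it extends the match as far as the
    bytes agree and emits a copy; otherwise the byte becomes a literal insert.
    """
    index: Dict[bytes, int] = {}
    for i in range(len(source) - BLOCK + 1):
        index.setdefault(source[i:i + BLOCK], i)

    ops: List[Op] = []
    pending = bytearray()  # literal bytes waiting to be emitted as one insert
    t = 0
    while t < len(target):
        # A slice shorter than BLOCK can never equal an index key, so the
        # tail of the target simply falls through to literal inserts.
        s = index.get(target[t:t + BLOCK])
        if s is None:
            pending.append(target[t])
            t += 1
            continue
        length = BLOCK
        while (s + length < len(source) and t + length < len(target)
               and source[s + length] == target[t + length]):
            length += 1
        if pending:
            ops.append(("insert", bytes(pending)))
            pending.clear()
        ops.append(("copy", s, length))
        t += length
    if pending:
        ops.append(("insert", bytes(pending)))
    return ops


def apply_delta(source: bytes, ops: List[Op]) -> bytes:
    """Rebuild the target by replaying copy and insert instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += source[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)
```

For example, `make_delta(b"the quick brown fox jumps over the lazy dog", b"the quick red fox jumps over the lazy dog!")` yields two copies and two short inserts (`b"red"` and `b"!"`), and `apply_delta` reproduces the new text exactly. A real encoder would use a rolling hash, allow copies from already-generated target data, and pack the instructions compactly. That packing is where the size comparison with `diff -ae | gzip' is won or lost.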

But the requirements were, simply:
- try out copy/insert
- solid performance on binary files
- easy to implement

So, as far as I'm concerned, your svndiff stuff is ready to go.

That said, I certainly *don't* want to discourage you from improving
your encoding. If we can match diff + gzip without spending any
cycles on gzip, that's awesome.
Received on Sat Oct 21 14:36:10 2006
