> > Greg Hudson <ghudson@mit.edu> writes:
> >
> >> Well, I have a little bit of data on differencing performance using my
> >> proposed output format. I compared the gcc 2.95 and gcc 2.95.2
> >> sources against each other (just the .c files in the gcc subdir), and
> >> found:
> >>
> >>                              Bytes
> >> File-by-file, ours:          15828
> >> File-by-file, diff+gzip:      9989
> >> Concatenated, ours:          15206
> >> Concatenated, diff+gzip:      6015
Is this a fair comparison? We should be comparing ourselves with
plain diff, not diff+gzip. gzip is an aggressive compression program,
and we're not trying to compete with it at all. To match gzip, we'd
have to implement the specialized coding tables described in the
VCDIFF document, and use secondary encoders for the instructions and
the data. gzip, at least, does two-level encoding (LZ77 matching
followed by Huffman coding).
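
For anyone who wants to see how much of the diff+gzip number is due to
the gzip pass alone, here is a rough sketch that measures a unified
diff before and after compression. It is only an illustration: the
function name delta_sizes is made up, Python's difflib stands in for
diff(1), and its output will not match GNU diff byte for byte.

```python
import difflib
import gzip


def delta_sizes(old_text, new_text):
    """Return (raw_diff_bytes, gzipped_diff_bytes) for two versions of a text.

    Hypothetical helper: difflib.unified_diff stands in for diff(1).
    """
    diff_lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="old",
        tofile="new",
    )
    raw = "".join(diff_lines).encode("utf-8")
    # gzip applies DEFLATE (LZ77 matching plus Huffman coding) on top of
    # the diff; this is the extra stage that a plain diff does not have.
    packed = gzip.compress(raw, compresslevel=9)
    return len(raw), len(packed)


if __name__ == "__main__":
    # Synthetic input: 300 similar lines, with every tenth line changed.
    old = "".join(f"int f{i}(void) {{ return {i}; }}\n" for i in range(300))
    new = "".join(
        f"int f{i}(void) {{ return {i + 1 if i % 10 == 0 else i}; }}\n"
        for i in range(300)
    )
    raw_size, gz_size = delta_sizes(old, new)
    print("diff:     ", raw_size, "bytes")
    print("diff+gzip:", gz_size, "bytes")
```

Comparing our output against the first number rather than the second
is the comparison I am arguing for above.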
Received on Sat Oct 21 14:36:10 2006