I have some more performance data on diffs, mostly so I know what I
should try to optimize first. The "elc" data set is the .elc files
present in any subdirectory of "lisp" in both emacs 19.34b and emacs
20.7. The "src" data set is the files directly present in the "src"
subdirectory of both emacs 19.34b and emacs 20.7. All data was
collected by doing comparisons file-by-file and totalling the results.
elc src
------- -------
Number of files 397 169
Number of windows 730 556
Number of instructions 663747 282854
Total length of new data 602542 155800
Total encoded bytes 2630361 1045437
diff -ae | gzip length 2246212 665046
Total source bytes 6912321 6259304
Total target bytes 8325651 7423620
Average instruction length 12.54 26.25
Average encoded bytes per instr 3.04 3.12
The average instruction length is the average "length" field of an
instruction opcode, easily computed by dividing the total target bytes
by the number of instructions. The average encoded instruction length
comes from the formula:
(encodedbytes - newdata - 4*files - 10*windows) / instructions
where the parenthesized expression counts the number of encoded bytes
which are encoding instructions. 4*files accounts for the four-byte
svndiff headers and 10*windows roughly accounts for the size of the
window headers.
My conclusions:
* Instructions and other overhead account for 77% of the diff
for binary files and 85% of the diff for source code. Any
savings in the instruction encoding would have a noticeable
impact on the size of the diffs.
* It's going to be hard to save a lot on the instruction
encoding. The optimizations described in the Hunt paper for
the vdelta encoding might, if we're really lucky, get us
down to an average of ~2 bytes per instruction. That would
be enough to beat diff | gzip in the binary case, but not in
the source code case.
* An average block move length of 12-26 bytes doesn't seem
very impressive, and I wonder how it compares to the vdelta
algorithm as implemented by AT&T. (Everyone who writes
papers on this stuff seems to have access to a vdelta
program and library Phong Vo and company wrote back in 1994,
which I really wish I could get my hands on.)
Received on Sat Oct 21 14:36:10 2006