> What if svndiff format recorded window sizes somewhere (either in a
> single header, or at the front of each window of data)
It does, of course, at the front of each window of data: source view
offset, source view length, and target view size.
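For anyone who wants to poke at this, here is a minimal sketch of
reading one window header. It assumes the layout described in the
svndiff format notes: five variable-length integers (seven data bits
per byte, most significant group first, high bit set on every byte but
the last). The last two fields, instruction length and new-data length,
come from those notes rather than from this message, and the function
and field names are made up for illustration; this is not Subversion's
own code.

```python
def read_varint(buf, pos):
    """Decode one variable-length integer starting at buf[pos].

    Returns (value, position just past the integer).
    """
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # high bit clear marks the final byte
            return value, pos


# Hypothetical field names, in the order assumed for the window header.
WINDOW_FIELDS = ("sview_offset", "sview_len", "tview_len",
                 "ins_len", "new_len")


def read_window_header(buf, pos):
    """Read the five header integers of one window (assumed layout)."""
    header = {}
    for name in WINDOW_FIELDS:
        header[name], pos = read_varint(buf, pos)
    return header, pos
```

Each window carries its own sizes this way, so a reader never has to
know the window size in advance.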
> and Subversion chose window sizes dynamically, based on the source &
> target file sizes? After all, Subversion usually or always knows
> those sizes in advance.
The question is not the size of the files but the amount of data
movement in the files. If a single line of 60 characters is inserted
at the front of a large file, then a small window will do; however, if
a medium-sized file is reversed line by line, then you'd need windows
almost as large as the file itself to get much advantage from having
the source data available.
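A toy model makes the point concrete. This is only a sketch under
simplifying assumptions: windows are counted in lines rather than
bytes, each target window is compared only against the source window
at the same position, and "reuse" just means the line appears somewhere
in that source window. It is not how Subversion computes deltas, and
reuse_fraction is a made-up name.

```python
def reuse_fraction(source, target, window):
    """Fraction of target lines found in the same-position source window."""
    hits = 0
    for start in range(0, len(target), window):
        src_view = set(source[start:start + window])
        tgt_view = target[start:start + window]
        hits += sum(1 for line in tgt_view if line in src_view)
    return hits / len(target)


source = [f"line {i}" for i in range(1000)]
inserted = ["new line"] + source   # one line added at the front
flipped = source[::-1]             # file reversed line by line

small = reuse_fraction(source, inserted, 10)       # most lines still found
flipped_small = reuse_fraction(source, flipped, 10)
flipped_whole = reuse_fraction(source, flipped, 1000)
```

With the front insertion, a 10-line window still finds most of its
lines in the source view. With the reversed file, the 10-line window
finds nothing, and only a window the size of the whole file recovers
everything.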
There's no compelling reason to use a smaller window for a smaller
file if you have the memory to spare. I know I said "the optimal
window size depends on the input data," but what I really meant is
"the sweet spot for delta size performance depends on the input data."
You'll always do okay using windows larger than necessary to reach the
sweet spot.
The real question is what is a reasonable amount of memory for
Subversion programs to use. That depends more on the machine the
Subversion code is running on (and the machine the code is talking to,
when you're transmitting diffs across the wire) than the input data.
I think 100K windows is enough to handle quite a bit of data movement
while still being relatively innocuous on most machines.
Received on Sat Oct 21 14:36:11 2006