Greg Stein <email@example.com> writes:
> I believe the sorting of individual revisions into groups of commits will be
> the slowest part. I'm sure they've optimized GNU sort quite a bit, but I've
> got to believe it will shudder when fed a file hundreds of megabytes in
I've used GNU sort on large datasets, and I think it would handle this
problem quite happily. After all, sorting datasets much larger than
memory is a very, very old problem.
When fed enormous amounts of data, GNU sort breaks the data set into
the largest chunks it can handle in memory. It sorts those chunks
individually, and writes each one out to a file in a temp directory
(which you can specify on the command line). Then it merges those
sorted runs back together into a single sorted output stream.
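That external merge strategy can be sketched in Python for illustration. This
is not GNU sort's actual code (which is C, and sizes its chunks to available
RAM); the tiny chunk_size here just forces the spill-and-merge path to run:

```python
import heapq
import os
import tempfile

def external_sort(lines, chunk_size=3):
    """Sort a sequence of strings that may not fit in memory.

    Mirrors GNU sort's strategy: sort fixed-size chunks in memory,
    spill each sorted run to a temp file, then merge the runs.
    """
    tmpdir = tempfile.mkdtemp()  # analogous to sort's -T <dir> option
    run_files = []

    def spill(chunk):
        # Write one in-memory-sorted chunk out as a "run" file.
        path = os.path.join(tmpdir, "run%d" % len(run_files))
        with open(path, "w") as f:
            f.writelines(line + "\n" for line in sorted(chunk))
        run_files.append(path)

    chunk = []
    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            spill(chunk)
            chunk = []
    if chunk:
        spill(chunk)

    # Merge the sorted runs lazily: heapq.merge holds only one
    # pending line per run in memory at a time.
    files = [open(p) for p in run_files]
    try:
        return [line.rstrip("\n") for line in heapq.merge(*files)]
    finally:
        for f in files:
            f.close()
```

The key point is the merge phase: memory use is proportional to the number of
runs, not the size of the data, which is why sorting datasets much larger than
RAM works fine.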
But these decisions should be left to the person doing the work.
("Paint it blue!")
Received on Sat Oct 21 14:36:28 2006