Size of revs file when deleting lines in a big text file - Bug?

From: Martin Scharrer <mailinglists_at_madmarty.de>
Date: 2006-12-07 09:51:33 CET

Hi folks,

I'm not sure if this is a bug or just a design issue, so I'm posting it here
before filing a bug report.

I noticed the following using svn 1.4.2 (r22196) with FSFS under Linux:
A text file (an mbox with mails) of about 5 MB is already in Subversion.
I then deleted one email located in the first quarter of the file. The diff is
about 16 kB. After checking in this and other small changes I noticed
that the file in the 'db/revs' dir in the repository is over 3 MB in size.
A 'svn diff -rN:M | wc -c' showed me only <0.5 MB.

Out of curiosity I then made a test run with a new test repository. I
checked the same file in, deleted the same email inside, added it again,
deleted it again, ..., then deleted an email at the end, added it again, then
an email in the middle, and added it again.
The result is the following table, showing size of 'revs' file, rev, and
comment. Use a fixed font to see it better:

   Size Rev Comment
    115   0 Created repository
3279173   1 Added mbox with size 5065650 bytes (gzipped 3199323), 76239 lines
 474346   2 Removed one email at '@@ -5114,236 +5114,6 @@' (wc says: diff has 241 lines, 15628 bytes)
   1332   3 Reverted mbox by copying (not 'svn cp'!) the original file (same as in
 474335   4 Repeated removal done in r2.
   1332   5 Repeated reverting done in r3.
 474335   6 Repeated removal done in r2.
   1328   7 Repeated reverting done in r3.
1707911   8 Deleted first email in mbox '@@ -1,768 +1,3 @@' (wc says: diff has 773 lines, 55348 bytes)
   1334   9 Reverted last deletion by copying original copy.
   1329  10 Deleted last email in mbox '@@ -76144,107 +76144,3 @@' (diff has 112 lines, 4450 bytes)
   1333  11 Reverted last deletion by copying original copy.
 663932  12 Deleted part of large email at about the middle of mbox '@@ -38279,419 +38279,7 @@' (diff has 425 lines, 32851 bytes)
   1338  13 Reverted last deletion by copying original copy.
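
For anyone who wants to collect the same numbers, the size column can be
gathered with a few lines of Python. This is only a minimal sketch: it assumes
that each revision is stored in a file named after its revision number
somewhere below 'db/revs', and the function names are my own, not part of any
Subversion tool.

```python
import os
import sys


def rev_file_sizes(revs_dir):
    """Return a sorted list of (revision, size_in_bytes) for a 'db/revs' dir.

    Assumption: every revision lives in a file named after its number.
    os.walk is used so that revision files in subdirectories are found too.
    """
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(revs_dir):
        for name in filenames:
            if name.isdigit():  # skip anything that is not a revision file
                path = os.path.join(dirpath, name)
                sizes.append((int(name), os.path.getsize(path)))
    return sorted(sizes)


def format_table(sizes):
    """Format (revision, size) pairs as right-aligned 'size rev' lines."""
    return "\n".join(f"{size:>8} {rev}" for rev, size in sizes)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python revsizes.py /path/to/repository/db/revs
    print(format_table(rev_file_sizes(sys.argv[1])))
```

The comment column still has to be filled in by hand (or from 'svn log').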

This shows very clearly that Subversion stores much more data for deleted
lines in a text file (e.g. r2) than for added ones (e.g. r3). The size of the
rev file also depends on the location of the deletion in the text file: it's
bigger when the deletion is earlier in the file. Deleting the last lines of
the file results in a very small rev file.
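
To put a number on "much more data", the rev file size can be compared with
the size of the textual diff for each deletion. The following is just a rough
sketch using the values from the table; the variable and function names are my
own, and the rev file of course also contains some metadata besides the file
delta.

```python
# (revision, rev file size in bytes, diff size in bytes), copied from the table
measurements = [
    (2, 474346, 15628),    # removed one email near line 5114
    (8, 1707911, 55348),   # removed the first email in the mbox
    (10, 1329, 4450),      # removed the last email in the mbox
    (12, 663932, 32851),   # removed part of an email near line 38279
]


def overhead(rev_size, diff_size):
    """Ratio of the stored rev file size to the size of the textual diff."""
    return rev_size / diff_size


for rev, rev_size, diff_size in measurements:
    ratio = overhead(rev_size, diff_size)
    print(f"r{rev}: rev file is {ratio:.1f}x the diff size")
```

For r2, r8 and r12 the rev file is roughly 20 to 30 times as big as the diff,
while for r10 (deletion at the very end) it is smaller than the diff itself.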

This can't be the right behaviour! Really only some information like 'delete
lines x to y', maybe with some redundancy data, needs to be saved, not MBs of
data. I could understand it for a binary file, but not for a text file.


Received on Thu Dec 7 09:52:22 2006

This is an archived mail posted to the Subversion Users mailing list.