
RE: Size of revs file when deleting lines in a big text file - Bug?

From: Rob Hubbard <Rob.Hubbard_at_celoxica.com>
Date: 2006-12-07 10:34:36 CET

Hello Martin,

A delta is not always computed relative to the *immediately* previous revision, so its size need not match the diff against that revision.

So that any revision of a file can be reconstructed quickly (by applying O(log n) deltas rather than O(n)), the revisions are formed into a kind of binary tree, and each delta is stored against the earlier revision that the tree selects.

That is definitely neither a bug nor a design problem.
It probably explains the variable revision sizes you're seeing.

See <http://svn.collab.net/repos/svn/trunk/notes/skip-deltas> for an explanation.
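
As a rough illustration of the scheme that note describes for FSFS, here is a minimal Python sketch. It assumes the rule given there: write the file's change count in binary and clear the lowest set bit to get the delta base. The names skip_delta_base and delta_chain are made up for this example and are not Subversion functions, and "count" means the number of earlier versions of the file itself, not the repository revision number.

```python
def skip_delta_base(count: int) -> int:
    """Return the change count used as the delta base for `count`.

    Assumed rule (from the skip-deltas note): clear the lowest set bit.
    Count 0 is the first version of the file and has no delta base.
    """
    if count <= 0:
        raise ValueError("the first version of a file has no delta base")
    return count & (count - 1)


def delta_chain(count: int) -> list[int]:
    """List the versions visited when reconstructing version `count`."""
    chain = [count]
    while count > 0:
        count = skip_delta_base(count)
        chain.append(count)
    return chain


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 54):
        print(n, "->", delta_chain(n))
```

With this rule, version 54 (binary 110110) is stored against version 52 (110100), and reconstructing it walks 54, 52, 48, 32, 0. The chain has one step per set bit, which is where the O(log n) bound comes from, and it also shows that the base is often not the immediate predecessor.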

Rob.

> -----Original Message-----
> From: Martin Scharrer [mailto:mailinglists@madmarty.de]
> Sent: 07 December 2006 08:52
> To: users@subversion.tigris.org
> Subject: Size of revs file when deleting lines in a big text file - Bug?
>
>
> Hi folks,
>
> I'm not sure if this is a bug or just a design issue, so I'm posting it
> here before filing a bug report.
>
> I noticed the following using svn 1.4.2 (r22196) with FSFS under Linux:
> A text file, an mbox with mails, of about 5 MB is already in Subversion.
> I then deleted one email located in the first quarter of the file. The
> diff is about 16 kBytes. After checking in this and other small changes I
> noticed that the file in the 'db/revs' dir in the repository is over 3 MB
> in size. A 'svn diff -rN:M | wc -c' showed me only <0.5 MB.
>
> I then did a test run with a new test repository out of curiosity. I
> checked the same file in, deleted the same email inside, added it again,
> deleted it again, ..., then deleted an email at the end, added it again,
> then an email in the middle, and added it again.
> The result is the following table, showing size of 'revs' file, rev, and
> comment. Use a fixed font to see it better:
>
>    Size  Rev  Comment
>     115    0  Created repository
> 3279173    1  Added mbox with size 5065650 bytes (gzipped 3199323), 76239 lines
>  474346    2  Removed one email at '@@ -5114,236 +5114,6 @@' (wc says: diff has 241 lines, 15628 bytes)
>    1332    3  Reverted mbox by copying (not 'svn cp'!) original file (same as in rev1).
>  474335    4  Repeated removal done in r2.
>    1332    5  Repeated reverting done in r3.
>  474335    6  Repeated removal done in r2.
>    1328    7  Repeated reverting done in r3.
> 1707911    8  Deleted first email in mbox '@@ -1,768 +1,3 @@' (wc says: diff has 773 lines, 55348 bytes)
>    1334    9  Reverted last deletion by copying original copy.
>    1329   10  Deleted last email in mbox '@@ -76144,107 +76144,3 @@' (diff has 112 lines, 4450 bytes)
>    1333   11  Reverted last deletion by copying original copy.
>  663932   12  Deleted part of large email at ca. half of mbox '@@ -38279,419 +38279,7 @@' (diff has 425 lines, 32851 bytes)
>    1338   13  Reverted last deletion by copying original copy.
>
> This shows very clearly that Subversion saves much more data for deleted
> lines in a text file (e.g. rev2) than for added ones (e.g. rev3). The size
> of the rev file also depends on the location of the deletion in the text
> file. It's bigger when the location is earlier in the file. Deleting the
> last lines of the file causes a very small revs file.
>
> This can't be the right behaviour! Actually only some information like
> 'delete lines x till y', maybe with the redundant data, must be saved, not
> MBs of data. I could understand it if it were a binary file, but not with a
> text file.
>
> Best,
> Martin
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: users-help@subversion.tigris.org


Received on Thu Dec 7 10:36:36 2006

This is an archived mail posted to the Subversion Users mailing list.
