
Re: Size of revs file when deleting lines in a big text file - Bug?

From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
Date: 2006-12-07 11:52:44 CET

On 12/7/2006 4:34 AM, Rob Hubbard wrote:
> Hello Martin,
>
> A delta is not always stored against the *immediately* previous revision.
>
> So that any revision can be reconstructed quickly (by applying O(log n) deltas rather than O(n)), the deltas between revisions are arranged into a kind of binary tree.
>
> That is definitely neither a bug nor a design problem.
> It probably explains the variable revision sizes you're seeing.
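Rob's description can be sketched in Python. The sketch assumes the rule described in the skip-deltas note he cites: write a file's revision index in binary and clear its lowest set bit to get the index it is stored as a delta against, with index 0 stored in full. The helpers `delta_base` and `chain` are hypothetical names for illustration, not Subversion's own code.

```python
def delta_base(n: int) -> int:
    """Hypothetical helper: index that revision index n is deltified against.

    Assumes the skip-delta rule of clearing the lowest set bit of n;
    index 0 is assumed to be stored as full text.
    """
    if n <= 0:
        raise ValueError("index 0 is stored as full text and has no delta base")
    return n & (n - 1)


def chain(n: int) -> list[int]:
    """Indices visited when reconstructing index n: n, its base, ..., 0."""
    path = [n]
    while n > 0:
        n = delta_base(n)
        path.append(n)
    return path


if __name__ == "__main__":
    # 54 is 0b110110, so reconstruction walks 54 -> 52 -> 48 -> 32 -> 0:
    # four deltas, one per set bit, rather than 54 deltas in a linear chain.
    print(chain(54))
    # 63 has six set bits, the worst case below 64, so six deltas.
    print(len(chain(63)) - 1)
```

The number of deltas applied equals the number of set bits in the index, at most about log2(n), which is where the O(log n) comes from. The trade-off is that a stored delta may span several revisions' worth of change, so individual delta sizes vary.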

It would explain variable revision sizes, but his vary more than I'd
expect. At rev 2 the diff was 15K, but the delta was 470K. That's
bigger than necessary for just the diff against rev 1, but smaller
than necessary to hold a diff against rev 0.
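To check this reasoning against Martin's table, one can compute which earlier revision each test revision would be deltified against. This is a sketch under stated assumptions: it applies the skip-delta rule (clear the lowest set bit) to the file's predecessor count, and it assumes the mbox was added in r1 and changed in every later revision. `skip_delta_base_rev` is a hypothetical helper, not a Subversion API; the sizes are copied from Martin's table.

```python
# Size of each revision file in Martin's test repository (bytes), by revision.
REV_FILE_SIZES = {
    2: 474346, 3: 1332, 4: 474335, 5: 1332, 6: 474335, 7: 1328,
    8: 1707911, 9: 1334, 10: 1329, 11: 1333, 12: 663932, 13: 1338,
}


def skip_delta_base_rev(rev: int, first_rev: int = 1) -> int:
    """Hypothetical helper: revision whose mbox text `rev` is deltified against.

    Assumes the file was added in `first_rev` and changed in every later
    revision, so its predecessor count in `rev` is rev - first_rev, and that
    the base is found by clearing the lowest set bit of that count.
    """
    count = rev - first_rev
    if count <= 0:
        raise ValueError("the first revision of the file has no delta base")
    return first_rev + (count & (count - 1))


if __name__ == "__main__":
    for rev, size in sorted(REV_FILE_SIZES.items()):
        print(f"r{rev:<3} base r{skip_delta_base_rev(rev):<3} {size:>9} bytes")
```

Under those assumptions every base revision (r1, r3, r5, r7, r9, r11) holds the unmodified mbox, so the small revert revisions are deltas against identical text, and r2, r4, r6, r8 and r12 are each a delta against the original file. If that holds, the skip-delta layout by itself does not explain why a 15 KB diff became a delta of roughly 474 KB.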

Is there some debug mode that can tell you exactly what is stored in a
delta?

Duncan Murdoch

>
> See <http://svn.collab.net/repos/svn/trunk/notes/skip-deltas> for an explanation.
>
> Rob.
>
>
>> -----Original Message-----
>> From: Martin Scharrer [mailto:mailinglists@madmarty.de]
>> Sent: 07 December 2006 08:52
>> To: users@subversion.tigris.org
>> Subject: Size of revs file when deleting lines in a big text file - Bug?
>>
>>
>> Hi folks,
>>
>> I'm not sure if this is a bug or just a design issue, so I'm
>> posting it here before filing a bug report.
>>
>> I noticed the following using svn 1.4.2 (r22196) with FSFS
>> under Linux:
>> A text file (an mbox with mails) of about 5 MB is already in
>> Subversion. I deleted one email located in the first quarter of
>> the file; the diff is about 16 kBytes. After checking in this and
>> other small changes, I noticed that the file in the 'db/revs' dir
>> of the repository is over 3 MB in size.
>> A 'svn diff -rN:M | wc -c' showed me only <0.5 MB.
>>
>> Out of curiosity, I then made a test run with a new test
>> repository. I checked in the same file, deleted the same email,
>> added it again, deleted it again, ..., then deleted an email at
>> the end and added it again, then an email in the middle and added
>> it again.
>> The result is the following table, showing the size of the 'revs'
>> file, the revision number, and a comment. Use a fixed-width font to
>> see it better:
>>
>>    size  rev  comment
>>     115    0  Created repository.
>> 3279173    1  Added mbox with size 5065650 bytes (gzipped 3199323),
>>               76239 lines.
>>  474346    2  Removed one email at '@@ -5114,236 +5114,6 @@'
>>               (wc says: diff has 241 lines, 15628 bytes).
>>    1332    3  Reverted mbox by copying (not 'svn cp'!) the original
>>               file (same as in rev 1).
>>  474335    4  Repeated the removal done in r2.
>>    1332    5  Repeated the revert done in r3.
>>  474335    6  Repeated the removal done in r2.
>>    1328    7  Repeated the revert done in r3.
>> 1707911    8  Deleted the first email in mbox '@@ -1,768 +1,3 @@'
>>               (wc says: diff has 773 lines, 55348 bytes).
>>    1334    9  Reverted the last deletion by copying back the original.
>>    1329   10  Deleted the last email in mbox '@@ -76144,107 +76144,3 @@'
>>               (diff has 112 lines, 4450 bytes).
>>    1333   11  Reverted the last deletion by copying back the original.
>>  663932   12  Deleted part of a large email at about half of mbox
>>               '@@ -38279,419 +38279,7 @@' (diff has 425 lines, 32851 bytes).
>>    1338   13  Reverted the last deletion by copying back the original.
>>
>> This shows very clearly that Subversion saves much more data for
>> deleted lines in a text file (e.g. rev 2) than for added ones
>> (e.g. rev 3). The size of the revs file also depends on the
>> location of the deletion in the text file: it's bigger when the
>> location is earlier in the file. Deleting the last lines of the
>> file produces a very small revs file.
>>
>> This can't be the right behaviour! Only some information like
>> 'delete lines x to y', maybe with some redundancy data, needs to be
>> saved, not MBs of data. I could understand it for a binary file,
>> but not for a text file.
>>
>> Best,
>> Martin
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
>> For additional commands, e-mail: users-help@subversion.tigris.org
>

Received on Thu Dec 7 11:53:57 2006

This is an archived mail posted to the Subversion Users mailing list.