
Re: Binary Efficiency

From: Branko Čibej <brane_at_xbc.nu>
Date: 2003-06-27 12:55:56 CEST

Michael Wood wrote:

>On Thu, Jun 26, 2003 at 01:12:06PM -0400, Clint Chapman wrote:
>[snip]
>
>
>>Not Exactly, I did 5 more commits on the 10MB CAD file and the db
>>directory would grow by about 25 megs every time while the strings
>>file was growing by about 3 MB every time. It makes me wonder how
>>well the binary diffing is working since the 10MB CAD file compresses
>>to 2.3 MB in zip format and I understand the repository is bzipped.
>>
>>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>[snip]
>
>I assume you mean that Subversion compresses the repository using the
>same algorithm as bzip2?
>
>As far as I understand it, some nodes in the repository are stored as
>deltas against other nodes, while some are stored undeltified. This is
>to balance size against speed. There might be some compression other
>than the deltification, but it's not bzip2 AFAIK.
>
>

The algorithm we use to create binary deltas is called vdelta. It's a
block-copy delta algorithm (i.e., the output does not contain context
information) and it's also a compression algorithm, although the
compression is not as good as the one used by zip or bzip2 (there's a
speed vs. compression quality tradeoff here). The data are stored so
that the contents of HEAD are always fulltext, while all other revisions
are stored as deltas.
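
To make "block-copy delta" concrete, here is a toy sketch in Python. It is
not vdelta and not Subversion's code: the function names, the op format and
the 4-byte block size are all assumptions made for illustration. It copies
only from the source and makes no attempt at compressing the target against
itself. The output contains only "copy this range of the source" and
"insert these literal bytes" instructions, with no surrounding context.

```python
def make_delta(source: bytes, target: bytes, block: int = 4):
    """Encode target as ("copy", offset, length) and ("insert", data) ops.

    Toy block-copy delta: hypothetical op format, not vdelta itself.
    """
    # Index every block-sized substring of the source by its first offset.
    index = {}
    for i in range(len(source) - block + 1):
        index.setdefault(source[i:i + block], i)

    ops = []
    pending = bytearray()  # literal bytes with no match in the source
    pos = 0
    while pos < len(target):
        start = index.get(target[pos:pos + block])
        if start is None:
            pending.append(target[pos])
            pos += 1
            continue
        # Extend the match for as long as source and target keep agreeing.
        length = block
        while (start + length < len(source)
               and pos + length < len(target)
               and source[start + length] == target[pos + length]):
            length += 1
        if pending:
            ops.append(("insert", bytes(pending)))
            pending.clear()
        ops.append(("copy", start, length))
        pos += length
    if pending:
        ops.append(("insert", bytes(pending)))
    return ops


def apply_delta(source: bytes, ops) -> bytes:
    """Rebuild the target from the source plus the delta ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += source[start:start + length]
        else:
            out += op[1]
    return bytes(out)
```

Applying the ops to the same source reproduces the target exactly. A file
that was only lightly edited yields mostly copy ops, while a file whose
contents were rearranged at a fine grain yields mostly inserts.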

The 3MB increase in string size you see could be caused by several factors:

    * The CAD system might be totally rearranging the file, regardless of
      the actual amount of change. I've worked on one such system that
      reversed the sequence of elements in the file every time it was saved.
    * Our vdelta implementation looks at the files in 100k windows.
      Similar sequences larger than the window size won't get compressed
      efficiently.
    * The storage in the strings file might get fragmented during
      deltification. In this case, dumping and reloading the strings
      database (using db_dump/db_load, not svnadmin dump/load) would
      probably get rid of most of the fragmentation.
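
The window limitation in the second point can be illustrated with a small
Python sketch. It assumes, as a simplification that may not match the real
implementation, that each window of the new file is compared only with the
window at the same offset in the old file. The function name, the 8-byte
block size and the tiny window sizes are hypothetical, chosen to keep the
example short.

```python
import random


def matched_bytes(source: bytes, target: bytes, window: int, block: int = 8) -> int:
    """Count target bytes in blocks that also occur in the source window
    at the same offset (simplified model of windowed deltification)."""
    total = 0
    for base in range(0, len(target), window):
        src_win = source[base:base + window]
        tgt_win = target[base:base + window]
        # Every block-sized substring of this source window.
        seen = {src_win[i:i + block] for i in range(len(src_win) - block + 1)}
        # Walk the target window in non-overlapping blocks.
        for i in range(0, len(tgt_win) - block + 1, block):
            if tgt_win[i:i + block] in seen:
                total += block
    return total


# Two unrelated 1000-byte halves, generated from a fixed seed.
rng = random.Random(0)
half_a = bytes(rng.getrandbits(8) for _ in range(1000))
half_b = bytes(rng.getrandbits(8) for _ in range(1000))

old_file = half_a + half_b
new_file = half_b + half_a  # same content, halves swapped on save

small = matched_bytes(old_file, new_file, window=1000)  # each half falls in another window
large = matched_bytes(old_file, new_file, window=2000)  # both halves share one window
print(small, large)
```

With the larger window every byte of the new file can be copied from the
old one. With the smaller window each half has moved out of the window it
is compared against, so under this model almost nothing matches, even
though no data actually changed.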

-- 
Brane Čibej   <brane_at_xbc.nu>   http://www.xbc.nu/brane/
Received on Fri Jun 27 12:56:47 2003

This is an archived mail posted to the Subversion Dev mailing list.
