Michael Wood wrote:
>On Thu, Jun 26, 2003 at 01:12:06PM -0400, Clint Chapman wrote:
>[snip]
>
>
>>Not Exactly, I did 5 more commits on the 10MB CAD file and the db
>>directory would grow by about 25 megs every time while the strings
>>file was growing by about 3 MB every time. It makes me wonder how
>>well the binary diffing is working since the 10MB CAD file compresses
>>to 2.3 MB in zip format and I understand the repository is bzipped.
>>
>>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>[snip]
>
>I assume you mean that Subversion compresses the repository using the
>same algorithm as bzip2?
>
>As far as I understand it, some nodes in the repository are stored as
>deltas against other nodes, while some are stored undeltified. This is
>to balance size against speed. There might be some compression other
>than the deltification, but it's not bzip2 AFAIK.
>
>

The algorithm we use to create binary deltas is called vdelta. It's a
block-copy delta algorithm (i.e., the output does not contain context
information), and it's also a compression algorithm, although its
compression is not as good as that of zip or bzip2 (there's a speed
vs. compression-quality tradeoff here). The data are stored so that the
contents of HEAD are always fulltext, while all other revisions are
stored as deltas.
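
To make "block-copy delta" concrete, here is a minimal Python sketch. It is
not vdelta and not Subversion's code: the function names (make_delta,
apply_delta), the 16-byte block size, and the greedy matching strategy are
all assumptions chosen for illustration. It only shows the general idea that
a delta is a list of "copy this range from the other version" and "insert
these literal bytes" instructions, with no surrounding context lines.

```python
def make_delta(source: bytes, target: bytes, block: int = 16):
    """Toy greedy block-copy delta (illustrative only, not vdelta).

    Returns a list of ("copy", offset, length) and ("insert", data) ops
    that rebuild `target` from `source`.
    """
    # Index every block-sized slice of the source by its content,
    # remembering the first offset at which it occurs.
    index = {}
    for i in range(len(source) - block + 1):
        index.setdefault(source[i:i + block], i)

    ops = []
    literal = bytearray()
    pos = 0
    while pos < len(target):
        start = index.get(target[pos:pos + block])
        if start is None:
            # No matching block in the source: emit this byte literally.
            literal.append(target[pos])
            pos += 1
            continue
        # Extend the match for as long as source and target agree.
        length = block
        while (start + length < len(source)
               and pos + length < len(target)
               and source[start + length] == target[pos + length]):
            length += 1
        if literal:
            ops.append(("insert", bytes(literal)))
            literal.clear()
        ops.append(("copy", start, length))
        pos += length
    if literal:
        ops.append(("insert", bytes(literal)))
    return ops


def apply_delta(source: bytes, ops) -> bytes:
    """Rebuild the target by replaying copy/insert ops against the source."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += source[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


# Hypothetical data: the newest revision is kept as fulltext, and the
# older one is stored as a delta against it.
head = bytes(range(256)) * 4
older = head[:500] + b"NEW DATA" + head[500:]
delta = make_delta(head, older)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "insert")
print(len(delta), "ops,", literal_bytes, "literal bytes")
assert apply_delta(head, delta) == older
```

In this toy model the cost of storing a non-HEAD revision is roughly the
number of literal bytes plus a small fixed cost per op, so a revision that
shares long runs with its neighbour is cheap and one that shares none costs
about as much as fulltext.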

The 3MB increase in the strings file that you see could be caused by
several factors:

  * The CAD system might be totally rearranging the file, regardless of
    the actual amount of change. I've worked on one such system that
    reversed the sequence of elements in the file every time it was
    saved.
  * Our vdelta implementation looks at the files in 100k windows.
    Similar sequences larger than the window size won't get compressed
    efficiently.
  * The storage in the strings file might get fragmented during
    deltification. In this case, dumping and reloading the strings
    database (using db_dump/db_load, not svnadmin dump/load) would
    probably get rid of most of the fragmentation.

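
The window effect in the second point is easy to reproduce with any
matcher that only looks a limited distance back. The sketch below uses
Python's zlib module purely as a stand-in: DEFLATE has a 32 KB window,
which is not the same as vdelta's 100k windows, and the helper name
repeat_ratio is made up for this example. It compresses a block of random
(incompressible) bytes followed by an exact copy of itself, once with the
copy inside the window and once with it outside.

```python
import random
import zlib


def repeat_ratio(block_size: int, seed: int = 0) -> float:
    """Compress an incompressible block followed by an exact copy of itself.

    Returns compressed size / input size. A ratio near 0.5 means the second
    copy was encoded as back-references; a ratio near 1.0 means the matcher
    never saw the repeat.
    """
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_size))
    data = block + block
    return len(zlib.compress(data, 9)) / len(data)


near = repeat_ratio(10 * 1024)    # copy starts 10 KB back: inside the window
far = repeat_ratio(100 * 1024)    # copy starts 100 KB back: outside the window
print(f"repeat 10 KB apart:  ratio {near:.2f}")
print(f"repeat 100 KB apart: ratio {far:.2f}")
```

The same reasoning applies to the first point: if the application moves or
reorders data so that matching bytes end up outside the window being
examined, a windowed delta has to store them again as new data.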
--
Brane Čibej <brane_at_xbc.nu> http://www.xbc.nu/brane/
Received on Fri Jun 27 12:56:47 2003