Re: Delta Question

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2006-10-04 18:01:20 CEST

On Oct 3, 2006, at 11:46 PM, Troy Curtis Jr wrote:
> My question to you is can you think of a case (actually a reason since
> I have a case) where the deltafied dump file is ~25% the size of the
> actual repository? Here are my stats (approx. as I can't remember the
> exact values...this is a repo at work)
>
> RCS: ~1GB
> Subversion: ~2GB (only a couple hundred megs different between fsfs
> and bdb)
> Dump file: <500MB

The problem here is, in all likelihood, mostly with directory data.
Subversion's storage of file data is reasonably
efficient, but its storage of directory data is quite wasteful,
particularly when you have lots of single-file commits within big
directories. Repositories resulting from cvs2svn often have a lot of
such commits, depending on how CVS was used. Dump files don't have this
problem since they don't try to support efficient random access or
traversal of directory data.
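
As a rough illustration of why single-file commits in big directories
add up, here is a toy model. It assumes that every commit stores a
complete new copy of the directory's entry list, and it uses a made-up
50 bytes per entry. Neither assumption is a measurement of what FSFS or
BDB actually do, and the function names are hypothetical.

```python
# Toy model only: assumes each commit that touches one file in a directory
# stores a full new copy of that directory's entry list. The entry size is
# an invented figure, not a measurement of any Subversion back end.

def full_copy_bytes(entries: int, commits: int, bytes_per_entry: int = 50) -> int:
    """Directory bytes written if every commit rewrites the whole listing."""
    return entries * bytes_per_entry * commits


def changed_only_bytes(commits: int, bytes_per_entry: int = 50) -> int:
    """Directory bytes written if every commit records only the changed entry."""
    return bytes_per_entry * commits


if __name__ == "__main__":
    entries, commits = 1000, 10000
    print(full_copy_bytes(entries, commits))   # 500000000
    print(changed_only_bytes(commits))         # 500000
```

Under these assumptions, the directory overhead grows with the size of
the directory times the number of commits, while a stream that only
records what changed grows with the number of commits alone.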

When I was more involved with Subversion, I thought about some ways
to store directory data more efficiently. My best idea was a btree
with multiple roots, each root representing a revision of the
directory. (This would require storing all of the directory
revisions together, so wouldn't work for FSFS, but could be applied
to BDB or some future back end.) I never fully fleshed out the idea
or even documented it properly, though.
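
The idea was never written down in detail, so the following is only a
guess at its general shape, not the design itself. To keep it short it
uses a plain unbalanced binary search tree instead of a btree, and all
names (Node, insert, lookup) are hypothetical. Each directory revision
gets its own root, and a change copies only the nodes on the path to
the changed entry, so unchanged subtrees are shared between revisions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """One directory entry; frozen so that older revisions are never mutated."""
    name: str
    node_id: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], name: str, node_id: str) -> Node:
    """Return the root of a new revision; only nodes on the search path are copied."""
    if root is None:
        return Node(name, node_id)
    if name < root.name:
        return Node(root.name, root.node_id, insert(root.left, name, node_id), root.right)
    if name > root.name:
        return Node(root.name, root.node_id, root.left, insert(root.right, name, node_id))
    # Same name: replace the entry, keep both subtrees shared.
    return Node(name, node_id, root.left, root.right)


def lookup(root: Optional[Node], name: str) -> Optional[str]:
    """Find an entry as seen from the given revision's root."""
    while root is not None:
        if name == root.name:
            return root.node_id
        root = root.left if name < root.name else root.right
    return None


# Revision 1 of the directory holds three entries.
r1 = None
for entry in ("m.c", "a.c", "z.c"):
    r1 = insert(r1, entry, "rev1")

# Revision 2 changes a single file; it gets its own root.
r2 = insert(r1, "a.c", "rev2")

print(lookup(r1, "a.c"))     # rev1
print(lookup(r2, "a.c"))     # rev2
print(r2.right is r1.right)  # True: the untouched subtree is shared
```

Because the roots have to point into one shared pool of nodes, all
revisions of the directory need to live together, which matches the
constraint mentioned above about FSFS.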

Received on Wed Oct 4 18:01:15 2006

This is an archived mail posted to the Subversion Dev mailing list.