RE: text-base penalty: A proposed solution

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2002-12-17 16:26:31 CET

On Tue, 2002-12-17 at 01:17, Kean Johnston wrote:
> > The prime benefit which motivated the text-base was limiting
> > the network bandwidth used by commits, as I understand it.
> Would a copy-on-edit model not address that and the speedy
> diff requirement?

Yup. You'd get speedier diffs (read one file per directory, instead of
stat'ing all the files; *really* speedy diffs require another compromise)
and no text base penalty for unedited files, and you also dovetail
nicely into advisory locks.
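
A minimal sketch of the copy-on-edit idea, to make the trade-off concrete. This is not Subversion's code: the `.admin` directory, the `edited` list file, and the `edit` and `edited_files` functions are all hypothetical names invented for illustration.

```python
import shutil
from pathlib import Path

ADMIN = ".admin"  # hypothetical admin directory, not Subversion's real layout


def edit(directory: Path, name: str) -> None:
    """Declare intent to edit `name`: save a pristine copy and record it."""
    admin = directory / ADMIN
    base_dir = admin / "text-base"
    base_dir.mkdir(parents=True, exist_ok=True)
    base = base_dir / name
    if not base.exists():
        # The text base is created only now, so unedited files cost nothing.
        shutil.copy2(directory / name, base)
    edited = admin / "edited"
    names = set(edited.read_text().splitlines()) if edited.exists() else set()
    names.add(name)
    edited.write_text("".join(n + "\n" for n in sorted(names)))


def edited_files(directory: Path) -> list[str]:
    """Candidates for diff: one read of the list, no stat of every file."""
    edited = directory / ADMIN / "edited"
    return edited.read_text().splitlines() if edited.exists() else []
```

The cost is visible in the sketch: a file changed without calling `edit` first never shows up in `edited_files`.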

But for the default mode of operation, we wanted to preserve the CVS
model where you don't have to tell svn before you edit a file.

> I don't know the code and I am sure you do, but how generically useful
> is the filesystem-inside-DB code? Is it worth thinking about having
> the admin area be such a database rather than actual physical files?

Oh, I wasn't thinking of using a database. I was just thinking that
props are generally small. It's okay to have to rewrite the props base,
props, and wcprops each time you change any of the above.
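
To illustrate why small props make a full rewrite cheap, here is a rough sketch. It assumes, purely for illustration, that the three kinds of properties live together in one JSON file; the file format and the `set_prop` helper are hypothetical and not anything Subversion does.

```python
import json
import os
import tempfile
from pathlib import Path


def set_prop(store: Path, kind: str, name: str, value: str) -> None:
    """Change one property by rewriting the whole (small) store file."""
    data = {"prop-base": {}, "props": {}, "wcprops": {}}
    if store.exists():
        data.update(json.loads(store.read_text()))
    data[kind][name] = value
    # Write a temp file next to the store, then rename it into place,
    # so a reader never sees a half-written store.
    fd, tmp = tempfile.mkstemp(dir=store.parent)
    with os.fdopen(fd, "w") as f:
        json.dump(data, f)
    os.replace(tmp, store)
```

Every change costs one read and one write of the whole file, which is fine only as long as the properties stay small.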

It would be kind of nifty to turn ".svn" into a file containing a
database. But I wouldn't want to see this happen for the following
practical reasons:

  * We really want to allow working copies to work well in network
filesystems. Databases and network filesystems don't mix; if they work
at all, performance usually sucks.

  * Berkeley DB periodically changes its format incompatibly, so the
same API reads and writes different on-disk formats depending on which
DB version an operating system release ships. Red Hat 8.1's Subversion package might naturally build itself
against DB 4.1, while Red Hat 10.2's package might naturally build
itself against DB 5. Now everyone's working copy is invalid. That's no
good.

  * I think Berkeley DB has had an impact on our performance which is
difficult to understand. (Certainly, before we started using duplicate
keys, it created an O(n^2) temporary disk space usage problem for
checkins of a large file. That was poor. But I think it continues to
have an impact today which we can't easily gauge.)

  * Berkeley DB has definitely introduced some scaling limitations of
the form "you have to go in and edit the DB configuration once you hit a
certain point." Although it's possible we can work around that by doing
less work per DB transaction, that's another example of how it simply
doesn't do what it's supposed to all the time.

  * The only serious alternative to Berkeley DB I know about is gdbm,
which is GPL, and we don't want Subversion relying on GPL libraries. I
imagine gdbm has its own flaws, probably in the performance area.

Received on Tue Dec 17 16:30:50 2002

This is an archived mail posted to the Subversion Dev mailing list.