Re: BDB log file factoids (prbly part of why they are so big)

From: Tom Lord <lord_at_emf.net>
Date: 2003-04-06 00:13:00 CEST

Ok, sorry to be playing catch-up here:

       [And a question: is the deltafied history of a file a single
       BDB datum, or is it broken into records at commit boundaries?
       If the former, then even absent redeltafication, any change to
       a file logs the complete history, before and after.]

Looking at some hopefully-still-sufficiently-up-to-date documentation
(subversion/libsvn_fs/structure in a slightly out-of-date source
tree), and hopefully not misunderstanding it too badly: the latest
revision of a file on each branch is always kept full text, right?

So the implication is that upon a commit to some file, the previous
head revision is destructively replaced by a delta, the new revision
stored full-text?

So that means that to commit a change to a file, the BDB log file
should grow by at least:

        2 * full_text_size + delta_size

for that file. (Where `full_text_size' is the average of the full
text sizes of the old and new head revisions.)

To accomplish the same effect, an app-level journal would grow by:

        delta_size

which, uncoincidentally enough, is the same approximate amount by
which the db data file grows for that part of the operation.
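To make the comparison concrete, here is a minimal sketch of the two estimates above. The function names and the sample sizes are made up for illustration; the only thing taken from the estimate itself is the formula, and it is a lower bound under the assumption that the old head is rewritten as a delta and the new head is stored as full text.

```python
def bdb_log_growth(old_full_text_size, new_full_text_size, delta_size):
    """Lower bound on BDB log growth for one file commit.

    Implements 2 * full_text_size + delta_size, where full_text_size is
    the average of the old and new head full-text sizes (all in bytes).
    """
    full_text_size = (old_full_text_size + new_full_text_size) / 2
    return 2 * full_text_size + delta_size


def app_journal_growth(delta_size):
    """Growth of a hypothetical app-level journal: just the delta."""
    return delta_size


# Hypothetical sizes: a ~100 kB file with a 300-byte change.
old_size, new_size, delta = 100_000, 100_200, 300
bdb = bdb_log_growth(old_size, new_size, delta)
journal = app_journal_growth(delta)
print(f"BDB log: {bdb:.0f} bytes, journal: {journal} bytes, "
      f"ratio: {bdb / journal:.0f}x")
```

The point of the sketch is only that the ratio scales with file size over delta size, so small edits to large files are the worst case for the log.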

The BDB log and the database data file will also grow to reflect
updates to the containing directory and its ancestors (again, if I'm
reading this right) -- none of that would appear in an app-level
journal.
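As a rough sketch of which directories that involves, assuming (per my reading above) that every directory from the file's parent up to the root gets a new entry on commit, the following lists them for a path. The helper name and the example path are hypothetical, and nothing here says how large each directory record is.

```python
from pathlib import PurePosixPath


def touched_directories(file_path):
    """Return the containing directory and all of its ancestors,
    nearest first, ending at the repository root."""
    return [str(p) for p in PurePosixPath(file_path).parents]


# Hypothetical repository path.
for directory in touched_directories("/trunk/src/lib/foo.c"):
    print(directory)
```

So a one-line change to a deeply nested file would, on this reading, log one directory update per level of nesting in addition to the file data itself.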

I'll go out on a speculative limb a bit here, and beyond the topic of
log file management: it seems to me (on the basis of the "structure"
document) that you pay a steep cost in database updates and log space
to provide the "global-to-repository revision number" which
effectively serializes all write txns to svn databases. Furthermore,
I see little actual user benefit to that serialization: writes to
unrelated projects or branches within a repository need not be
ordered. None of this would really matter for 1.0, except that revids
appear in the user interface, are important to merging, and are likely
to be incorporated into user scripts and usage habits. Oddly enough,
with the sort of archish-structure I've talked about layering over
svn, I think it quite plausible to surface a slightly different UI in
which revids are truly hidden -- thus opening up a new degree of
freedom for the implementation. Recall that in my proposal, the role
currently played by the revid is instead played by
(project/branch-specific) names in the fs namespace.

-t

Received on Sun Apr 6 00:04:30 2003
