
Re: Node origins cache rewrite

From: Mark Phippard <markphip_at_gmail.com>
Date: Sun, 27 Jan 2008 11:08:15 -0500

On Jan 25, 2008 3:50 PM, David Glasser <glasser_at_davidglasser.net> wrote:
> On Jan 25, 2008 11:16 AM, David Glasser <glasser_at_davidglasser.net> wrote:
> > On Jan 24, 2008 6:54 PM, Mark Phippard <markphip_at_gmail.com> wrote:
> > > I see David has rewritten this to no longer use SQLite. Yay!
> >
> > Here's an alternative implementation. In FSFS, at commit time, new
> > node IDs are rewritten from a temporary value like "_ab3" to a unique
> > value by adding "ab3" to the "start_node_id" field in the current
> > file. This makes them not only unique, but also part of an ordered
> > sequence without gaps.
> >
> > Is it actually important that node IDs be ordered and gapless? We
> > could just change new node-IDs (in format 3 repositories) to be built
> > as "<rev>-ab3". get-node-origin-rev would be trivial on these nodes.
> > Pre-format-3 repositories, or nodes in format 3 repositories that
> > aren't dumped and loaded, would require the slow crawl.
>
> Like this. Can somebody review?
>
> [[[
> In FSFS, instead of having a node-origin cache on disk, just change
> the node-id to contain the node-origin-rev.
>
> That is, instead of (at commit finalization time) rewriting node IDs
> based on a node ID counter in the "current" file, rewrite them as
> "base36-REV". Do the same for copy IDs, just for the hell of it. Do
> this only in Format 3.
>
> Now svn_fs_node_origin_rev is a trivial "look in the node ID"
> operation, unless you're in Format 2 or a repository sneakily upgraded
> without a dump and load (not really supported anyway), in which case
> you still do the history walk.
>
> *******************************************************************
> *** svn 1.5 adds "svnadmin recover" to FSFS, which fixes the two ***
> *** fields of "current" that were removed here; this code has ***
> *** not been updated. ***
> *******************************************************************
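The two ID schemes quoted above can be modeled in a few lines of C. This is a minimal, hypothetical sketch, not the libsvn_fs_fs code: the helper names `base36_add`, `make_format3_id` and `node_origin_rev` are invented for illustration, and only the node-ID part of a node-revision ID is handled (real IDs carry further components). It follows the "base36-REV" form from the log message rather than the earlier "<rev>-ab3" suggestion.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char B36_DIGITS[] = "0123456789abcdefghijklmnopqrstuvwxyz";

/* Value of one base-36 digit ('0'-'9', 'a'-'z'), or -1 if invalid. */
static int b36_value(char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'z')
    return c - 'a' + 10;
  return -1;
}

/* Old (format 2) style, as described in the thread: add the txn-local part
   of a temporary ID to the start_node_id taken from "current".
   Returns 0 on success, -1 on bad input or if OUT is too small. */
static int base36_add(const char *a, const char *b, char *out, size_t outlen)
{
  char rev_digits[64]; /* the sum, least significant digit first */
  size_t la, lb, max, n = 0;
  int carry = 0;

  assert(a != NULL && b != NULL && out != NULL);
  la = strlen(a);
  lb = strlen(b);
  max = la > lb ? la : lb;
  if (max + 2 > sizeof rev_digits || max + 2 > outlen)
    return -1;

  for (size_t i = 0; i < max || carry; i++)
    {
      int da = 0, db = 0, sum;
      if (i < la && (da = b36_value(a[la - 1 - i])) < 0)
        return -1;
      if (i < lb && (db = b36_value(b[lb - 1 - i])) < 0)
        return -1;
      sum = da + db + carry;
      rev_digits[n++] = B36_DIGITS[sum % 36];
      carry = sum / 36;
    }

  for (size_t i = 0; i < n; i++) /* emit most significant digit first */
    out[i] = rev_digits[n - 1 - i];
  out[n] = '\0';
  return 0;
}

/* Proposed (format 3) style: "_ab3" committed in REV becomes "ab3-REV". */
static int make_format3_id(const char *temp_id, long rev, char *out, size_t outlen)
{
  int len;

  if (temp_id[0] != '_' || temp_id[1] == '\0' || rev < 0)
    return -1; /* not a temporary ID */
  len = snprintf(out, outlen, "%s-%ld", temp_id + 1, rev);
  return (len < 0 || (size_t)len >= outlen) ? -1 : 0;
}

/* Origin revision encoded in a "base36-REV" node ID, or -1 if the ID has
   no revision part; the caller would then fall back to the history walk. */
static long node_origin_rev(const char *node_id)
{
  const char *dash = strchr(node_id, '-');
  char *end;
  long rev;

  if (dash == NULL || dash[1] == '\0')
    return -1;
  rev = strtol(dash + 1, &end, 10);
  return (*end == '\0' && rev >= 0) ? rev : -1;
}
```

Under the old scheme, a temporary "_ab3" committed while "current" holds a start_node_id of "1" would become `base36_add("1", "ab3", ...)`, i.e. "ab4", and nothing in that value records the revision. Under the proposed scheme, committing it in r1234 gives "ab3-1234", so `node_origin_rev("ab3-1234")` returns 1234 without walking any history.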

Can you explain a little more about the impact on existing repositories?

1) Dump/load would generate the new node IDs, so all would be good if
you took that approach.

2) Say I have 10,000 revisions in my 1.4 repository. I move to 1.5.
Do new nodes that are created pick up the new node IDs? So you get a
mixture of performance depending on when a node was created? Does the
code detect whether the node ID contains a revision based on some
heuristic, or does it decide based on the repository format?

3) Would it be possible to have a conversion routine that rewrites
the existing node IDs?

4) How hard would it be to have a hybrid approach? Someone with a 1.4
repository could spend the time to do a dump/load, or they could run the
svn-populate-node-origins-index routine to generate the current-style
cache. If we detect the new node ID, we use it; if not, we fall back
to code that looks for the cache and lazily populates it?
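The hybrid lookup proposed in (4) can be sketched as a three-step fallback. This is a hypothetical illustration only: the in-memory `struct origin_cache` stands in for the on-disk node-origins cache, and `hybrid_origin_rev`, `history_walk_fn` and the stub walk are invented names, not Subversion APIs. As before, only the node-ID part of an ID is considered.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the on-disk node-origins cache. */
#define ORIGIN_CACHE_MAX 128
struct origin_cache {
  char node_id[ORIGIN_CACHE_MAX][32];
  long rev[ORIGIN_CACHE_MAX];
  size_t count;
};

/* Hypothetical slow path: walk history back to the node's origin. */
typedef long (*history_walk_fn)(const char *node_id);

/* Origin revision embedded in a "base36-REV" ID, or -1 if there is none.
   A committed format-2 node ID is a bare base-36 number, so it has no '-'. */
static long origin_from_id(const char *node_id)
{
  const char *dash = strchr(node_id, '-');
  char *end;
  long rev;

  if (dash == NULL || dash[1] == '\0')
    return -1;
  rev = strtol(dash + 1, &end, 10);
  return (*end == '\0' && rev >= 0) ? rev : -1;
}

static long cache_lookup(const struct origin_cache *c, const char *id)
{
  for (size_t i = 0; i < c->count; i++)
    if (strcmp(c->node_id[i], id) == 0)
      return c->rev[i];
  return -1;
}

static void cache_store(struct origin_cache *c, const char *id, long rev)
{
  size_t len = strlen(id);

  if (c->count >= ORIGIN_CACHE_MAX || len >= sizeof c->node_id[0])
    return; /* cache full or ID too long: skip caching, result still valid */
  memcpy(c->node_id[c->count], id, len + 1);
  c->rev[c->count++] = rev;
}

/* New-style ID first, then the cache, then the slow walk, lazily
   populating the cache with whatever the walk finds. */
static long hybrid_origin_rev(struct origin_cache *cache, const char *node_id,
                              history_walk_fn walk)
{
  long rev;

  assert(cache != NULL && node_id != NULL && walk != NULL);
  if ((rev = origin_from_id(node_id)) >= 0)
    return rev;                 /* revision is in the ID: no I/O */
  if ((rev = cache_lookup(cache, node_id)) >= 0)
    return rev;                 /* cache hit */
  rev = walk(node_id);          /* the slow crawl */
  if (rev >= 0)
    cache_store(cache, node_id, rev);
  return rev;
}

/* Stub walk for illustration: counts calls, pretends every node began in r7. */
static int stub_walk_calls = 0;
static long stub_history_walk(const char *node_id)
{
  (void)node_id;
  stub_walk_calls++;
  return 7;
}
```

The order of checks is one possible design: inspecting the ID costs nothing, so it goes first, and the cache turns repeated lookups of an old node into one walk plus cheap reads afterwards. Testing for '-' is a heuristic on the ID itself; the alternative raised in (2), deciding from the repository format, would replace `origin_from_id` with a format check.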

I imagine #4 has a lot of ickiness in terms of code bloat. I just
know that dump/load can be a real burden for some repositories, and
their administrators might prefer the option of paying the extra disk
space for the current-style cache.

Justin, what would the ASF likely do? Would you be able to dump/load that
repository, or would you just incur the disk-space cost of the cache
that currently exists? I imagine you could symlink that folder to a
volume that handles lots of small files more efficiently.

-- 
Thanks
Mark Phippard
http://markphip.blogspot.com/
Received on 2008-01-27 17:08:27 CET

This is an archived mail posted to the Subversion Dev mailing list.