Re: Node origins cache rewrite

From: David Glasser <glasser_at_davidglasser.net>
Date: Mon, 28 Jan 2008 10:18:11 -0800

On Jan 27, 2008 8:08 AM, Mark Phippard <markphip_at_gmail.com> wrote:
>
> On Jan 25, 2008 3:50 PM, David Glasser <glasser_at_davidglasser.net> wrote:
> > On Jan 25, 2008 11:16 AM, David Glasser <glasser_at_davidglasser.net> wrote:
> > > On Jan 24, 2008 6:54 PM, Mark Phippard <markphip_at_gmail.com> wrote:
> > > > I see David has rewritten this to no longer use SQLite. Yay!
> > >
> > > Here's an alternative implementation. In FSFS, at commit time, new
> > > node IDs are rewritten from a temporary value like "_ab3" to a unique
> > > value by adding "ab3" to the "start_node_id" field in the current
> > > file. This makes them not only unique, but also part of an ordered
> > > sequence without gaps.
> > >
> > > Is it actually important that node IDs be ordered and gapless? We
> > > could just change new node-IDs (in format 3 repositories) to be built
> > > as "<rev>-ab3". get-node-origin-rev would be trivial on these nodes.
> > > Pre-format-3 repositories, or nodes in format 3 repositories that
> > > aren't dumped and loaded, would require the slow crawl.
> >
> > Like this. Can somebody review?
> >
> > [[[
> > In FSFS, instead of having a node-origin cache on disk, just change
> > the node-id to contain the node-origin-rev.
> >
> > That is, instead of (at commit finalization time) rewriting node IDs
> > based on a node ID counter in the "current" file, rewrite them as
> > "base36-REV". Do the same for copy IDs, just for the hell of it. Do
> > this only in Format 3.
> >
> > Now svn_fs_node_origin_rev is a trivial "look in the node ID"
> > operation, unless you're in Format 2 or a repository sneakily upgraded
> > without a dump and load (not really supported anyway), in which case
> > you still do the history walk.
> >
> > *******************************************************************
> > *** svn 1.5 adds "svnadmin recover" to FSFS which fixes the two ***
> > *** fields of current that were removed here; this code has not ***
> > *** been updated. ***
> > *******************************************************************
>
> Can you explain a little more the impact on existing repositories.
>
> 1) Dump/Load would generate the new node-ID so all would be good if
> you did that approach.

Yes.

> 2) Say I have 10,000 revisions in my 1.4 repository. I move to 1.5.

See, what does that mean? It could mean one of four things:

(a) You don't change the FS format of the repository at all, so it's
still '2'. You have the poor performance of uncached history-walks.
On the other hand, it is very likely that we will have to forbid you
from using merge tracking on such a repository anyway.

(b) You change the FS format to '3' using the *only currently
officially supported method*, which is a dump and a load. All's well.

(c) You change the FS format to '3' using the unsupported method of
manually changing the format number (and creating txn-current, and
creating txn-protorevs, etc etc etc). New node-IDs contain the rev in
them; old node-IDs (including new noderevs from old nodes) don't and
require the slow walk. But hey, you just did something unsupported
anyway.

(d) You run some sort of "svnadmin upgrade" command which does the
same thing as (c), except it's actually supported. Then, well, yeah,
you have the same downside, and we developers don't get the excuse of
"you did something unsupported".
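To make the difference between old and new node-IDs concrete, here is a
rough Python sketch of the two commit-time rewrites discussed in this
thread. It is not the FSFS C code. The function names, the decimal
rendering of the revision, and the handling of the "current" counter as a
plain base-36 string are assumptions made for illustration only.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n):
    """Render a non-negative integer in lowercase base 36."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def strip_temp_prefix(temp_id):
    """Drop the leading '_' that marks a temporary (in-transaction) ID."""
    if not temp_id.startswith("_"):
        raise ValueError("not a temporary node ID: %r" % temp_id)
    return temp_id[1:]


def finalize_old(temp_id, start_node_id):
    """Old scheme: add the temporary offset to the counter from 'current'.

    IDs come out unique, ordered and gapless, but say nothing about the
    revision that created the node.
    """
    offset = int(strip_temp_prefix(temp_id), 36)
    return to_base36(int(start_node_id, 36) + offset)


def finalize_new(temp_id, rev):
    """Proposed scheme: keep the base-36 part and append '-REV'.

    Uniqueness comes from the revision number, so no counter is needed.
    (Writing REV in decimal is an assumption of this sketch.)
    """
    return "%s-%d" % (strip_temp_prefix(temp_id), rev)
```

With these definitions, finalize_new("_ab3", 10001) gives "ab3-10001",
while finalize_old("_ab3", "0") gives plain "ab3". Only the first form
records where the node came from.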

Really I think if you care about the performance of this particular
operation, then you dump and load. (Or run svnsync, or whatever.)
It's not that hard, and as long as you have space on the machine it
shouldn't even require downtime. Justin said the ASF would be happy
with that.

> Do new nodes that are created pick up the new node ID's? So you get
> mixture of performance based on what node was created? Does the code
> detect whether the node-ID contains a revision based on some
> heuristic, or does it assume based on the format?

Assumes based on format.
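As a hedged illustration of what that lookup might look like, here is a
small Python sketch. The names node_origin_rev and slow_history_walk are
hypothetical, and the extra check for a "-" inside the ID is an assumption
of the sketch (not something stated above) so that old nodes in an
in-place upgraded repository, as in cases (c) and (d), still fall back to
the walk. Old-style IDs are pure base 36 and so never contain a "-".

```python
def node_origin_rev(node_id, fs_format, slow_history_walk):
    """Return the revision in which node_id was created.

    slow_history_walk is a hypothetical callback standing in for the
    expensive crawl back through the node's history.
    """
    if fs_format >= 3:
        # New-style IDs look like "ab3-10001": base-36 part, dash, revision.
        _base, dash, rev = node_id.partition("-")
        if dash and rev.isascii() and rev.isdigit():
            return int(rev)
    # Format 2, or an old node in an upgraded repository: do the walk.
    return slow_history_walk(node_id)
```

For example, node_origin_rev("ab3-10001", 3, walk) returns 10001 without
ever calling walk, while node_origin_rev("ab3", 3, walk) returns whatever
walk("ab3") computes.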

> 3) Would it be possible to have a conversion routine that re-writes
> the node-ID's?

Not without being as expensive as (and far more error-prone than) a dump and load.

> 4) How hard would it be to have a hybrid approach? Someone with a 1.4
> repository could incur the time to a dump/load, or they could run the
> svn-populate-node-origins-index routine to generate the current style
> cache. If we detect the new node-ID we use that, if not, we fallback
> to code that looks for the cache and lazy populates it?

I don't see how it's worth it.

--dave

-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/

Received on 2008-01-28 19:18:26 CET
