Re: [RFC] Eliminate scan_base_repos() - require repos-id and repos-relpath in all BASE nodes

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Fri, 29 Oct 2010 17:52:54 +0100

On Fri, 2010-10-29 at 12:06 +0100, Julian Foad wrote:
> On Thu, 2010-10-28 at 13:55 -0400, Greg Stein wrote:
> > On Thu, Oct 28, 2010 at 12:36, Hyrum K. Wright
> > <hyrum_wright_at_mail.utexas.edu> wrote:
> > > On Thu, Oct 28, 2010 at 11:29 AM, Mark Phippard <markphip_at_gmail.com> wrote:
> > >> On Thu, Oct 28, 2010 at 12:23 PM, Hyrum K. Wright
> > >> <hyrum_wright_at_mail.utexas.edu> wrote:
> > >>> On Thu, Oct 28, 2010 at 10:09 AM, Julian Foad <julian.foad_at_wandisco.com> wrote:
> > >>>> Bert and Erik and Philip and I discussed on IRC today the merits, or
> > >>>> lack of merits, of allowing repos-id/repos-relpath to be elided in the
> > >>>> NODES table BASE layer (op_depth = 0).
> > >>>>
> > >>>> * The data is not currently elided.
> > >>>>
> > >>>> * Some queries for locks currently assume the data is not elided.
> > >>>>
> > >>>> * Elision could save a bit of DB size, which *might* contribute to a
> > >>>> little bit of general DB performance.
> > >>>>
> > >>>> * The option of elision results in the need for all users of this data
> > >>>> to call svn_wc__db_scan_base_repos() or the internal version
> > >>>> scan_upwards_for_repos(), which is an extra maintenance burden and extra
> > >>>> run-time cost.
> > >>>>
> > >>>> We concluded it would be better to require the columns to be always
> > >>>> filled in. I'll do this soon if no objections.
> > >>>
> > >>> As I understand it, our original intention for elision was to make
> > >>> switch and relocate go much faster, since only one row would be
> > >>> updated instead of the entire tree. If that creates too much of a
> > >>> burden elsewhere, then yes, we can probably nuke it (and we can
> > >>> probably write a good enough query to make the relatively uncommon
> > >>> operations of switch and relocate only negligibly slower).
> > >>>
> > >>> This is one use case, however, and isn't necessarily the only one.
> > >>
> > >> At the same time, with code written to properly leverage SQL this is
> > >> the sort of scenario that we should not worry about anymore. SQLite
> > >> ought to be able to update all the rows in the DB faster than our old
> > >> code could walk the tree and open all the entries files, let alone
> > >> rewrite them.
> > >
> > > Correct. And we still have to do the update crawl on a switch or relocate, no?
> >
> > Dunno that we would have to do a crawl (unless you are limiting the
> > depth), but I think you guys are correct: on balance, the concept of
> > elision is costing us more than it is saving us.
> >
> > +1 to eliminating the concept, and ensuring that we always write them
> > out (it may look like it now, but be careful... there could be edge
> > cases)

OK, done in r1028809.

- Julian

> Great.
>
> The only tests that fail when I make scan_upwards_for_repos() abort at
> the point where it's about to scan upwards are:
>
> db-test 1
> entries-compat-test 1 2
>
> and those are because the DB rows are hand-coded. Easily fixed.
>
> > I would suggest making the columns NOT NULL in the schema. I don't
> > think we need to attempt a format bump to adjust older working copies
> > -- they just won't have the sqlite benefits from the additional schema
> > details.
>
> The repos columns are shared between BASE and WORKING op_depth layers,
> and we still allow them to be null when op_depth > 0, so we can't say
> "NOT NULL".
>
> - Julian
>
>
Received on 2010-10-29 18:53:36 CEST

This message: [ Message body ]
Next message: Daniel Shahaf: "Re: RFC: revised text for "svn help merge""
Previous message: Stephen Butler: "Re: RFC: revised text for "svn help merge""
In reply to: Julian Foad: "Re: [RFC] Eliminate scan_base_repos() - require repos-id and repos-relpath in all BASE nodes"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]