What both of you say makes sense. What I can add is an observation that
separating the props out into the schema would appear to benefit the
efficiency of bulk-data operations such as "svn pget -R svn:needs-lock".
I don't think it would have much effect on per-node operations such as
"update", "merge", "diff" and so on. It shouldn't slow them down
because it would be nicely indexed, so it could still be an overall win.
At least that's how it feels in my head.
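To make the bulk case concrete, here is a rough sketch in Python with sqlite3. The table name node_props, its columns, the index and the helper pget_recursive() are all made up for illustration; this is not the real wc.db schema, only the kind of per-property table C-Mike described.

```python
import sqlite3

def make_db():
    # Hypothetical per-property table: one row per (node, property).
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE node_props (
            wc_id         INTEGER NOT NULL,
            local_relpath TEXT    NOT NULL,
            prop_name     TEXT    NOT NULL,
            prop_value    TEXT,
            PRIMARY KEY (wc_id, local_relpath, prop_name)
        )""")
    # Assumed index so that "one property across many nodes" is a range scan.
    db.execute("""
        CREATE INDEX i_props_by_name
            ON node_props (wc_id, prop_name, local_relpath)""")
    return db

def pget_recursive(db, wc_id, target, prop_name):
    """Return (path, value) for prop_name on every node at or under target."""
    if target == "":
        # The empty relpath stands for the working copy root here.
        cur = db.execute(
            "SELECT local_relpath, prop_value FROM node_props"
            " WHERE wc_id = ? AND prop_name = ?"
            " ORDER BY local_relpath",
            (wc_id, prop_name))
    else:
        # Descendants sort between "target/" and "target0", because '0' is
        # the character right after '/' in ASCII (default BINARY collation).
        cur = db.execute(
            "SELECT local_relpath, prop_value FROM node_props"
            " WHERE wc_id = ? AND prop_name = ?"
            "   AND (local_relpath = ?"
            "        OR (local_relpath > ? AND local_relpath < ?))"
            " ORDER BY local_relpath",
            (wc_id, prop_name, target, target + "/", target + "0"))
    return cur.fetchall()
```

With a layout like that, the recursive pget is a single statement that never touches the other properties of each node, which is where I would expect the bulk-data benefit to come from.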
From another angle, this reminds me that I am thinking in general we
tend to be too separationist. We create lots of little APIs that
retrieve individual bits of a node's state, as per my "Rationalize WC
internal API" email a few minutes ago. I think we need to make more
effort to treat a node-state as a structured group of data. Applying my
argument to properties, we should try to keep "the properties of this
node" as an accessible object in some of the APIs. Now, that doesn't by
any means require that the data storage be grouped in the same way. We
could certainly store individual properties indexed by (wcid, path,
op-depth) or whatever, and still group them together where appropriate
in the APIs and also support bulk read and write APIs efficiently.
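As a rough illustration of what I mean, here is a sketch in Python with sqlite3. Everything in it is an assumption for the sake of example: the node_props table, its key of (wc_id, local_relpath, op_depth, prop_name), and the three helper functions. None of it is existing libsvn_wc API.

```python
import sqlite3
from collections import defaultdict

# Hypothetical storage: one row per individual property.
SCHEMA = """
CREATE TABLE node_props (
    wc_id         INTEGER NOT NULL,
    local_relpath TEXT    NOT NULL,
    op_depth      INTEGER NOT NULL,
    prop_name     TEXT    NOT NULL,
    prop_value    TEXT,
    PRIMARY KEY (wc_id, local_relpath, op_depth, prop_name)
);
"""

def open_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

def set_prop_bulk(db, wc_id, op_depth, paths, name, value):
    """Bulk write: add or change name=value on every path in one transaction."""
    with db:
        db.executemany(
            "INSERT OR REPLACE INTO node_props VALUES (?, ?, ?, ?, ?)",
            [(wc_id, path, op_depth, name, value) for path in paths])

def get_props(db, wc_id, path, op_depth):
    """Per-node API: "the properties of this node" as a single object."""
    cur = db.execute(
        "SELECT prop_name, prop_value FROM node_props"
        " WHERE wc_id = ? AND local_relpath = ? AND op_depth = ?",
        (wc_id, path, op_depth))
    return dict(cur.fetchall())

def get_props_bulk(db, wc_id, op_depth):
    """Bulk read: one query, regrouped per node for the caller."""
    grouped = defaultdict(dict)
    cur = db.execute(
        "SELECT local_relpath, prop_name, prop_value FROM node_props"
        " WHERE wc_id = ? AND op_depth = ?",
        (wc_id, op_depth))
    for path, name, value in cur:
        grouped[path][name] = value
    return dict(grouped)
```

The caller of get_props() sees a grouped property set and has no idea that the rows are stored individually, which is the separation between API shape and storage shape that I am arguing for.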
- Julian
On Thu, 2011-03-17 at 09:41 -0400, Greg Stein wrote:
> On Mar 16, 2011 5:34 PM, "C. Michael Pilato" <cmpilato_at_collab.net> wrote:
> >
> > On 03/16/2011 01:17 PM, Greg Stein wrote:
> > > On Wed, Mar 16, 2011 at 12:59, C. Michael Pilato <cmpilato_at_collab.net> wrote:
> > >> ...
> > >> to manage at least the "read" subset of these operations. But I find myself
> > >> wondering if we wouldn't be better served by having a properties table with
> > >> rows for, I dunno: wc_id, local_relpath, property_name, property_value.
> > >> ...
> > >> Was this considered when we moved the properties into the database? If so,
> > >> why didn't we take this approach? Should we consider it now? Should we
> > >> punt it to 1.8?
> > >
> > > It was considered. Hyrum and I figured it would be best to use a skel
> > > and avoid a join. We assumed it is the rare case that we need a single
> > > property, rather than some/all of the properties.
> > >
> > > If you want to experiment with another table and a JOIN, then I would
> > > recommend waiting until 1.8 to do that. If we find that properties in
> > > their current form are killing us, then we can discuss further.
> > >
> > > My understanding is that # queries is our concern at the moment,
> > > rather than skel-unpacking.
> > >
> > > Cheers,
> > > -g
> >
> > Thanks for the background, Greg.
> >
> > It's definitely number-of-queries that I'm thinking about here, too.
> >
> > I'm *not* concerned about the pure cost of mere skel-unpacking. It's more
> > that because properties aren't first-class citizens in the schema, we have
> > to trade what could be a single statement:
> >
> > "Go add/change the prop/val pair FOO=BAR on every path at or under
> > TARGET"
>
> I think we want to optimize for reads over writes. And so I think avoiding a
> join will be better.
>
> >
> > into a one-at-a-time, many-statements approach:
> >
> > "for PATH1, read its properties skel, parse the skel, set FOO=BAR in
> > its propset, re-skel-ify, and update the skel; now do that for PATH2,
> > whose resulting skel won't necessarily be the same as PATH1's; now do it
> > for PATH3..."
> >
> > Even when just reading properties, our best option is to read the whole
> > property set for chunks of files/dirs, and then immediately throw out all the
> > properties we don't care about. With a purer bit of relation in the schema
> > for properties, these queries get simpler and waste less intermediate memory.
>
> The average number of properties per node is small. I wouldn't worry about
> memory.
>
> If you find a problem, then I think we can fix it in 1.8. If you find
> something horrendous, then maybe 1.7. But do we have any indication of a
> problem here?
>
> >
> > -- C-Mike
> >
> > (PS: Happy birthday!)
>
> Thanks! :-)
>
> Cheers,
> -g
>
> >
> > --
> > C. Michael Pilato <cmpilato_at_collab.net>
> > CollabNet <> www.collab.net <> Distributed Development On Demand
> >
Received on 2011-03-17 14:58:32 CET