On Mar 16, 2011 5:34 PM, "C. Michael Pilato" <cmpilato_at_collab.net> wrote:
>
> On 03/16/2011 01:17 PM, Greg Stein wrote:
> > On Wed, Mar 16, 2011 at 12:59, C. Michael Pilato <cmpilato_at_collab.net> wrote:
> >> ...
> >> to manage at least the "read" subset of these operations. But I find myself
> >> wondering if we wouldn't be better served by having a properties table with
> >> rows for, I dunno: wc_id, local_relpath, property_name, property_value.
> >> ...
> >> Was this considered when we moved the properties into the database? If so,
> >> why didn't we take this approach? Should we consider it now? Should we
> >> punt it to 1.8?
> >
> > It was considered. Hyrum and I figured it would be best to use a skel
> > and avoid a join. We assumed it is the rare case that we need a single
> > property, rather than some/all of the properties.
> >
> > If you want to experiment with another table and a JOIN, then I would
> > recommend waiting until 1.8 to do that. If we find that properties in
> > their current form are killing us, then we can discuss further.
> >
> > My understanding is that # queries is our concern at the moment,
> > rather than skel-unpacking.
> >
> > Cheers,
> > -g
>
> Thanks for the background, Greg.
>
> It's definitely number-of-queries that I'm thinking about here, too.
>
> I'm *not* concerned about the pure cost of mere skel-unpacking. It's more
> that because properties aren't first-class citizens in the schema, we have
> to trade what could be a single statement:
>
> "Go add/change the prop/val pair FOO=BAR on every path at or under
> TARGET"
I think we want to optimize for reads over writes. And so I think avoiding a
join will be better.
>
> into a one-at-a-time, many-statements approach:
>
> "for PATH1, read its properties skel, parse the skel, set FOO=BAR in
> its propset, re-skel-ify, and update the skel; now do that for PATH2,
> whose resulting skel won't necessarily be the same as PATH1's; now do it
> for PATH3..."
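To make the write-path contrast concrete, here is a minimal Python/sqlite3 sketch under stated assumptions: the `nodes` and `props` tables are hypothetical and far simpler than the real wc.db schema, and JSON stands in for Subversion's skel encoding. It counts the SQL statements each layout needs to set FOO=BAR on every path at or under a target.

```python
import json
import sqlite3

# Hypothetical, simplified schema; NOT Subversion's real wc.db.
# nodes.properties packs each node's whole property set into one column
# (JSON stands in for the skel encoding); props is the proposed
# one-row-per-property table.
SCHEMA = """
CREATE TABLE nodes (local_relpath TEXT PRIMARY KEY, properties TEXT);
CREATE TABLE props (
  local_relpath  TEXT NOT NULL,
  property_name  TEXT NOT NULL,
  property_value TEXT,
  PRIMARY KEY (local_relpath, property_name)
);
"""

# Matches :t itself or anything below it ("A/B" is under "A"; "AB" is not).
# An empty :t means the working-copy root, i.e. every path.
AT_OR_UNDER = ("(:t = '' OR local_relpath = :t"
               " OR substr(local_relpath, 1, length(:t) + 1) = :t || '/')")


def set_prop_packed(conn, target, name, value):
    """Read-modify-write per node; returns the number of statements issued."""
    rows = conn.execute(
        "SELECT local_relpath, properties FROM nodes WHERE " + AT_OR_UNDER,
        {"t": target}).fetchall()
    statements = 1
    for relpath, blob in rows:
        props = json.loads(blob) if blob else {}  # parse the packed set
        props[name] = value                       # change one entry
        conn.execute(                             # re-serialize and store it
            "UPDATE nodes SET properties = :p WHERE local_relpath = :r",
            {"p": json.dumps(props, sort_keys=True), "r": relpath})
        statements += 1
    return statements


def set_prop_table(conn, target, name, value):
    """One INSERT OR REPLACE ... SELECT covers the whole subtree."""
    conn.execute(
        "INSERT OR REPLACE INTO props (local_relpath, property_name, property_value)"
        " SELECT local_relpath, :n, :v FROM nodes WHERE " + AT_OR_UNDER,
        {"n": name, "v": value, "t": target})
    return 1


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
existing = {"A": {}, "A/B": {"svn:ignore": "*.o"},
            "A/B/f": {"svn:eol-style": "native"}, "A/g": {}, "AB": {}}
conn.executemany("INSERT INTO nodes VALUES (?, ?)",
                 [(p, json.dumps(v)) for p, v in existing.items()])

n_packed = set_prop_packed(conn, "A", "FOO", "BAR")
n_table = set_prop_table(conn, "A", "FOO", "BAR")
print(n_packed, n_table)  # 5 1: one SELECT plus four UPDATEs, versus one statement
```

The packed layout also rewrites each node's entire property set even though only one entry changed; the per-property table touches only the rows it needs, at the cost of a JOIN whenever node data and properties are read together.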
>
> Even when just reading properties, our best option is to read the whole
> property set for chunks of files/dirs, and then immediately throw out all the
> properties we don't care about. With a more purely relational schema for
> properties, these queries get simpler and waste less intermediate memory.
The average number of properties per node is small. I wouldn't worry about
memory.
If you find a problem, then I think we can fix it in 1.8. If you find
something horrendous, then maybe 1.7. But do we have any indication of a
problem here?
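Greg's counterpoint is about reads. The sketch below, again using hypothetical simplified tables with JSON standing in for skels, shows both sides: the packed column returns a node and all of its properties from one row with no JOIN, while fetching a single property means parsing the whole set and discarding the rest. The helper names are illustrative only, and the node-level helpers assume the queried node exists.

```python
import json
import sqlite3

# Hypothetical, simplified schema; NOT Subversion's real wc.db.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (local_relpath TEXT PRIMARY KEY, kind TEXT, properties TEXT);
CREATE TABLE props (local_relpath TEXT NOT NULL, property_name TEXT NOT NULL,
                    property_value TEXT,
                    PRIMARY KEY (local_relpath, property_name));
""")
sample = {"svn:eol-style": "native", "svn:keywords": "Id",
          "svn:mime-type": "text/plain"}
conn.execute("INSERT INTO nodes VALUES (?, ?, ?)",
             ("A/f", "file", json.dumps(sample)))
conn.executemany("INSERT INTO props VALUES (?, ?, ?)",
                 [("A/f", k, v) for k, v in sample.items()])


def node_with_props_packed(relpath):
    # Node data and the full property set come back in one row: no JOIN.
    kind, blob = conn.execute(
        "SELECT kind, properties FROM nodes WHERE local_relpath = ?",
        (relpath,)).fetchone()
    return kind, json.loads(blob)


def node_with_props_joined(relpath):
    # The normalized layout needs a JOIN, yielding one row per property.
    rows = conn.execute(
        "SELECT n.kind, p.property_name, p.property_value FROM nodes AS n"
        " LEFT JOIN props AS p ON p.local_relpath = n.local_relpath"
        " WHERE n.local_relpath = ?", (relpath,)).fetchall()
    return rows[0][0], {name: val for _, name, val in rows if name is not None}


def one_prop_packed(relpath, name):
    # Parses every property of the node, then keeps just one.
    return node_with_props_packed(relpath)[1].get(name)


def one_prop_table(relpath, name):
    # A primary-key lookup returns only the wanted value (or None).
    row = conn.execute(
        "SELECT property_value FROM props"
        " WHERE local_relpath = ? AND property_name = ?",
        (relpath, name)).fetchone()
    return row[0] if row else None


print(one_prop_packed("A/f", "svn:keywords"), one_prop_table("A/f", "svn:keywords"))
```

With only a handful of properties per node, as Greg notes, the parse-and-discard cost of the packed form stays small; the JOIN it avoids is paid on every combined read.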
>
> -- C-Mike
>
> (PS: Happy birthday!)
Thanks! :-)
Cheers,
-g
>
> --
> C. Michael Pilato <cmpilato_at_collab.net>
> CollabNet <> www.collab.net <> Distributed Development On Demand
>
Received on 2011-03-17 14:42:22 CET