[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [RFC] Rationalize WC internal API to read pristine and working versions

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Thu, 17 Mar 2011 21:01:51 +0000

Greg Stein wrote:
> On Mar 17, 2011 1:37 PM, "Julian Foad" <julian.foad_at_wandisco.com>
> wrote:
> >
> > I'd like to rationalize this set of internal libsvn_wc functions:
>
> Is this something that needs to be fixed for 1.7?

Not *specifically*, it's just something that makes my life harder every
time I have to deal with any code that calls such functions.

> >
> > svn_wc__get_pristine_contents(path)
> > svn_wc__get_ultimate_base_contents(path)
> >
> > svn_wc__get_pristine_text_status(path)
> >
> > svn_wc__text_base_path_to_read(path)
> > svn_wc__ultimate_base_text_path(path)
> > svn_wc__ultimate_base_text_path_to_read(path)
> >
> > svn_wc__get_ultimate_base_checksums(path)
> > svn_wc__get_working_checksums(path)
> >
> > svn_wc__get_actual_props(path)
> > svn_wc__get_pristine_props(path)
> >
> > They're a really rather random set of getter functions over the possible
> > space of information getting, and they have inconsistent names. I have
> > contributed to the mess when I added functions into the set.
> >
> > Another problem with them is that each one retrieves only a tiny
> > isolated bit of information about the node, and yet requires the WC to
> > first determine whether a versioned node exists at that path and which
> > WC it belongs to, open the DB, convert abspath to relpath, and so on.
> >
> > This situation could be improved in one of two directions. Either have
> > single-datum getters like this but have each one take a reference to an
> > internal "versioned node" object rather than (db context, abspath):
> >
> > svn_wc__get_props(versioned_node_object)
>
> No. Then you would need a getter to *get* that versioned object. The
> indirection doesn't buy you much over the standard db/abspath pair
> already used everywhere.

That's only true if the code is, out of the blue, just making one single
query about the props of a node at a particular path. On the other
hand, if for example we were using node walker which returns some kind
of "reference" to an "object" at each interesting location, then this
kind of call would make sense. That's what I was getting at. Not
saying it would plug snugly into the set of abspath-indexed APIs that we
have now.

> > or have each function return a set of data about the versioned node it
> > finds:
> >
> > svn_wc__get_base_node_info(path)
> >
> > I think the best thing to do at this point is the latter, like with the
> > svn_wc__db_read_info except without so many parameters :-) Have a small
> > group of functions to access the state of the base node at a given path,
> > maybe like this:
> >
> > read_base_node(path) => kind, props
> > read_base_file(path) => text-stream, checksum, size
> > read_base_dir(path) => children, depth
> > read_base_link(path) => target
> >
> > and an identical set of functions that access the whole state of the
> > topmost pristine version[2]:
> >
> > read_pris_node(path) => kind, props
> > read_pris_file(path) => text-stream, checksum, size
> > read_pris_dir(path) => children, depth
> > read_pris_link(path) => target
>
> Call it "topmost" rather than "pris".

Good plan.

> > and some similar functions that access the state of the current working
> > node:
> >
> > read_curr_node(path) => kind, props
>
> This is already known as "actual", and I see no reason to change that now.

Right ... kind of. Actually, we'll have to decide whether we're wanting
to query the metadata or the on-disk state or both separately or some
hybrid.

> > but not read_curr_file/dir/link(), for reasons mentioned below.
> >
> > The important principles here are:
> >
> > * The function names follow a consistent pattern. The actual names
> > are subject to bikeshedding of course [1].
> >
> > * Each function returns a group of closely related information, and
> > each output parameter is optional. When we read "a node", a single
> > function should fetch everything the caller wants to know about the
> > generic node. When we read "a file", a single function should fetch
> > everything the caller wants to know about the file, and so on.
> >
> > The exact set of info returned by each class of function is to be
> > decided. For example, we would need to decide whether read_*_file()
> > should return both SHA-1 and MD-5 checksums or just the SHA-1, and the
> > decision would depend partly on how commonly a caller wants the MD-5.
> >
> > * Reading the topmost pristine version is semantically identical to
> > reading the base version, so the set of APIs is identical. (If we were
> > to try to push this principle outside the realm of simple getter
> > functions, we would find some semantic differences, of course.)
> >
> > * Reading a (base or topmost) pristine node is semantically similar to
> > reading the current working version. However, here the practical
> > differences are enough that we may not wish to provide the same APIs.
> > For example, "get the checksum" would be a slow or deferred operation
> > with the working file because it's not pre-calculated, and the disk
> > state can be different from the metadata state, and reading the
> > file/dir/link can be done by the caller without going through the WC
> > API, and so on.
> >
> > Thoughts?
>
> "Why?"

'Cause less suckiness makes fixing our other immediate problems
easier. :-)

- Julian
Received on 2011-03-17 22:02:29 CET

This is an archived mail posted to the Subversion Dev mailing list.