[RFC] Rationalize WC internal API to read pristine and working versions

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Thu, 17 Mar 2011 13:36:42 +0000

I'd like to rationalize this set of internal libsvn_wc functions:

svn_wc__get_pristine_contents(path)
svn_wc__get_ultimate_base_contents(path)

svn_wc__get_pristine_text_status(path)

  svn_wc__text_base_path_to_read(path)
  svn_wc__ultimate_base_text_path(path)
  svn_wc__ultimate_base_text_path_to_read(path)

svn_wc__get_ultimate_base_checksums(path)
svn_wc__get_working_checksums(path)

svn_wc__get_actual_props(path)
svn_wc__get_pristine_props(path)

They're a really rather random set of getter functions over the possible
space of information getting, and they have inconsistent names. I have
contributed to the mess when I added functions into the set.

Another problem with them is that each one retrieves only a tiny
isolated bit of information about the node, and yet requires the WC to
first determine whether a versioned node exists at that path and which
WC it belongs to, open the DB, convert abspath to relpath, and so on.

This situation could be improved in one of two directions. Either have
single-datum getters like this but have each one take a reference to an
internal "versioned node" object rather than (db context, abspath):

svn_wc__get_props(versioned_node_object)

or have each function return a set of data about the versioned node it
finds:

svn_wc__get_base_node_info(path)

I think the best thing to do at this point is the latter, like with the
svn_wc__db_read_info except without so many parameters :-) Have a small
group of functions to access the state of the base node at a given path,
maybe like this:

  read_base_node(path) => kind, props
  read_base_file(path) => text-stream, checksum, size
  read_base_dir(path) => children, depth
  read_base_link(path) => target

and an identical set of functions that access the whole state of the
topmost pristine version[2]:

  read_pris_node(path) => kind, props
  read_pris_file(path) => text-stream, checksum, size
  read_pris_dir(path) => children, depth
  read_pris_link(path) => target

and some similar functions that access the state of the current working
node:

read_curr_node(path) => kind, props

but not read_curr_file/dir/link(), for reasons mentioned below.

The important principles here are:

* The function names follow a consistent pattern. The actual names
are subject to bikeshedding of course [1].

* Each function returns a group of closely related information, and
each output parameter is optional. When we read "a node", a single
function should fetch everything the caller wants to know about the
generic node. When we read "a file", a single function should fetch
everything the caller wants to know about the file, and so on.

The exact set of info returned by each class of function is to be
decided. For example, we would need to decide whether read_*_file()
should return both SHA-1 and MD-5 checksums or just the SHA-1, and the
decision would depend partly on how commonly a caller wants the MD-5.

* Reading the topmost pristine version is semantically identical to
reading the base version, so the set of APIs is identical. (If we were
to try to push this principle outside the realm of simple getter
functions, we would find some semantic differences, of course.)

* Reading a (base or topmost) pristine node is semantically similar to
reading the current working version. However, here the practical
differences are enough that we may not wish to provide the same APIs.
For example, "get the checksum" would be a slow or deferred operation
with the working file because it's not pre-calculated, and the disk
state can be different from the metadata state, and reading the
file/dir/link can be done by the caller without going through the WC
API, and so on.

Thoughts?

- Julian

[1] We've so overloaded the word "working" that I think in this case
it's better to use some other word such as "actual" or "current". And
we've overloaded the word "base" so that I felt compelled to invent this
horrible "ultimate base" phrase to distinguish it from existing
functions that used "base" to mean "the topmost pristine version". We
can get rid of that "ultimate base" phrase now.

[2] Just to make sure we're understanding each other, the "topmost"
pristine version is the "base" version in simple cases but is different
when there are scheduled copies and so on. It's what was often called
"base" or "the normal base" in WC-1.

### Maybe rationalize these too.

svn_wc__has_magic_property
svn_wc_has_binary_prop
svn_wc__internal_propget
svn_wc__internal_propdiff
svn_wc__internal_transmit_prop_deltas
svn_wc_get_prop_diffs2
svn_wc_prop_get2
svn_wc_props_modified_p2
svn_wc_transmit_prop_deltas2
Received on 2011-03-17 14:37:27 CET

This message: [ Message body ]
Next message: Greg Stein: "Re: Property storage / modification in WC-NG"
Previous message: Simon Wilson: "Re: svn delete fails when deleting multiple URLs with URL-encoded spaces"
Next in thread: Greg Stein: "Re: [RFC] Rationalize WC internal API to read pristine and working versions"
Reply: Greg Stein: "Re: [RFC] Rationalize WC internal API to read pristine and working versions"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]