[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: pristine store design

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Wed, 03 Mar 2010 15:04:48 +0000

Neels J Hofmeyr wrote:
> Julian Foad wrote:
> > I want us to be able to move on to bigger things like considering
> > whether to build a "pristine whole nodes" store on top of the "pristine
> > texts" store, because I think we should be able to reference a whole
> > node (especially a tree-conflict old/theirs/mine node) with a single
> > handle, whether that handle be a checksum or a "repos:rev:path" triple.
> You mean "whole subtrees"?? I don't understand.

A local cache [1] of nodes from the repository. Each file would have not
only its text but also its props and perhaps other metadata such as
copyfrom info. Each dir would props, other metadata, and a list of its
children. Each child might or might not be in the cache.

([1] "cache" in the sense of "optional, exact copy"; higher layers can
go fetch the equivalent node from the repository if it's not in the
cache, but this "cache" could also have a property whereby it guarantees
to keep all nodes (or particular nodes as specified by the caller) until
the caller explicitly deletes them.)

The "properties" and "pristine checksum" and some other columns in wc.db
could then be replaced by a single "node handle", which I think could
simplify some of the problems with referring to a "revert-base of a
replaced child of a copied parent" and such like.

But I'm getting too far ahead.

> > I continue to argue below why I think a stream-based interface is better
> > than path-based, but the choice of "write" method is not a critical
> > decision, as long as it stays within libsvn_wc. I'm not keen to spend
> > more time on that, now that I've looked and as far as I can tell it
> > doesn't need to be exposed outside libsvn_wc.
> Note that the entire pristine API we are discussing is private to svn_wc,
> like svn_wc__db_read_info() et al..

OK. Although I saw the "__" in their names, I hadn't really appreciated
until now that the dir/filename transferred between them is necessarily
staying private. I'm glad. That makes this a whole lot less important.

> > Yes, and that looks neater, but still has all the problems caused by
> > exposing the path.
> For the records, we'd never expose the final pristine-file path, not even
> within libsvn_wc. This whole thing about having a write location is all
> about the temp file that will finally be moved into place.

Yup, I know.

> So at no point will any function that does not have _pristine_ in its name
> be able to find the final pristine file location. Only the tempdir is known,
> and only inside libsvn_wc.
> The final _install() call would then have the option to just 'mv' the file
> into its final location or do whatever with it. We don't lose feasibility of
> any features by doing this.

> >> So IMHO it boils down to: Do you think it is better to opaquify
> >> such baton for the sacrifice of not being able to get a write destination on
> >> the target device in advance?
> >
> > Yes, I think that is the better API for general use. (Note that the
> > stream would always write to a file that is on the destination device.
> > The only thing you can't get with this API is a path to that file (or
> > directory).)
> Hm, this argument actually revolves around: will we have cases where a
> tempfile already exists, and where we have no control on redirecting streams
> into the pristine store? When committing new or modified files, we need to
> copy&checksum, can't checksum in-place; When receiving from the repos, we
> can direct the stream straight into the pristine store, and validate proper
> reception via that.
> I can't come up with a use case of interest. If there aren't (going to be)
> any, I actually agree with a stream-only write API. Reversing my own
> argument, we can still split such stream-only function into _get_temp_dir()
> and _install() later on, making the original stream-only function a
> convenience function. So I am actually convinced before even looking at the
> disadvantages. Thanks :)
> >
> >
> > Now let's look at some disadvantages of "return a directory path".
> > If the path is exposed just within libsvn_wc, these issues stay under
> > our control, so that's OK. If the path is exposed in a public API, to
> > libsvn_client and beyond, that's when I worry.
> (No paths would ever leak out of libsvn_wc. The pristine-tempdir path would
> be known inside libsvn_wc. I'd say we can trust libsvn_wc to use a proper
> function for creating a unique file. But see my being convinced above.)

OK, I feel much better knowing it's staying within libsvn_wc. And it has
the advantage of already being implemented :-)

> > [...]

+1 on (from your previous email) removing svn_wc__db_checkmode_t and
having a single explicit check() function.

> > [DBs and concurrency ...]
> > [...]
> >
> > I'm not replying to those parts in this email.
> heh, me and stsp are also coming up with new directions by the hour.
> We were discussing that having a separate pristine database basically sucks.
> We need to ensure atomicity of mv-into-place with write-db-row.
> This would be more straightforward if we had no DB, but instead an
> information header in each pristine file itself.
> Given that we checksum all read operations, we only want to store the MD5
> sum! The initial header could simply be always fixed 32 bytes of MD5
> checksum. And even this is just a speed optimization. Given that we checksum
> all reads, all reads can *calculate* the MD5 sum on-the-fly. Pushing for
> simplicity here.
> If we want more values in the future, like compression, we introduce a skel.
> Any minus-ones?

Sounds OK. I can't see any need for the "pristine" table in wc.db at the
moment. I note that the svn_wc__db_pristine_*() functions don't use the
DB except to determine the PDH and local_relpath.

On the other hand I don't think ensuring consistency of the db and the
file system should be a big hassle, but I haven't thought exactly how to
do it.

The important thing is that the APIs have the DB parameter so we can add
in some DB <-> file consistency later if we want it.

- Julian
Received on 2010-03-03 16:05:27 CET

This is an archived mail posted to the Subversion Dev mailing list.