
Re: WC-NG: Commit with new pristine store and SHA-1 checksums

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Fri, 23 Apr 2010 15:10:02 +0100

On Wed, 2010-04-21, Julian Foad wrote:
> Greg Stein wrote:
> > On Wed, Apr 21, 2010 at 05:09, Philip Martin <philip.martin_at_wandisco.com> wrote:
> > > Julian Foad <julian.foad_at_wandisco.com> writes:
> > >
> > >> COMPATIBILITY
> > >> =============
> > >>
> > >> We need to keep the old WC interface working:
> > >>
> > >> svn_wc_transmit_text_deltas2(&tempfile, &md5_digest, ...)
> > >> svn_wc_queue_committed2(queue, path, ..., md5_checksum)
> > >> svn_wc_process_committed_queue(queue, ...)
> > >>
> > >> How? I can't see a way to communicate the SHA-1 checksum to
> > >> svn_wc_process_committed_queue() via the queue, but I can think of the
> > >> following ways.
> > >
> > > There is an access baton in the old interface, it's opaque and can be
> > > made to store anything. It could contain a hash of filename=>SHA-1.
>
> Thanks, Philip - I hadn't thought of that. I'll bear it in mind. It
> could be useful for other things too.
>
> > Presumably, transmit_text_deltas2 is a wrapper around deltas3. Thus,
> > deltas2 has the SHA1 value from that inner call.
> >
> > Putting something into *TEMPFILE is optional, so we can skip that. The
> > MD5 result from deltas2 can be fetched from the PRISTINE table, given
> > the SHA1 key.
>
> Yup, transmit_text_deltas2() can easily know and return the MD-5.
>
> > queue_committed2 can use the MD5 value and key into PRISTINE (we
> > should have an index on PRISTINE.md5_checksum) to find the SHA1.
>
> I wondered about looking up the pristine text from its MD-5. Certainly
> possible (preferably via an index for speed).
>
> At first I had a slight concern about the remote possibility of MD-5
> collisions. I now think we can alleviate the concern by checking if a
> new pristine text ever has an MD-5 that's already recorded in the
> pristine store against a different SHA-1. If that ever happens, we can
> issue a warning or error, the resolution of which is "upgrade to 1.7+,
> which no longer relies on MD-5 uniqueness".
>
> The bit I'm not sure about is whether the MD-5 of *every* new text base
> in the commit is actually passed through the queue. I'll go and test
> whether it is - it doesn't look like it, from the way I read the code.
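
Philip's suggestion, an opaque access baton that carries a filename=>SHA-1 map, might look roughly like the minimal sketch below. Everything in it is hypothetical: the struct and function names are illustrative only, and a small fixed-size array stands in for the apr_hash_t that real Subversion code would more likely use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the opaque access baton. Real Subversion
   code would more likely keep an apr_hash_t; a small fixed-size array
   keeps this sketch self-contained. */
#define SKETCH_MAX_ENTRIES 64
#define SKETCH_SHA1_HEX_LEN 40

typedef struct sketch_sha1_entry {
  const char *path;                        /* borrowed: caller keeps it alive */
  char sha1_hex[SKETCH_SHA1_HEX_LEN + 1];  /* NUL-terminated hex digest */
} sketch_sha1_entry;

typedef struct sketch_adm_baton {
  sketch_sha1_entry entries[SKETCH_MAX_ENTRIES];
  size_t count;  /* number of entries in use */
} sketch_adm_baton;

/* Record (or overwrite) the SHA-1 for PATH. Returns 0 on success, or -1
   if SHA1_HEX is not 40 characters long or the table is full. */
int
sketch_baton_set_sha1(sketch_adm_baton *baton, const char *path,
                      const char *sha1_hex)
{
  size_t i;
  assert(baton != NULL && path != NULL && sha1_hex != NULL);
  if (strlen(sha1_hex) != SKETCH_SHA1_HEX_LEN)
    return -1;
  for (i = 0; i < baton->count; i++)
    if (strcmp(baton->entries[i].path, path) == 0)
      {
        memcpy(baton->entries[i].sha1_hex, sha1_hex, SKETCH_SHA1_HEX_LEN + 1);
        return 0;
      }
  if (baton->count == SKETCH_MAX_ENTRIES)
    return -1;
  baton->entries[baton->count].path = path;
  memcpy(baton->entries[baton->count].sha1_hex, sha1_hex,
         SKETCH_SHA1_HEX_LEN + 1);
  baton->count++;
  return 0;
}

/* Return the SHA-1 recorded for PATH, or NULL if none was recorded. */
const char *
sketch_baton_get_sha1(const sketch_adm_baton *baton, const char *path)
{
  size_t i;
  assert(baton != NULL && path != NULL);
  for (i = 0; i < baton->count; i++)
    if (strcmp(baton->entries[i].path, path) == 0)
      return baton->entries[i].sha1_hex;
  return NULL;
}
```

With something like this, the post-commit step could look up each committed path's SHA-1 in the baton instead of needing it to travel through the queue.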
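
The MD-5 => SHA-1 lookup Greg proposes for queue_committed2, together with the collision check described above, can be sketched as follows. This is a hypothetical in-memory model, not the PRISTINE table's real schema or API: a linear scan stands in for the suggested index on PRISTINE.md5_checksum, and all names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of the PRISTINE table: one row per
   pristine text, keyed by SHA-1, with its MD-5 alongside. A linear
   scan stands in for an index on the MD-5 column. */
typedef struct sketch_pristine_row {
  const char *sha1_hex;
  const char *md5_hex;
} sketch_pristine_row;

typedef enum sketch_pristine_status {
  SKETCH_PRISTINE_OK,            /* MD-5 unseen, or seen with the same SHA-1 */
  SKETCH_PRISTINE_MD5_COLLISION  /* MD-5 already maps to a different SHA-1 */
} sketch_pristine_status;

/* Return the SHA-1 recorded for MD5_HEX, or NULL if there is none. */
const char *
sketch_find_sha1_by_md5(const sketch_pristine_row *rows, size_t n_rows,
                        const char *md5_hex)
{
  size_t i;
  assert(rows != NULL || n_rows == 0);
  for (i = 0; i < n_rows; i++)
    if (strcmp(rows[i].md5_hex, md5_hex) == 0)
      return rows[i].sha1_hex;
  return NULL;
}

/* Before adding a new pristine (SHA1_HEX, MD5_HEX), check that its MD-5
   does not already belong to a different SHA-1. If every insertion goes
   through this check, the table never holds two rows with the same MD-5,
   so the lookup above is unambiguous. */
sketch_pristine_status
sketch_check_new_pristine(const sketch_pristine_row *rows, size_t n_rows,
                          const char *sha1_hex, const char *md5_hex)
{
  const char *existing = sketch_find_sha1_by_md5(rows, n_rows, md5_hex);
  if (existing != NULL && strcmp(existing, sha1_hex) != 0)
    return SKETCH_PRISTINE_MD5_COLLISION;
  return SKETCH_PRISTINE_OK;
}
```

On SKETCH_PRISTINE_MD5_COLLISION the caller would raise the warning or error suggested above rather than store the new row.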

BTW, I confirmed that yesterday: it's true with the current code.
However, the APIs have been through several revisions, and the oldest
ones - svn_wc_process_committed() and svn_wc_process_committed2() -
don't even communicate the MD5 checksum on to the post-commit step.

In IRC discussion, Greg and I settled on a probable strategy for the
old APIs: make them rely on a fixed, deterministic tmp path, as they
always have. Instead of the WC-1 scheme, which gave a path like

  <dir/somedir/.svn/tmp/text-base/FOO>

the WC-NG code will provide a path that's safe to use with a single
.svn dir at the root of the WC, such as

  <.svn/tmp/text-base/dir/somedir/FOO>

or maybe encoding <dir/somedir/foo> into a single path component if
that's better than dealing with arbitrary levels of subdirs.
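
Both path layouts can be sketched in a few lines. This is an illustration under stated assumptions, not the WC-NG code: the function names are hypothetical, relpath is assumed to be relative to the WC root, and the single-component variant uses a percent-escaping scheme ('%' -> "%25", '/' -> "%2F") chosen here only because it is reversible; the mail leaves the actual encoding open.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_TMP_BASE ".svn/tmp/text-base/"

/* Variant 1 (hypothetical): mirror the WC-relative path under the single
   root .svn dir, e.g. "dir/somedir/FOO" -> ".svn/tmp/text-base/dir/somedir/FOO".
   Returns 0 on success, -1 if BUF is too small. */
int
sketch_tmp_text_base_nested(char *buf, size_t buf_len, const char *relpath)
{
  int n;
  assert(buf != NULL && relpath != NULL);
  n = snprintf(buf, buf_len, "%s%s", SKETCH_TMP_BASE, relpath);
  return (n < 0 || (size_t)n >= buf_len) ? -1 : 0;
}

/* Variant 2 (hypothetical encoding): flatten RELPATH into one path
   component by escaping '%' as "%25" and '/' as "%2F", so distinct
   relpaths never map to the same name.
   Returns 0 on success, -1 if BUF is too small. */
int
sketch_tmp_text_base_flat(char *buf, size_t buf_len, const char *relpath)
{
  size_t used = strlen(SKETCH_TMP_BASE);
  const char *p;
  assert(buf != NULL && relpath != NULL);
  if (used + 1 > buf_len)
    return -1;
  memcpy(buf, SKETCH_TMP_BASE, used);
  for (p = relpath; *p != '\0'; p++)
    {
      const char *esc = (*p == '/') ? "%2F" : (*p == '%') ? "%25" : NULL;
      size_t len = esc ? 3 : 1;
      if (used + len + 1 > buf_len)  /* +1 for the terminating NUL */
        return -1;
      if (esc)
        memcpy(buf + used, esc, 3);
      else
        buf[used] = *p;
      used += len;
    }
  buf[used] = '\0';
  return 0;
}
```

The nested variant keeps names readable but means creating intermediate tmp subdirs; the flat variant avoids that at the cost of an encoding step.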

- Julian

> > This seems pretty straightforward, unless I've missed something.
> >
> > (note that the pristine store would have this "extra" pristine,
> > unreferenced by any other table; that could get garbage-collected by a
> > separate process... BUT: there is an admin lock present during a
> > commit, so we'd simply avoid GC'ing the PRISTINE table/on-disk)
>
> Yup, I'm happy that we can manage the GC properly, in one way or
> another.
>
> Thanks for the feedback.
>
>
> - Julian
>
>
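
The GC rule Greg describes (drop unreferenced pristines, but never while an admin lock is held) can be sketched as below. This is a hypothetical model: the per-row reference count and all names are illustrative assumptions, and the on-disk file removal is only noted in a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each pristine row carries a count of references
   to it from other WC tables. */
typedef struct sketch_pristine {
  const char *sha1_hex;
  unsigned refcount;
} sketch_pristine;

/* Compact ROWS in place, dropping unreferenced pristines, and return the
   new row count. If an admin lock is held (e.g. a commit is in progress),
   do nothing: an "extra" pristine created by the commit may not yet be
   referenced by any other table. */
size_t
sketch_gc_pristines(sketch_pristine *rows, size_t n_rows, bool admin_locked)
{
  size_t i, kept = 0;
  assert(rows != NULL || n_rows == 0);
  if (admin_locked)
    return n_rows;
  for (i = 0; i < n_rows; i++)
    if (rows[i].refcount > 0)
      rows[kept++] = rows[i];
  /* A real implementation would also delete the dropped on-disk files. */
  return kept;
}
```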
Received on 2010-04-23 16:10:37 CEST

This is an archived mail posted to the Subversion Dev mailing list.
