Re: WC-NG: Commit with new pristine store and SHA-1 checksums

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Wed, 21 Apr 2010 11:28:43 +0100

Greg Stein wrote:
> On Wed, Apr 21, 2010 at 05:09, Philip Martin <philip.martin_at_wandisco.com> wrote:
> > Julian Foad <julian.foad_at_wandisco.com> writes:
> >
> >> COMPATIBILITY
> >> =============
> >>
> >> We need to keep the old WC interface working:
> >>
> >> svn_wc_transmit_text_deltas2(&tempfile, &md5_digest, ...)
> >> svn_wc_queue_committed2(queue, path, ..., md5_checksum)
> >> svn_wc_process_committed_queue(queue, ...)
> >>
> >> How? I can't see a way to communicate the SHA-1 checksum to
> >> svn_wc_process_committed_queue() via the queue, but I can think of the
> >> following ways.
> >
> > There is an access baton in the old interface, it's opaque and can be
> > made to store anything. It could contain a hash of filename=>SHA-1.

Thanks, Philip - I hadn't thought of that. I'll bear it in mind. It
could be useful for other things too.
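
To illustrate Philip's suggestion, here is a minimal Python sketch, not Subversion's C code. The `AccessBaton` class and the `transmit_text` and `process_committed` helpers are hypothetical stand-ins: the old-style caller only ever sees the MD5, while the SHA-1 travels inside the opaque baton as a path => SHA-1 map.

```python
import hashlib


class AccessBaton:
    """Hypothetical stand-in for the opaque access baton (not the real struct)."""

    def __init__(self):
        # path => SHA-1 hex digest, invisible to callers of the old interface
        self._sha1_by_path = {}

    def record_sha1(self, path, sha1_hex):
        self._sha1_by_path[path] = sha1_hex

    def sha1_for(self, path):
        # Returns None if no SHA-1 was recorded for this path.
        return self._sha1_by_path.get(path)


def transmit_text(baton, path, text):
    """Old-style step: hand the MD5 back to the caller, stash the SHA-1 in the baton."""
    baton.record_sha1(path, hashlib.sha1(text).hexdigest())
    return hashlib.md5(text).hexdigest()


def process_committed(baton, path):
    """Later step: recover the SHA-1 without it ever passing through the queue."""
    return baton.sha1_for(path)
```

The point of the design is that no old signature changes: the extra state rides along in an object the caller already passes around.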

> Presumably, transmit_text_deltas2 is a wrapper around deltas3. Thus,
> deltas2 has the SHA1 value from that inner call.
>
> Putting something into *TEMPFILE is optional, so we can skip that. The
> MD5 result from deltas2 can be fetched from the PRISTINE table, given
> the SHA1 key.

Yup, transmit_text_deltas2() can easily know and return the MD-5.

> queue_committed2 can use the MD5 value and key into PRISTINE (we
> should have an index on PRISTINE.md5_checksum) to find the SHA1.

I wondered about looking up the pristine text from its MD-5. Certainly
possible (preferably via an index for speed).
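
As a rough sketch of that lookup, assuming a PRISTINE table keyed by SHA-1 with an indexed `md5_checksum` column: the column name `checksum`, the index name, and the helper functions below are assumptions for illustration, not the actual WC-NG schema or API.

```python
import sqlite3


def open_pristine_db():
    """Create an in-memory database with an assumed PRISTINE layout."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE pristine ("
        " checksum TEXT PRIMARY KEY,"      # SHA-1 (assumed column name)
        " md5_checksum TEXT NOT NULL)"
    )
    # The index that makes the MD5 => SHA-1 lookup cheap.
    db.execute("CREATE INDEX i_pristine_md5 ON pristine (md5_checksum)")
    return db


def add_pristine(db, sha1_hex, md5_hex):
    db.execute(
        "INSERT OR IGNORE INTO pristine (checksum, md5_checksum) VALUES (?, ?)",
        (sha1_hex, md5_hex),
    )


def sha1_from_md5(db, md5_hex):
    """Return the SHA-1 for an MD5, None if unknown; refuse ambiguous matches."""
    rows = db.execute(
        "SELECT checksum FROM pristine WHERE md5_checksum = ?", (md5_hex,)
    ).fetchall()
    if not rows:
        return None
    if len(rows) > 1:
        raise LookupError("MD5 %s maps to more than one SHA-1" % md5_hex)
    return rows[0][0]


def md5_from_sha1(db, sha1_hex):
    """The reverse direction: fetch the stored MD5 given the SHA-1 key."""
    row = db.execute(
        "SELECT md5_checksum FROM pristine WHERE checksum = ?", (sha1_hex,)
    ).fetchone()
    return row[0] if row else None
```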

At first I had a slight concern about the remote possibility of MD-5
collisions. I now think we can alleviate the concern by checking if a
new pristine text ever has an MD-5 that's already recorded in the
pristine store against a different SHA-1. If that ever happens, we can
issue a warning or error, the resolution of which is "upgrade to 1.7+,
which no longer relies on MD-5 uniqueness".
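
A minimal sketch of that check, in Python for illustration only. The `PristineStore` class and `Md5CollisionError` are hypothetical names, and an in-memory dict stands in for the real pristine store.

```python
import hashlib


class Md5CollisionError(Exception):
    """Raised when one MD5 is seen against two different SHA-1 checksums."""


class PristineStore:
    """Hypothetical in-memory stand-in for the pristine store."""

    def __init__(self):
        self._sha1_by_md5 = {}

    def record(self, sha1_hex, md5_hex):
        # setdefault returns the SHA-1 already recorded for this MD5, if any;
        # otherwise it stores and returns the new one.
        known = self._sha1_by_md5.setdefault(md5_hex, sha1_hex)
        if known != sha1_hex:
            raise Md5CollisionError(
                "MD5 %s already recorded against SHA-1 %s" % (md5_hex, known)
            )
        return sha1_hex

    def add(self, text):
        """Checksum a new pristine text and record it, checking for collisions."""
        return self.record(
            hashlib.sha1(text).hexdigest(), hashlib.md5(text).hexdigest()
        )
```

Adding the same text twice is harmless; only a second, different SHA-1 under an existing MD5 trips the error.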

The bit I'm not sure about is whether the MD-5 of *every* new text base
in the commit is actually passed through the queue. I'll go and test
whether it is - it doesn't look like it, from the way I read the code.

> This seems pretty straight-forward, unless I've missed something.
>
> (note that the pristine store would have this "extra" pristine,
> unreferenced by any other table; that could get garbage-cleaned by a
> separate process... BUT: there is an admin lock present during a
> commit, so we'd simply avoid GC'ing the PRISTINE table/on-disk)

Yup, I'm happy that we can manage the GC properly, in one way or
another.
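
One way to picture the rule Greg describes, as a Python sketch under stated assumptions: the function name, the set-based store, and the `admin_lock_held` flag are all hypothetical, and real GC would also have to remove the on-disk files.

```python
def collect_unreferenced(pristines, referenced, admin_lock_held):
    """Remove pristines that nothing references, unless a commit holds the lock.

    pristines: set of SHA-1 checksums currently in the store (modified in place)
    referenced: set of SHA-1 checksums referenced by other tables
    admin_lock_held: True while a commit owns the admin lock
    Returns the set of checksums that were removed.
    """
    if admin_lock_held:
        # A commit may have stored an "extra" pristine that is not yet
        # referenced anywhere, so leave the store untouched for now.
        return set()
    doomed = pristines - referenced
    pristines.difference_update(doomed)
    return doomed
```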

Thanks for the feedback.

- Julian
Received on 2010-04-21 12:29:18 CEST

This is an archived mail posted to the Subversion Dev mailing list.