On Wed, 2010-04-21, Julian Foad wrote:
> Greg Stein wrote:
> > On Wed, Apr 21, 2010 at 05:09, Philip Martin <philip.martin_at_wandisco.com> wrote:
> > > Julian Foad <julian.foad_at_wandisco.com> writes:
> > >
> > >> COMPATIBILITY
> > >> =============
> > >>
> > >> We need to keep the old WC interface working:
> > >>
> > >> svn_wc_transmit_text_deltas2(&tempfile, &md5_digest, ...)
> > >> svn_wc_queue_committed2(queue, path, ..., md5_checksum)
> > >> svn_wc_process_committed_queue(queue, ...)
> > >>
> > >> How? I can't see a way to communicate the SHA-1 checksum to
> > >> svn_wc_process_committed_queue() via the queue, but I can think of the
> > >> following ways.
> > >
> > > There is an access baton in the old interface, it's opaque and can be
> > > made to store anything. It could contain a hash of filename=>SHA-1.
> Thanks, Philip - I hadn't thought of that. I'll bear it in mind. It
> could be useful for other things too.
> > Presumably, transmit_text_deltas2 is a wrapper around deltas3. Thus,
> > deltas2 has the SHA1 value from that inner call.
> > Putting something into *TEMPFILE is optional, so we can skip that. The
> > MD5 result from deltas2 can be fetched from the PRISTINE table, given
> > the SHA1 key.
> Yup, transmit_text_deltas2() can easily know and return the MD-5.
> > queue_committed2 can use the MD5 value and key into PRISTINE (we
> > should have an index on PRISTINE.md5_checksum) to find the SHA1.
> I wondered about looking up the pristine text from its MD-5. Certainly
> possible (preferably via an index for speed).
> At first I had a slight concern about the remote possibility of MD-5
> collisions. I now think we can alleviate the concern by checking if a
> new pristine text ever has an MD-5 that's already recorded in the
> pristine store against a different SHA-1. If that ever happens, we can
> issue a warning or error, the resolution of which is "upgrade to 1.7+,
> which no longer relies on MD-5 uniqueness".
> The bit I'm not sure about is whether the MD-5 of *every* new text base
> in the commit is actually passed through the queue. I'll go and test
> whether it is - it doesn't look like it, from the way I read the code.
BTW, I confirmed that yesterday: it's true with the current code.
However, the APIs have been through several revisions, and the oldest
ones - svn_wc_process_committed() and svn_wc_process_committed2() -
don't even communicate the MD5 checksum on to the post-commit step.
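To make the MD-5 lookup and the collision check discussed above concrete, here is a rough sketch in C. It is only an illustration under stated assumptions: the pristine store is modelled as a small in-memory table rather than the real PRISTINE table, checksums are plain hex strings, and all the names (pristine_index_t, lookup_sha1_by_md5(), add_pristine()) are hypothetical, not Subversion APIs.

```c
#include <stddef.h>
#include <string.h>

#define MD5_HEX_LEN 32
#define SHA1_HEX_LEN 40
#define MAX_PRISTINES 64   /* arbitrary limit for this sketch */

/* One row of a hypothetical in-memory stand-in for the PRISTINE table. */
typedef struct pristine_row_t
{
  char md5_hex[MD5_HEX_LEN + 1];
  char sha1_hex[SHA1_HEX_LEN + 1];
} pristine_row_t;

typedef struct pristine_index_t
{
  pristine_row_t rows[MAX_PRISTINES];
  size_t count;
} pristine_index_t;

typedef enum add_result_t
{
  ADD_OK,               /* new row recorded */
  ADD_ALREADY_PRESENT,  /* same MD-5 and same SHA-1 already recorded */
  ADD_MD5_COLLISION,    /* same MD-5 recorded against a different SHA-1 */
  ADD_INVALID,          /* checksum strings have the wrong length */
  ADD_FULL              /* table limit of this sketch reached */
} add_result_t;

/* Return the SHA-1 recorded for MD5_HEX, or NULL if there is none.
   A linear scan stands in for an index on the MD-5 column. */
static const char *
lookup_sha1_by_md5(const pristine_index_t *idx, const char *md5_hex)
{
  size_t i;

  for (i = 0; i < idx->count; i++)
    if (strcmp(idx->rows[i].md5_hex, md5_hex) == 0)
      return idx->rows[i].sha1_hex;
  return NULL;
}

/* Record a new pristine text, refusing an MD-5 that is already
   recorded against a different SHA-1. */
static add_result_t
add_pristine(pristine_index_t *idx, const char *md5_hex, const char *sha1_hex)
{
  const char *known_sha1;

  if (strlen(md5_hex) != MD5_HEX_LEN || strlen(sha1_hex) != SHA1_HEX_LEN)
    return ADD_INVALID;

  known_sha1 = lookup_sha1_by_md5(idx, md5_hex);
  if (known_sha1 != NULL)
    return strcmp(known_sha1, sha1_hex) == 0
           ? ADD_ALREADY_PRESENT : ADD_MD5_COLLISION;

  if (idx->count >= MAX_PRISTINES)
    return ADD_FULL;

  /* Lengths were validated above, so these copies include the NUL. */
  memcpy(idx->rows[idx->count].md5_hex, md5_hex, MD5_HEX_LEN + 1);
  memcpy(idx->rows[idx->count].sha1_hex, sha1_hex, SHA1_HEX_LEN + 1);
  idx->count++;
  return ADD_OK;
}
```

In this sketch a caller on the old API path would call lookup_sha1_by_md5() with the MD-5 it was given, and would treat ADD_MD5_COLLISION as the case that warrants the warning or error mentioned earlier.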
In IRC discussion with Greg we decided a probable strategy for dealing
with the old APIs is to make them rely on a fixed, deterministic, tmp
path, like they always have done. Instead of the WC-1 scheme which gave
a path like
the WC-NG code will provide a path that's safe to be in a single .svn
dir at the root of a WC, such as
or maybe encoding <dir/somedir/foo> into a single path component if
that's better than dealing with arbitrary levels of subdirs.
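As a rough illustration of the "single path component" option, here is one way the encoding could look in C. This is a sketch under assumptions, not a decided scheme: the escaping rule (percent-encoding '/' and '%') and the helper name encode_relpath_component() are hypothetical, and the real code might prefer a different reversible encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a Subversion API.

   Encode RELPATH into OUT as a single path component by escaping
   '/' as "%2F" and '%' as "%25", so the mapping stays reversible.
   Return 0 on success, or -1 if OUT_SIZE is too small to hold the
   encoded string plus its terminating NUL. */
static int
encode_relpath_component(const char *relpath, char *out, size_t out_size)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t used = 0;
  const char *p;

  assert(relpath != NULL && out != NULL);

  for (p = relpath; *p != '\0'; p++)
    {
      unsigned char c = (unsigned char) *p;

      if (c == '/' || c == '%')
        {
          /* Need three bytes now and one more later for the NUL. */
          if (used + 3 >= out_size)
            return -1;
          out[used++] = '%';
          out[used++] = hex[c >> 4];
          out[used++] = hex[c & 0x0F];
        }
      else
        {
          /* Need one byte now and one more later for the NUL. */
          if (used + 1 >= out_size)
            return -1;
          out[used++] = (char) c;
        }
    }

  if (used >= out_size)
    return -1;
  out[used] = '\0';
  return 0;
}
```

Escaping '%' as well as '/' keeps the encoding unambiguous, so two different relative paths can never map to the same tmp file name.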
> > This seems pretty straight-forward, unless I've missed something.
> > (note that the pristine store would have this "extra" pristine,
> > unreferenced by any other table; that could get garbage-cleaned by a
> > separate process... BUT: there is an admin lock present during a
> > commit, so we'd simply avoid GC'ing the PRISTINE table/on-disk)
> Yup, I'm happy that we can manage the GC properly, in one way or
> another.
> Thanks for the feedback.
> - Julian
Received on 2010-04-23 16:10:37 CEST