RE: svn commit: r988074 - in /subversion/trunk/subversion/tests/cmdline: svntest/wc.py upgrade_tests.py

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Tue, 24 Aug 2010 12:04:12 +0100

Bert Huijben wrote:
> Philip Martin wrote:
> > "Bert Huijben" <bert_at_vmoo.com> writes:
> >
> > >> * subversion/tests/cmdline/upgrade_tests.py
> > >> (text_base_path): Restore MD5 support removed in r960036.
> > >
> > > I think the real fix would be to upgrade to SHA1 (and add the
> > > mapping in the pristines table) in the upgrade step. I expected that
> > > this was already handled?
> >
> > Yes, that needs to happen, and no, it doesn't happen yet. The new
> > code stores SHA1 on checkout/update but the upgrade code simply copies
> > the MD5 and doesn't do MD5 to SHA1 conversion. I discussed this with
> > Julian on IRC yesterday, the plan is to remove the MD5 support
> > eventually.
> >
> > There are two cases to consider, upgrade from 1.6 to latest and
> > upgrade from older 1.7 to latest. For the older 1.7 upgrade we can
> > simply use the PRISTINE table to replace the MD5 with the
> > corresponding SHA1 in the bump_to_19 code.
> >
> > The 1.6 upgrade is a bit harder. We can do the text-base to pristine
> > before doing the entries file, so that the PRISTINE table is
> > available,

If, instead, we construct each PRISTINE table entry at the point
where we're converting an entry from the entries file, then we can
calculate both checksums on the fly, and we can store both of them in
the new DB row(s). That's true even for those few pristines that don't
have any checksum in the 'entries' file.

Maybe that makes the code flow harder, but it sounds easier than
maintaining an intermediate store of checksums.
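
To illustrate the "both checksums on the fly" idea, here is a minimal
Python sketch. It is not the Subversion upgrade code (which is C); the
function name and chunk size are hypothetical. It only shows that a
single pass over a text-base file can yield both the MD5 and the SHA1,
so no intermediate checksum store is needed.

```python
import hashlib


def pristine_checksums(path, chunk_size=64 * 1024):
    """Return (md5_hex, sha1_hex) for the file at `path`.

    Hypothetical helper: reads the file once and feeds every chunk to
    both digests, so both checksums are ready when the row is written.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

Because the checksums come from the file contents, this also covers a
pristine that has no checksum recorded in the 'entries' file.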

> but the table is not currently indexed on MD5. As there is
> > now only one table per wc it might be too slow if there are lots of
> > files. We may need an MD5 index, as part of PRISTINE or separate,
> > just for the duration of the upgrade.

*If* we were to use that method (but see below), and *if* it does turn
out to be too slow, then adding an index would be an easy change. I
don't think we need to hesitate to use MD5 look-ups on that account.
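
A rough sketch of what a temporary MD5 index could look like, using
Python's sqlite3 module against a toy table. The table layout, column
names, and index name below are assumptions made for illustration and
are not taken from the real wc.db schema.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory DB with a simplified, hypothetical PRISTINE table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pristine ("
        " checksum TEXT PRIMARY KEY,"      # SHA1, assumed to be the key
        " md5_checksum TEXT NOT NULL)"     # MD5, not indexed by default
    )
    return conn


def add_md5_index(conn):
    """Add an index on the MD5 column, e.g. just before the upgrade runs."""
    conn.execute(
        "CREATE INDEX IF NOT EXISTS i_pristine_md5 ON pristine (md5_checksum)"
    )


def drop_md5_index(conn):
    """Remove the index again once the upgrade no longer needs it."""
    conn.execute("DROP INDEX IF EXISTS i_pristine_md5")


def sha1_for_md5(conn, md5_hex):
    """Return every SHA1 recorded for the given MD5 (normally exactly one)."""
    rows = conn.execute(
        "SELECT checksum FROM pristine WHERE md5_checksum = ?", (md5_hex,)
    ).fetchall()
    return [row[0] for row in rows]
```

The look-up query is the same with or without the index, so the index
can be added later if profiling shows the upgrade is too slow.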

> The bump_to_19 code can do the
> > MD5 to SHA1 conversion before switching to single-db, the table is
> > smaller and may not need an MD5 index (and the bump_to_19 code simply
> > isn't as important as the 1.6 upgrade code).
>
> In the old entries format we only kept one checksum, while we can have two
> pristine files, so just keeping it as MD5 can't solve all the issues.
> But we can't just assume that we never see a collision with MD5 over an
> entire tree.. or we wouldn't have switched to SHA1 in the first place.

MD5 collisions while upgrading an existing WC? A remote possibility, of
course, but yes, let's try to avoid it. If MD5 look-up were the only
practical way forward, especially if it were per-directory, then I
wouldn't be too concerned about handling collisions gracefully; I think
we would only need to detect them and bail out with an apologetic
message.
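
A minimal sketch of "detect and bail out", again in Python and with
hypothetical names throughout: while building the MD5-to-SHA1 mapping,
two different SHA1 values for the same MD5 mean two distinct pristines
collide on MD5, and the upgrade stops rather than guessing.

```python
class Md5CollisionError(Exception):
    """Raised when one MD5 maps to two different SHA1 checksums."""


def build_md5_to_sha1(pairs):
    """Build an MD5 -> SHA1 mapping from (md5_hex, sha1_hex) pairs.

    Hypothetical helper: the same pair appearing twice is harmless (the
    same pristine referenced by several entries), but a second, different
    SHA1 for a known MD5 is treated as a collision and aborts.
    """
    mapping = {}
    for md5_hex, sha1_hex in pairs:
        known = mapping.setdefault(md5_hex, sha1_hex)
        if known != sha1_hex:
            raise Md5CollisionError(
                "Sorry: cannot upgrade, MD5 %s matches more than one "
                "pristine text (%s and %s)" % (md5_hex, known, sha1_hex)
            )
    return mapping
```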

For upgrades from 1.7-dev versions, I think we should be happy to accept
the possibility of MD5 collisions.

> Maybe we should use a somewhat broader structure than just a single (or
> dual) svn_wc_entry_t to keep the state while upgrading. This can then
> contain things like the SHA1 checksums and other values that can't be stored
> in just the entries.

That sounds harder than transferring the pristines in-line, though I'm not sure.

- Julian
Received on 2010-08-24 13:04:55 CEST

This is an archived mail posted to the Subversion Dev mailing list.
