> -----Original Message-----
> From: Julian Foad [mailto:julian.foad_at_wandisco.com]
> Sent: Tuesday, 24 August 2010 13:04
> To: Bert Huijben
> Cc: 'Philip Martin'; dev_at_subversion.apache.org
> Subject: RE: svn commit: r988074 - in
> /subversion/trunk/subversion/tests/cmdline: svntest/wc.py
> upgrade_tests.py
>
> Bert Huijben wrote:
> > Philip Martin wrote:
> > > "Bert Huijben" <bert_at_vmoo.com> writes:
> > >
> > > >> * subversion/tests/cmdline/upgrade_tests.py
> > > >> (text_base_path): Restore MD5 support removed in r960036.
> > > >
> > > > I think the real fix would be to upgrade to SHA1 (and add the
> > > > mapping in the pristines table) in the upgrade step. I expected that
> > > > this was already handled?
> > >
> > > Yes, that needs to happen, and no, it doesn't happen yet. The new
> > > code stores SHA1 on checkout/update but the upgrade code simply copies
> > > the MD5 and doesn't do MD5 to SHA1 conversion. I discussed this with
> > > Julian on IRC yesterday, the plan is to remove the MD5 support
> > > eventually.
> > >
> > > There are two cases to consider, upgrade from 1.6 to latest and
> > > upgrade from older 1.7 to latest. For the older 1.7 upgrade we can
> > > simply use the PRISTINE table to replace the MD5 with the
> > > corresponding SHA1 in the bump_to_19 code.
> > >
> > > The 1.6 upgrade is a bit harder. We can do the text-base to pristine
> > > before doing the entries file, so that the PRISTINE table is available,
>
> If, instead, we construct each PRISTINE table entry at the point
> where we're converting an entry from the entries file, then we can
> calculate both checksums on the fly, and we can store both of them in
> the new DB row(s). That's true even for those few pristines that don't
> have any checksum in the 'entries' file.
1.0.0 working copies have no checksums at all, if I remember correctly, and we certainly have to upgrade those WCs. The same recipe applies to all files with a revert base.
> Maybe that makes the code flow harder, but it sounds easier than
> maintaining an intermediate store of checksums.
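To make the "both checksums on the fly" idea concrete, here is a minimal sketch in Python for brevity (the real upgrade code is C). The function name both_checksums and the chunk size are made up for illustration and are not taken from the Subversion sources. The point is only that one read of the text base can yield both digests, so nothing has to be looked up by MD5 afterwards.

```python
import hashlib
import io


def both_checksums(stream, chunk_size=64 * 1024):
    """Read a binary stream once and return (md5_hex, sha1_hex).

    Hypothetical helper: it feeds every chunk to both digests so the
    pristine text only has to be read a single time.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        md5.update(chunk)
        sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


# Usage: an in-memory stream stands in for a text-base file here.
md5_hex, sha1_hex = both_checksums(io.BytesIO(b"pristine contents\n"))
```

This also covers the entries that carry no checksum at all, since nothing is read from the old metadata.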
>
> > but the table is not currently indexed on MD5. As there is
> > > now only one table per wc it might be too slow if there are lots of
> > > files. We may need an MD5 index, as part of PRISTINE or separate,
> > > just for the duration of the upgrade.
>
> *If* we were to use that method (but see below), and *if* it does turn
> out to be too slow, then adding an index would be an easy change. I
> don't think we need to hesitate to use MD5 look-ups on that account.
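For reference, a temporary index of the kind discussed above could look roughly like the following sketch, using Python's sqlite3 module. The table pristine_demo, its two columns and the index name demo_md5_idx are invented for this example and are not the real wc.db schema; the sketch only shows creating the index, doing MD5 look-ups, and dropping the index again.

```python
import sqlite3


def lookup_sha1_by_md5(conn, md5):
    """Return the SHA1 stored for the given MD5, or None if no row matches."""
    row = conn.execute(
        "SELECT sha1 FROM pristine_demo WHERE md5 = ?", (md5,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
# Simplified stand-in table, keyed on SHA1 like the discussion assumes.
conn.execute(
    "CREATE TABLE pristine_demo (sha1 TEXT PRIMARY KEY, md5 TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO pristine_demo (sha1, md5) VALUES (?, ?)",
    [("sha1-of-a", "md5-of-a"), ("sha1-of-b", "md5-of-b")],
)

# Index that exists only for the duration of the upgrade.
conn.execute("CREATE INDEX demo_md5_idx ON pristine_demo (md5)")
found = lookup_sha1_by_md5(conn, "md5-of-b")
missing = lookup_sha1_by_md5(conn, "md5-of-z")
conn.execute("DROP INDEX demo_md5_idx")
```

The look-up gives the same answer with or without the index; the index only changes how much of the table has to be scanned.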
>
> > The bump_to_19 code can do the
> > > MD5 to SHA1 conversion before switching to single-db, the table is
> > > smaller and may not need an MD5 index (and the bump_to_19 code simply
> > > isn't as important as the 1.6 upgrade code).
> >
> > In the old entries format we only kept one checksum, while we can have two
> > pristine files, so just keeping it as MD5 can't solve all the issues.
> > But we can't just assume that we never see a collision with MD5 over an
> > entire tree.. or we wouldn't have switched to SHA1 in the first place.
>
> MD5 collisions during upgrading an existing WC? A remote possibility of
> course, but yes, let's try to avoid that possibility. If MD5 look-up
> was the only practical way forward, especially if it were per-directory,
> then I wouldn't be too concerned about handling collisions gracefully
> and think we would only need to detect them and bail out with an
> apologetic message.
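The "detect and bail out" behaviour described above can be sketched as follows, again in Python and with invented names (Md5CollisionError and build_md5_to_sha1 are hypothetical). The assumption is that the upgrade has a list of (md5, sha1) pairs for the known pristines; the same MD5 showing up with two different SHA1 values is treated as a collision and stops the upgrade.

```python
class Md5CollisionError(Exception):
    """Raised when one MD5 maps to two different SHA1 checksums."""


def build_md5_to_sha1(pairs):
    """Build an MD5 -> SHA1 mapping from (md5, sha1) pairs.

    Seeing the same pair twice is fine (the same pristine referenced
    more than once); the same MD5 with a different SHA1 is a collision.
    """
    mapping = {}
    for md5, sha1 in pairs:
        known = mapping.setdefault(md5, sha1)
        if known != sha1:
            raise Md5CollisionError(
                "Sorry, cannot upgrade: MD5 %s matches both %s and %s"
                % (md5, known, sha1)
            )
    return mapping


# Usage: duplicates are accepted, a real collision would raise.
mapping = build_md5_to_sha1(
    [("md5-a", "sha1-a"), ("md5-b", "sha1-b"), ("md5-a", "sha1-a")]
)
```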
>
> For upgrades from 1.7-dev versions, I think we should be happy to accept
> the possibility of MD5 collisions.
For dev versions that's no problem, but for upgrades from format 12 (= the last entries-file version) or older we must do the right thing.
(See the other mail: just make the intermediate versions use the Python script. Those users knew this was an option when they started using trunk.)
Bert
Received on 2010-08-24 14:24:47 CEST