Re: svnrdump: The BIG update

From: 'Daniel Shahaf' <d.s_at_daniel.shahaf.name>
Date: Thu, 19 Aug 2010 21:01:05 +0300

(sorry for the delay; didn't want to reply while sleepy)

Bert Huijben wrote on Tue, Aug 17, 2010 at 09:30:08 -0700:
>
>
> > -----Original Message-----
> > From: Ramkumar Ramachandra [mailto:artagnon_at_gmail.com]
> > Sent: dinsdag 17 augustus 2010 9:09
> > To: Daniel Shahaf
> > Cc: Subversion-dev Mailing List
> > Subject: Re: svnrdump: The BIG update
> >
> > Hi Daniel,
> >
> > Daniel Shahaf writes:
> > > Ramkumar Ramachandra wrote on Thu, Aug 12, 2010 at 12:17:34 +0530:
> > > > > > The dump functionality is also complete- thanks to Stefan's review and
> > > > > > MANY others for cleaning it up. It's however hit a brick wall now
> > > > > > because of missing headers in the RA layer. Until I (or someone else)
> > > > > > figures out how to fix the RA layer, we can't do better than the XFail
> > > > > > copy-and-modify test I've committed.
> > > > >
> > > > > Part of the diff there is lack of SHA-1 headers --- which is unavoidable
> > > > > until editor is revved --- but part of it is a missing Text-copy-source-md5.
> > > > > Why don't you output that information --- doesn't the editor give it to you?
> > > >
> > > > Afaik, no. I don't see Text-copy-source-* anywhere in the RA
> > > > layer. Maybe I'm not looking hard enough?
> > > >
> > >
> > > Hmm. It seems you're right. So you might have to use two RA sessions in
> > > parallel...
> > >
> > > (and then, you might have to have the user authenticate twice?)
> >
> > Hm, I also have to find out if it's allowed. The commit_editor doesn't
> > allow it for instance. Besides, it's a very inelegant solution- I'd
> > rather fix the RA layer than do this.
>
> @Daniel, what would adding these headers add?
>
> The extra headers are for making it easier to detect corruptions by checking
> them along the transfer.
>
> If we are just doing additional work to add headers via a different process
> it slows the dumping down more than a bit and it doesn't make the dump file
> any safer because it uses a different process to obtain the header.
> I think you would have to obtain the source of the copyfrom and get some
> checksum from that; maybe you can do that without transferring the file
> again, but I'm not sure about that.
>

I'm a bit surprised, but indeed I don't see a way to obtain the checksum
via svn_ra.h. (The word 'checksum' doesn't appear there, and it isn't
included in svn_dirent_t either.) I wonder how we got away without
having it...

> (And without the added headers the process is already as safe as svnsync.)
>
> Yes, we can add more and more processing to also get those new SHA-1 headers
> by recalculating them while dumping, but the idea for svnrdump was to create
> a fast and secure way to dump and load repositories... not an incredibly
> slow one that has to transfer files multiple times just to make all the
> optional headers match the output of svnadmin.
>
> Those headers were made optional for a reason: you don't always have them.
> And different conversion processes have different headers available.
> Svnadmin looks at the FS layer for dumping, so it sees different things than
> an RA layer API. E.g. the dump in svnadmin has to create diffs from
> fulltexts itself, while svnrdump has diffs and must apply these itself to
> get full texts. The checksums have a similar mangling. The FS has access to
> some of the checksums and recalculates others for you. (See the performance
> drop in 1.6 of svnadmin dump)
>

Okay, agreed. I assumed the editor would provide the copyfrom's
checksum for free (or, at least, that svn_ra_stat() would provide it),
but of course I won't suggest adding those copyfrom-checksum headers if
calculating them is as expensive as it now appears to be.
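
To make the cost concrete, here is a rough Python sketch (not Subversion
code) of what producing a Text-copy-source-md5 header would take if all
we can get for the copy source is its fulltext: every byte has to be read
and hashed. The function names and the in-memory stream are hypothetical
stand-ins; in practice the bytes would have to come over an RA session,
which is exactly the extra transfer discussed above.

```python
import hashlib
import io


def copy_source_md5(stream, chunk_size=64 * 1024):
    """Hash a fulltext stream; return (hex digest, number of bytes read).

    Hypothetical helper: 'stream' stands in for the copy source's
    fulltext, however it was fetched.
    """
    digest = hashlib.md5()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        total += len(chunk)
    return digest.hexdigest(), total


def format_header(hex_md5):
    """Format the digest as a dumpfile-style header line."""
    return "Text-copy-source-md5: %s\n" % hex_md5


# Stand-in for a copy source fetched a second time (assumption: 12000 bytes).
fulltext = io.BytesIO(b"hello world\n" * 1000)
md5_hex, nbytes = copy_source_md5(fulltext)
header = format_header(md5_hex)
```

The byte count returned next to the digest is the point: the work grows
with the size of every copied file, on top of the dump itself.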

> There is a similar case at the import side. Applying commits can't check all
> the checksums, but the really important ones are already handled. Svnrdump
> dump and svnrdump load are a nice match.
>
> Bert
>

Thanks for doubting,

Daniel
Received on 2010-08-19 20:04:19 CEST

This is an archived mail posted to the Subversion Dev mailing list.