
Re: Incomplete SVN dump files

From: Eric Johnson <eric_at_tibco.com>
Date: Wed, 16 Sep 2015 11:03:08 -0700

Hi Bert,

On Wed, Sep 16, 2015 at 2:33 AM, Bert Huijben <bert_at_qqmail.nl> wrote:

> > -----Original Message-----
> > From: Andreas Mohr [mailto:andi_at_lisas.de]
> > Sent: woensdag 16 september 2015 07:48
> > To: Eric Johnson <eric_at_tibco.com>
> > Cc: bert_at_qqmail.nl; users_at_subversion.apache.org
> > Subject: Re: Incomplete SVN dump files
> >
> > Hi,
> >
> > On Tue, Sep 15, 2015 at 05:26:38PM -0700, Eric Johnson wrote:
> > > I just checked, and there aren't any open bugs about this.
> > > Interrupting svnrdump can result in a dump file with not all the files of
> > > the last commit in the dump record. Accidentally use that dump file to
> > > load into a new repository, and the resulting repository will not be a
> > > copy of the original.
> > > My particular use case, I was trying to suck down a large repository.
> > > Connection interrupted part way through. I resumed from part way through
> > > (using the --incremental option) into an additional dump file. Then did a
> > > load of those two dump files. Did not yield a copy of the original
> > > repository, though.
> > > This seems like a critical issue for possible data loss when copying
> > > repositories from machine to machine using svnrdump.
> >
> > AFAICS (not an svnrdump expert here) very well described and to the point.
> > You just managed to pinpoint a rather important serialization format
> > that seemingly isn't fully properly atomically transaction-safe...
> > (good catch!)
>
> In some ways a dumpfile is a stream and not a file... and when you use the
> commandline tools you always obtain it from stdout.
> I could argue that you in that case should check if the operation exited
> successfully or with an error.

In my specific case, I'm trying to suck GB of data from Europe to the
Western US. And apparently I cannot depend on the connection being stable
long enough to last for the whole download.

So if the dump of the last commit is incomplete, an error code tells me,
what, exactly? That I need to manually edit the stream that I just dumped
into a file? That I should discard the whole dump, and start again?
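
For what it's worth, about all the exit status can drive is an all-or-nothing decision. A minimal Python sketch of that check follows; the helper name dump_to_file is mine, not part of any Subversion tool. It writes the command's stdout to a temporary file and keeps it only when the command exits with status 0:

```python
import os
import subprocess
import sys
import tempfile


def dump_to_file(cmd, dest):
    """Run cmd and keep its stdout in dest only if it exits with status 0.

    Hypothetical helper for illustration. Returns True on success; on
    failure the partial output is deleted and False is returned.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    fd, tmp = tempfile.mkstemp(dir=dest_dir, suffix=".partial")
    try:
        with os.fdopen(fd, "wb") as out:
            status = subprocess.run(cmd, stdout=out).returncode
        if status == 0:
            # Rename into place only once the whole stream arrived.
            os.replace(tmp, dest)
            return True
        return False
    finally:
        # Still present only if the command failed or raised.
        if os.path.exists(tmp):
            os.remove(tmp)
```

It could be called as dump_to_file(["svnrdump", "dump", url, "-r", "0:1000"], "part1.dump"), where url and the revision range are placeholders. Note that this throws away every complete revision in the interrupted file along with the incomplete one, which is exactly my complaint.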

> After an error you can't trust that the final portion is ok.

Sure, but why not encode that in the dump itself! The absence of an
"end-commit" trailer could be a signal to every tool that uses the dump
that the commit is not complete, and the transaction could be discarded!

> The stream was also deliberately designed in a way that you can
> incrementally generate it... E.g. after each new revision or as a daily
> backup operation.

> Adding some 'this is the end' marker would break those use cases, that we
> have been using since the day subversion was self-hosted. (Long before 1.0)

Sounds like an argument for a "start commit / end commit" frame in the
dump. So if you want to support this use case, adding an "end-of-stream" at
the end of the stream wouldn't be sufficient. Right now, the dump file
apparently just has a "start commit" indicator. So it breaks everything.

> And when loading from a stream we can't continue reading to the end to see
> if there is a final marker, as at that point we aren't able to go back to
> the start and start the whole process.
> (I've used '$ svn dump .... | ssh .... svnadmin load ...' more than a few
> times for repository migrations)

SVN claims to be transactional with commits. Surely, svnadmin load can
discard the last commit from a load if it was incomplete. Actually, doing
anything else is just asking for occasional data corruption.
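
To make the idea concrete, here is a toy sketch in Python of a streaming loader that commits a revision only after its closing marker arrives. The BEGIN-REV / END-REV lines are invented for illustration and are NOT part of the real dump format, and load_framed is a hypothetical name. A real format would also need length-prefixed content so that file data cannot be mistaken for a marker.

```python
def load_framed(lines, commit):
    """Feed complete revisions to commit(); report a cut-off one.

    lines  -- iterable of text lines from a hypothetical framed stream
    commit -- callback taking (revision_number, list_of_body_lines)

    Returns the number of a revision whose frame never closed, or None.
    """
    current = None  # revision number of the open frame, if any
    body = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("BEGIN-REV "):
            current, body = int(line.split()[1]), []
        elif line.startswith("END-REV ") and current == int(line.split()[1]):
            # The frame closed cleanly, so the revision is safe to apply.
            commit(current, body)
            current, body = None, []
        elif current is not None:
            body.append(line)
    # A frame still open here was cut off: report it, never commit it.
    return current
```

Only one revision is buffered at a time, so this still works when reading from a pipe, and because every revision carries its own frame, incrementally generated dumps can be concatenated without any end-of-stream marker.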

I'm filing an issue.

Received on 2015-09-16 20:03:42 CEST

This is an archived mail posted to the Subversion Users mailing list.