
Re: Incomplete SVN dump files

From: Branko Čibej <brane_at_apache.org>
Date: Wed, 16 Sep 2015 20:28:24 +0200

On 16.09.2015 20:03, Eric Johnson wrote:
> Hi Bert,
>
> On Wed, Sep 16, 2015 at 2:33 AM, Bert Huijben <bert_at_qqmail.nl> wrote:
>
>>
>>> -----Original Message-----
>>> From: Andreas Mohr [mailto:andi_at_lisas.de]
>>> Sent: Wednesday, 16 September 2015 07:48
>>> To: Eric Johnson <eric_at_tibco.com>
>>> Cc: bert_at_qqmail.nl; users_at_subversion.apache.org
>>> Subject: Re: Incomplete SVN dump files
>>>
>>> Hi,
>>>
>>> On Tue, Sep 15, 2015 at 05:26:38PM -0700, Eric Johnson wrote:
>>>> I just checked, and there aren't any open bugs about this.
>>>> Interrupting svnrdump can result in a dump file with not all the files of
>>>> the last commit in the dump record. Accidentally use that dump file to
>>>> load into a new repository, and the resulting repository will not be a
>>>> copy of the original.
>>>> My particular use case, I was trying to suck down a large repository.
>>>> Connection interrupted part way through. I resumed from part way through
>>>> (using the --incremental option) into an additional dump file. Then did a
>>>> load of those two dump files. Did not yield a copy of the original
>>>> repository, though.
>>>> This seems like a critical issue for possible data loss when copying
>>>> repositories from machine to machine using svnrdump.
>>> AFAICS (not an svnrdump expert here) very well described and to the point.
>>> You just managed to pinpoint a rather important serialization format
>>> that seemingly isn't fully properly atomically transaction-safe...
>>> (good catch!)
>> In some ways a dumpfile is a stream and not a file... and when you use the
>> commandline tools you always obtain it from stdout.
>>
>> I could argue that in that case you should check whether the operation
>> exited successfully or with an error.
>>
> In my specific case, I'm trying to suck GB of data from Europe to the
> Western US. And apparently I cannot depend on the connection being stable
> long enough to last for the whole download.
>
> So if the dump of the last commit is incomplete, an error code tells me,
> what, exactly? That I need to manually edit the stream that I just dumped
> into a file? That I should discard the whole dump, and start again?

If you don't have a stable connection, then you can mitigate that by
performing incremental dumps of one revision at a time and just retry
any that fail. You can even do that in parallel to amortize the cost of
opening the socket.
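
A minimal sketch of the one-revision-at-a-time approach, assuming Python 3.8+
and an svnrdump binary on the PATH. The function names, the file naming
scheme, the ".partial" suffix and the retry count are illustrative choices,
not anything svnrdump prescribes. Each revision goes to its own file, and a
file only receives its final name once svnrdump has exited with status 0, so
a truncated dump is never mistaken for a complete one.

```python
import subprocess
from pathlib import Path


def dump_one(url, rev, out_dir, run=subprocess.run):
    """Dump a single revision to its own file; return True on success."""
    target = Path(out_dir) / f"r{rev:08d}.dump"
    partial = target.with_name(target.name + ".partial")
    cmd = ["svnrdump", "dump", "--incremental", "-r", str(rev), url]
    with open(partial, "wb") as fh:
        result = run(cmd, stdout=fh)
    if result.returncode != 0:
        # Non-zero exit: the output may be truncated, so throw it away.
        partial.unlink(missing_ok=True)
        return False
    # Only a dump that finished cleanly gets the final file name.
    partial.replace(target)
    return True


def dump_range(url, first, last, out_dir, retries=3, run=subprocess.run):
    """Dump revisions first..last one at a time; return those that failed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for rev in range(first, last + 1):
        if (Path(out_dir) / f"r{rev:08d}.dump").exists():
            continue  # already fetched by an earlier, interrupted run
        # any() stops at the first successful attempt.
        if not any(dump_one(url, rev, out_dir, run) for _ in range(retries)):
            failed.append(rev)
    return failed
```

Calling dump_range(url, 0, head, "dumps") again after an interruption skips
the revisions that are already on disk. The resulting files can then be fed
to svnadmin load in revision order. The sketch runs sequentially; the
parallel variant is not shown.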

[...]

> SVN claims to be transactional with commits. Surely, svnadmin load can
> discard the last commit from a load if it was incomplete. Actually, doing
> anything else is just asking for occasional data corruption.

Commits, yes. Dump files, not so much.

> I'm filing an issue.

Good luck with that. Bert explained the reasons why dump files are the
way they are. An "end of commit" marker does not really add much value
compared to the other options you have, and has the really nasty side
effect that it breaks backwards compatibility of dump files.

-- Brane
Received on 2015-09-16 20:28:30 CEST

This is an archived mail posted to the Subversion Users mailing list.
