Hi Brane,
On Sep 16, 2015, at 11:28 AM, Branko Čibej <brane_at_apache.org> wrote:
>
> On 16.09.2015 20:03, Eric Johnson wrote:
>> Hi Bert,
>>
>> On Wed, Sep 16, 2015 at 2:33 AM, Bert Huijben <bert_at_qqmail.nl> wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Andreas Mohr [mailto:andi_at_lisas.de]
>>>> Sent: woensdag 16 september 2015 07:48
>>>> To: Eric Johnson <eric_at_tibco.com>
>>>> Cc: bert_at_qqmail.nl; users_at_subversion.apache.org
>>>> Subject: Re: Incomplete SVN dump files
>>>>
>>>> Hi,
>>>>
>>>> On Tue, Sep 15, 2015 at 05:26:38PM -0700, Eric Johnson wrote:
>>>>> I just checked, and there aren't any open bugs about this.
>>>>> Interrupting svnrdump can result in a dump file with not all the
>>>>> files of the last commit in the dump record. Accidentally use that
>>>>> dump file to load into a new repository, and the resulting
>>>>> repository will not be a copy of the original.
>>>>> My particular use case: I was trying to suck down a large
>>>>> repository. Connection interrupted part way through. I resumed
>>>>> from part way through (using the --incremental option) into an
>>>>> additional dump file. Then did a load of those two dump files. Did
>>>>> not yield a copy of the original repository, though.
>>>>> This seems like a critical issue for possible data loss when
>>>>> copying repositories from machine to machine using svnrdump.
>>>> AFAICS (not an svnrdump expert here) very well described and to the
>>>> point.
>>>> You just managed to pinpoint a rather important serialization format
>>>> that seemingly isn't fully properly atomically transaction-safe...
>>>> (good catch!)
>>> In some ways a dump file is a stream and not a file... and when you
>>> use the command-line tools you always obtain it from stdout.
>>>
>>> I could argue that, in that case, you should check whether the
>>> operation exited successfully or with an error.
>> In my specific case, I'm trying to suck GB of data from Europe to the
>> Western US. And apparently I cannot depend on the connection being
>> stable long enough to last for the whole download.
>>
>> So if the dump of the last commit is incomplete, an error code tells
>> me what, exactly? That I need to manually edit the stream I just
>> dumped into a file? That I should discard the whole dump and start
>> again?
>
> If you don't have a stable connection, then you can mitigate that by
> performing incremental dumps of one revision at a time and just retry
> any that fail. You can even do that in parallel to amortize the cost of
> opening the socket.
Yes, I can do that. Probably going to do that in chunks of revisions;
otherwise it has the same awful performance profile as svnsync over a
high-latency connection.
>
> [...]
>
>> SVN claims to be transactional with commits. Surely, svnadmin load can
>> discard the last commit from a load if it was incomplete. Actually, doing
>> anything else is just asking for occasional data corruption.
>
> Commits, yes. Dump files, not so much.
The dump file itself isn't the issue. That I cannot safely pipe the
output of svnrdump into svnadmin load is a *huge* problem.
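To be concrete, this is the kind of one-step mirroring I mean (the URL
and target path are placeholders):

    # If svnrdump is cut off at a record boundary, nothing in the
    # stream tells svnadmin load that the last revision is missing
    # files, so a truncated revision can end up committed as-is.
    svnrdump dump https://svn.example.com/repos/foo | \
        svnadmin load /path/to/mirror

Checking svnrdump's exit code only helps after the fact, when the
partial revision has already been loaded.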
>
>
>> I'm filing an issue.
>
> Good luck with that. Bert explained the reasons why dump files are the
> way they are. An "end of commit" marker does not really add much value
> compared to the other options you have, and has the really nasty side
> effect that it breaks backwards compatibility of dump files.
The end-of-commit marker in the dump stream is, of course, just one
possible implementation choice. The bug is not being able to safely
pipe the output of svnrdump into svnadmin load.
As for backwards compatibility, the dump stream format has already
changed once, with the --deltas option to svnadmin dump, so an
equivalent versioning approach could work for this issue as well.
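If I remember the format right, the stream already announces its
version in the very first header line, and --deltas already bumps it,
so a new optional end-of-revision record could be gated the same way
(the values below are from memory, so treat them as approximate):

    $ svnadmin dump repo -r 1:1 | head -n 1
    SVN-fs-dump-format-version: 2

    $ svnadmin dump repo -r 1:1 --deltas | head -n 1
    SVN-fs-dump-format-version: 3

An old svnadmin load that sees an unknown version number can refuse the
stream up front instead of silently misreading it.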
Eric