
Re: Generating a dump file using a powershell script

From: Geoff Worboys <geoff_at_telesiscomputing.com.au>
Date: Wed, 23 Jun 2010 11:12:35 +1000

Daniel Shahaf wrote:
> i.e., you import the files in order of their timestamps, so
> that svn:date remain globally sorted?

> Nice!

Yes, I thought so. :-)
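A minimal sketch of that ordering step, in Python rather than the original PowerShell. It assumes one imported file per revision and an initially empty repository, so the first import lands in r1. `plan_revisions` is a hypothetical helper that sorts (path, timestamp) pairs so that revision numbers and svn:date increase together.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Tuple


def plan_revisions(
    files: Iterable[Tuple[str, datetime]]
) -> List[Tuple[int, str, datetime]]:
    """Assign revision numbers so that svn:date never decreases.

    `files` holds (path, timestamp) pairs with timezone-aware timestamps.
    Ties on timestamp are broken by path so the order is deterministic.
    """
    ordered = sorted(files, key=lambda item: (item[1], item[0]))
    # Assumption: the target repository is empty, so r0 already exists
    # and the first imported revision is r1.
    return [(rev, path, ts) for rev, (path, ts) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    plan = plan_revisions([
        ("b.txt", datetime(2010, 6, 2, tzinfo=timezone.utc)),
        ("a.txt", datetime(2010, 6, 1, tzinfo=timezone.utc)),
    ])
    for rev, path, ts in plan:
        print(rev, path, ts.isoformat())
```

Comparing aware datetimes compares the underlying UTC instants, so files stamped with different offsets still sort correctly.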

> i.e., 'svnadmin dump' produces CRLF for svn:eol-style=native
> files? That surprises me; I'd expect such files to be
> outputted with LF in dump files. (My testing agrees with my
> expectation.) Can you double-check?

> In any case, it probably *should* use LF, since dumpfiles are
> supposed to be a portable binary format.

I think you are correct. I have an odd mix of svn repositories
here, some created by cvs2svn and some created directly by
various versions of svn ... and a few now created from script.

I do have one repository (originally created by cvs2svn) in
which files with property svn:eol-style=native are output with
CRLF line endings in the dump file. I suspect something went
astray there. I have vague memories of playing with the dump
files back when I created this repository, so it may be a
problem that I caused ... or not.

It does appear that svnadmin accepts the dump file as the
literal truth, with minimal validation. For example, I had
originally tried using ISO8601 timestamps with a local offset
on my files, e.g.:
  2010-10-31T12:34:56+10:00
and svnadmin load built the repository, but svn itself then
complained about bogus dates. Luckily the script was easy
enough to change over to UTC timestamps.
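svn:date values written by Subversion itself use UTC with microsecond precision and a trailing `Z` (for example `2010-06-23T01:12:35.000000Z`). Below is a sketch of converting an offset timestamp like the one above into that form. `svn_date` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def svn_date(ts: datetime) -> str:
    """Format an aware datetime as UTC with microseconds and a 'Z' suffix."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = ts.astimezone(timezone.utc)
    # %f always yields six digits, e.g. 000000 for whole seconds.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.%fZ")


if __name__ == "__main__":
    local = datetime.fromisoformat("2010-10-31T12:34:56+10:00")
    print(svn_date(local))  # 2010-10-31T02:34:56.000000Z
```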

The strange thing, to me, was that svnadmin load did not
"correct" the line endings when it loaded the file, yet svn
did not seem to corrupt the file on checkout either. (I had
thought it might create files with CRCRLF or some such.)
That is not a complaint BTW ;-)
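A script that follows Daniel's suggestion and writes svn:eol-style=native files with LF in the dump has to normalise the line endings first. Only then can it compute the Text-content-length header, and the Text-content-md5 header if it emits one, because those headers describe the bytes actually written. This is a minimal sketch. `normalize_for_dump` is hypothetical, and converting lone CR as well as CRLF is an assumption about what such a script should accept.

```python
import hashlib
from typing import Tuple


def normalize_for_dump(data: bytes) -> Tuple[bytes, int, str]:
    """Convert CRLF and lone CR to LF, then return the normalised bytes
    together with the length and MD5 hex digest that describe them."""
    # Replace CRLF first so that its CR is not turned into a second LF.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized, len(normalized), hashlib.md5(normalized).hexdigest()


if __name__ == "__main__":
    body, length, md5 = normalize_for_dump(b"line one\r\nline two\r\n")
    print(length, md5)
```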

>> Can anyone explain this? A bug or am I missing something?
>>

> What's the question? Are you saying the code/comment disagree?

Yes they disagree. The question is: Which is right? (or Which
was the original intention?)

I see Bert/Julian have moved that part of the post to the dev
list, but I am not subscribed there at this time. I am content
to leave the decision on how to handle it to the devs; I just
wanted my script to be consistent with svn and to
automatically distinguish binary files from text.

I imagine the svn code wants to accept some "binary" bytes so
that utf8 files are seen as text ... but never having analysed
the distribution properties of utf8 I could not guess what
would be likely to work best. I do know that bytes above 0x7F
should be analysed separately from the other control
characters. [If utf8 is not required then I would imagine that
any "binary" byte at all would indicate the file is not a text
file.]
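Here is one way to act on the point about bytes above 0x7F, sketched under assumptions. It rejects NUL and most C0 control bytes outright, and it accepts high bytes only when the sample decodes as UTF-8. This is not Subversion's own detection rule, which is the code under discussion. `looks_like_text` and the set of allowed control characters are hypothetical choices.

```python
import codecs

# Assumed set of control bytes tolerated in text: BS, TAB, LF, FF, CR, ESC.
TEXT_CONTROLS = frozenset({0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B})


def looks_like_text(sample: bytes) -> bool:
    """Heuristic text/binary check on the first chunk of a file."""
    for byte in sample:
        # NUL, unexpected C0 controls and DEL all mark the file as binary.
        if (byte < 0x20 and byte not in TEXT_CONTROLS) or byte == 0x7F:
            return False
    # Bytes above 0x7F are judged separately: they must form valid UTF-8.
    # final=False lets a multi-byte character cut off at the end of the
    # sample pass, while invalid sequences still raise.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(looks_like_text("naïve text\n".encode("utf-8")))
    print(looks_like_text(b"\x00\x01\x02"))
```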

> Internally the function it uses is svn_hash_write2(), and
> there's a small documentation comment at the top of hash.c.
> But, as you say,

>> (Obviously I've gotten by just by visually checking dump
>> files produced by svnadmin, but it would be good to know
>> what I was doing. ;-)
>>

> the format isn't hard to reverse-engineer, right?
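The format described in the hash.c comment is a sequence of length-prefixed entries: a `K <len>` line, the key, a `V <len>` line, and the value, closed by a terminator line. `svnadmin dump` uses `PROPS-END` as that terminator, while the plain hash format uses `END`. Below is a minimal Python sketch of writing it, where `write_props` is a hypothetical name. The lengths are byte counts, which is what keeps the format safe for binary values.

```python
from typing import Mapping


def write_props(props: Mapping[str, bytes], terminator: str = "PROPS-END") -> bytes:
    """Serialise properties as:
    K <key length>\\n<key>\\nV <value length>\\n<value>\\n ... <terminator>\\n
    """
    out = bytearray()
    # Sorting keeps the output deterministic; it is not required by the format.
    for name in sorted(props):
        key = name.encode("utf-8")
        value = props[name]
        out += b"K %d\n" % len(key) + key + b"\n"
        out += b"V %d\n" % len(value) + value + b"\n"
    out += terminator.encode("ascii") + b"\n"
    return bytes(out)


if __name__ == "__main__":
    print(write_props({"svn:eol-style": b"native"}).decode("utf-8"), end="")
```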

Not difficult ... but there are some subtleties regarding
whether (and which) new-line characters are included in
certain length counts, and in making sure my code attaches the
various delimiting new-lines to the correct blocks of output
... etc. If it were purely a text file then many things would
be more obvious, but since it is a mix of binary and text, the
use of \n delimiters needs care (and should be explicit).
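Below is a sketch of assembling one file node record with every newline written explicitly. It assumes dump format version 2 as produced by `svnadmin dump`. Content-length is Prop-content-length plus Text-content-length. The blank line that closes the header block and the trailing newlines that separate records are not counted. The helper names are hypothetical, and the optional Text-content-md5 header is left out for brevity.

```python
from typing import Mapping


def _props_block(props: Mapping[str, bytes]) -> bytes:
    """K/V property block ending in PROPS-END, as used inside dump files."""
    out = bytearray()
    for name in sorted(props):
        key = name.encode("utf-8")
        value = props[name]
        out += b"K %d\n" % len(key) + key + b"\n"
        out += b"V %d\n" % len(value) + value + b"\n"
    return bytes(out) + b"PROPS-END\n"


def node_record(path: str, props: Mapping[str, bytes], text: bytes,
                action: str = "add", kind: str = "file") -> bytes:
    prop_bytes = _props_block(props)
    headers = [
        f"Node-path: {path}",
        f"Node-kind: {kind}",
        f"Node-action: {action}",
        f"Prop-content-length: {len(prop_bytes)}",
        f"Text-content-length: {len(text)}",
        # Content-length covers exactly the property block plus the text.
        f"Content-length: {len(prop_bytes) + len(text)}",
    ]
    # Every header line ends in \n; one extra \n closes the header block.
    head = "".join(h + "\n" for h in headers).encode("utf-8") + b"\n"
    # The two trailing newlines separate this record from the next one and
    # are not included in any length count.
    return head + prop_bytes + text + b"\n\n"


if __name__ == "__main__":
    record = node_record("trunk/a.txt", {"svn:eol-style": b"native"}, b"hi\n")
    print(record.decode("utf-8"), end="")
```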

Thanks for your response, most appreciated.

-- 
Geoff Worboys
Telesis Computing
Received on 2010-06-23 03:13:29 CEST

This is an archived mail posted to the Subversion Users mailing list.
