
Re: binary file size limit? (4.2GB retry - success)

From: Ben Collins-Sussman <sussman_at_collab.net>
Date: 2004-03-12 15:20:12 CET

C.A.T.Magic wrote:

>> 1. copy the working file to .svn/tmp/, in case the user changes it
>>during the commit. if eol or keyword translation is on, 'detranslate'
>>these things when copying.
>
>
> in my test case it was a binary file and no keywords on,
> so this was a pure copy, nothing was modified.
> so maybe skip this pass on binaries-without-props?

No, read what I wrote: what if the user changes the file while you're
streaming it over the network? We *cannot* simply push the working file
over the network. We need to push a copy of it that is out of the
user's sight.  Translation only applies sometimes; the copy needs to be
made regardless.  We're either going to do a straight copy to .svn/tmp,
or a 'translated' copy.  But the copy has to happen.
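
To make the reasoning concrete, here is a minimal Python sketch of the
"snapshot first, then stream" idea. It is not Subversion's actual code:
the function names and the chunk size are made up for illustration, and
it only covers the straight-copy case (no eol or keyword detranslation).

```python
import os
import shutil
import tempfile


def snapshot_for_commit(working_path, tmp_dir):
    """Copy the working file into tmp_dir and return the copy's path.

    Hypothetical helper: stands in for the straight copy into .svn/tmp.
    """
    os.makedirs(tmp_dir, exist_ok=True)
    fd, snapshot_path = tempfile.mkstemp(dir=tmp_dir)
    os.close(fd)
    shutil.copyfile(working_path, snapshot_path)
    return snapshot_path


def stream_chunks(path, chunk_size=64 * 1024):
    """Yield the file's bytes in chunks, as a stand-in for the network send."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

Because the sender reads only from the snapshot, an edit to the working
file in the middle of the commit cannot change what goes over the wire.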

>
>> 2. send a binary diff over the network by comparing the text-base of
>>the file with the temporary file.
>
>
> I hope it wasn't a 'real diff' because I just used add/commit,
> so it just needs to be a 'full upload' of the WC file.

We're still sending binary diff data, even in the case of adding a new
file. The binary diff algorithm has 3 "opcodes":

     - "add some new bytes"
     - "copy some bytes from source file to target file"
     - "copy some bytes from target file to target file"

All binary diffs are expressed in terms of these opcode-instructions.
In the case of adding a new file to the repository, the client is
effectively comparing an empty file as the "source" file against the new
file (the "target"). The result is that the client only sends opcodes
of the first type.
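
A toy Python sketch of the three opcodes may help. The tuple layout and
names ("new", "copy_source", "copy_target") are invented for this example
and are not the real wire format; the point is only that a freshly added
file diffs against an empty source, so every instruction is of the first
kind.

```python
def apply_delta(source, ops):
    """Rebuild the target bytes from a source and a list of opcodes."""
    target = bytearray()
    for op in ops:
        kind = op[0]
        if kind == "new":
            # "add some new bytes"
            target += op[1]
        elif kind == "copy_source":
            # "copy some bytes from source file to target file"
            _, offset, length = op
            target += source[offset:offset + length]
        elif kind == "copy_target":
            # "copy some bytes from target file to target file";
            # byte-by-byte so the range may overlap what is being written
            _, offset, length = op
            for i in range(length):
                target.append(target[offset + i])
        else:
            raise ValueError("unknown opcode: %r" % (kind,))
    return bytes(target)


def delta_for_added_file(data, chunk_size=4):
    """Delta against an empty source: nothing to copy, so only 'new' ops."""
    return [("new", data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]
```

For example, apply_delta(b"", delta_for_added_file(b"hello world"))
gives back b"hello world", and every op in that delta is a "new" op.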

But the main problem here is that we're still running the binary diff
algorithm over the whole file, even when adding it for the first time.
The room for improvement is to stop wasting CPU time on this and stream
the raw file contents to the server instead.  At the moment, our API
doesn't allow this.  That's the fundamental problem: the server will
*only* accept binary diff data for file contents, not raw contents.
We've been talking about changing this API for quite a long time;
there's an ancient issue filed about it.  I'm just explaining the
situation here.

>> 3. repository applies binary diff to the file as it receives it;
>>after the new file is fully constructed & committed in the repository,
>>compare the new file with the previous version and do another binary
>>diff. Store the previous version of the file as a binary diff against
>>its successor.
>
>
> nothing to 'apply' here, because the file didn't exist in the repos
> before in my test (i.e. no diff to apply, just a copy)

Nope, on the server end, it needs to take the stream of binary diff "add
new text" opcodes and use them to construct the new file. In this
particular case, there's no predecessor file to compress against the
latest file, so we're skipping that last bit of work.
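
Here is a rough Python sketch of the "store the previous version as a
diff against its successor" idea from step 3. Everything in it is an
assumption for illustration: the class and method names are made up,
revisions are numbered from 0, and the delta is a naive common
prefix/suffix comparison swapped in for the real binary diff algorithm.

```python
def make_delta(source, target):
    """Describe target relative to source as (prefix_len, middle, suffix_len)."""
    limit = min(len(source), len(target))
    prefix = 0
    while prefix < limit and source[prefix] == target[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < limit - prefix
           and source[-1 - suffix] == target[-1 - suffix]):
        suffix += 1
    return (prefix, target[prefix:len(target) - suffix], suffix)


def apply_delta(source, delta):
    """Rebuild the target from source plus a delta made by make_delta."""
    prefix, middle, suffix = delta
    tail = source[len(source) - suffix:] if suffix else b""
    return source[:prefix] + middle + tail


class ReverseDeltaStore:
    """Keeps the newest revision whole; older ones as deltas against successors."""

    def __init__(self):
        self.latest = None
        self.older = []  # older[i] rebuilds revision i from revision i + 1

    def commit(self, contents):
        if self.latest is not None:
            # There is a predecessor: re-express it against the new file.
            self.older.append(make_delta(contents, self.latest))
        # On the very first commit there is nothing to compress against.
        self.latest = contents
        return len(self.older)  # revision number of this commit

    def read(self, rev):
        data = self.latest
        for i in range(len(self.older) - 1, rev - 1, -1):
            data = apply_delta(data, self.older[i])
        return data
```

The first commit of a file takes the short path in commit(): with no
predecessor, no second diff is computed, which is the work being skipped
in the add case described above.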

>> 4. the client gets the new revision number from the server; it then
>>copies the .svn/tmp/ file into .svn/text-base again.
>
>
> here too, no need to copy the non-modified binary-file-without-props
> back into the repos. it's already there.

I think you mean, "copy back into the working copy"?

Anyway, the file that got committed is now the new text-base, so it
needs to get into .svn/text-base/ somehow.  This is not optional.

> maybe add an extra shell output that states that the commit
> already completed and is now just copying files back into the WC?

Absolutely... we need better UI feedback. Even just an abstract
progress meter would help. We've been wishing for this forever.

Received on Fri Mar 12 15:21:32 2004

This is an archived mail posted to the Subversion Users mailing list.
