----- Original Message -----
From: "Ben Collins-Sussman" <sussman@collab.net>
To: "David McBride" <dmcbride@languageweaver.com>
Cc: <users@subversion.tigris.org>
Sent: Friday, March 12, 2004 06:29
Subject: Re: binary file size limit? (4.2GB retry - success)
> > I guess most of the time was spent in the DB doing some "cleanup" work?
> > because the 4GB upload took just about 10 minutes, I could already
> > ls and checkout files, but it still took about 40 minutes until the
> > commit command finally completed.
>
> Let me explain what's going on here when you commit a file:
Thanks for explaining in detail again what is going on.
> 1. copy the working file to .svn/tmp/, in case the user changes it
> during the commit. if eol or keyword translation is on, 'detranslate'
> these things when copying.
In my test case it was a binary file with no keywords set,
so this was a pure copy and nothing was modified.
Maybe this pass could be skipped for binaries without props?
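A minimal sketch of the check suggested above, written in Python rather
than svn's actual C code. The function names and the dict-of-properties
shape are made up for illustration; only the property names
svn:eol-style and svn:keywords are real. Note that it covers only the
translation question, not the other reason for the copy (the user
changing the file during the commit).

```python
# Hypothetical helpers for illustration; not part of Subversion's API.

def needs_detranslation(props):
    """Return True if the file has eol or keyword translation enabled.

    props: dict mapping property names to values for one file.
    """
    eol_style = props.get("svn:eol-style")
    keywords = props.get("svn:keywords")
    # A missing or empty property means no translation is configured.
    return bool(eol_style) or bool(keywords)


def copy_is_pure(props):
    """True when the copy to .svn/tmp/ would be byte-for-byte identical."""
    return not needs_detranslation(props)
```

For a plain binary file, e.g. one carrying only svn:mime-type,
copy_is_pure() returns True, which is the case where skipping the pass
might be considered.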
> 2. send a binary diff over the network by comparing the text-base of
> the file with the temporary file.
I hope it wasn't a 'real diff', because I just used add/commit,
so it only needs to be a 'full upload' of the WC file.
> 3. repository applies binary diff to the file as it receives it;
> after the new file is fully constructed & committed in the repository,
> compare the new file with the previous version and do another binary
> diff. Store the previous version of the file as a binary diff against
> its successor.
Nothing to 'apply' here, because in my test the file didn't exist in
the repos before (i.e. no diff to apply, just a copy).
> 4. the client gets the new revision number from the server; it then
> copies the .svn/tmp/ file into .svn/text-base again.
Here too, there is no need to copy the non-modified
binary-file-without-props back into the working copy. It's already there.
Maybe add an extra line of output stating that the commit has
already completed and svn is now just copying files back into the WC?
There is no need to modify or copy back to the user's WC file
if no keyword props are enabled and no line ending conversion
was necessary during commit; the file is already there.
It's not even a good idea to 'touch' the file date if avoidable,
since this will force recompilation of the 'touched' files when
using a timestamp-based make/build.
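To illustrate the rebuild concern, here is a rough sketch of the
staleness test a timestamp-based build performs. It is a simplified
model of make's rule, not make's actual implementation, and the
function name is made up for illustration.

```python
import os


def is_out_of_date(target, source):
    """make-style check: rebuild if target is missing or older than source."""
    if not os.path.exists(target):
        return True
    # Only modification times are compared, never file contents, so a
    # source file rewritten with identical bytes still counts as changed.
    return os.path.getmtime(source) > os.path.getmtime(target)
```

Under this rule, rewriting an unchanged source file after a commit
gives it a newer mtime than its object file, so the next build
recompiles it even though the content is the same.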
> Step #1 takes a very long time on a 4GB file.
>
> Step #2 sends only a teeny tiny binary diff over the network, but it
> takes a horribly long time to *generate* the diff... the binary diff
> algorithm needs to scan two 4GB files, comparing them byte-for-byte!
10 minutes for the initial add/commit for a 700MB file - looks ok to me.
> Step #3 takes just as long as step #2, because now the repository is
> *rederiving* the same binary diff in reverse!
>
> Step #4 takes just as long as step #1.
From my point of view, step 4 can and -should- be skipped in most cases
(at least to prevent unnecessary rebuilds).
> We've gone over these things in the past, and had many discussions about
> how to optimize things. It would be nice if svn allowed users to
> optimize for either network or CPU: in other words, like the CVS '-z'
> flags, you could tell svn to not do any diffing at all in steps #2 and #3.
It would be nice if at least the situations where SVN performs much
slower than CVS could be fixed.
:-)
c.a.t.
Received on Fri Mar 12 10:24:42 2004