
Re: binary file size limit? (700MB retry - VERY annoying svn issues)

From: C.A.T.Magic <c.a.t.magic_at_gmx.at>
Date: 2004-03-12 02:41:37 CET

----- Original Message -----
From: "David McBride" <dmcbride@languageweaver.com>
>I'm trying to understand the differences in response times for adds vs.
>commits. I understand that there is a large penalty when adding large
>files, but what is the response time for committing relatively small
>changes (e.g., < 20%) to that file? For example, I think that you said that it
>took 10 minutes to add your 700 MB file. How long does it take to commit a
>change to that file?
>The reason I ask is that svn commit times are supposed to be relative to
>the size of the change and NOT the size of the file. Therefore, committing a
>small change should take only a fraction of the 10 minutes it took to add
>the file in the first place. Thanks again.

ok, I'm trying to do a few more tests with the 700MB file for you.

WinXP, svn commandline 1.0.0,
222G Raid 0 ( but this time the test-repos
is on my slowest, fragmented, outer 50G partition )
1 GB mem, 3.2 GHz cpu, dos commandline.

( this is cutNpaste output from some test scripts
- the dos time format output is H:MM:SS.xx )

# using 100% fresh repos
X:\SVNSandbox>svnadmin create file:///X:/SVNSandbox/Repos
# [...create Work and put a ~700MB (790,582,464) file in...]

X:\SVNSandbox>echo %time%
X:\SVNSandbox>svn add Work/huge700.bin
A (bin) Work\huge700.bin
X:\SVNSandbox>svn commit Work/huge700.bin -m "initial add huge"
Adding (bin) Work\huge700.bin
Transmitting file data .
Committed revision 1.
X:\SVNSandbox>echo %time%

~10 minutes total.
Again, there is almost nothing (6% cpu, no disk activity, 7MB mem)
going on in the last 5 minutes. You can already browse the
repos, see "huge700.bin" there, and it's even possible to check out the file,
while the 'svn commit' is still printing its
 "Transmitting file data ." (I tested this during a second testrun,
so the timings above are not affected)

-> svn dev@ team: it was even possible to just "kill -9"
     the svn commit at this point, and nothing was lost --
     what's going on in those 5 minutes 'after' the 'real' commit?

ok, continuing after the first "commit" (from backup;-)

now I just modify (touch) the file time,
so svn has to re-check the file for modifications!

X:\SVNSandbox\Work> touch huge700.bin
*windows explorer booooom*
if you have TortoiseSVN installed and icon overlays enabled,
your Windows Explorer will go down --
TortoiseSVN is calling 'svn status' for every file in that folder,
and svn status is trying to verify the file's MD5 value against its
entries file. this takes about 2 minutes. but after the two minutes,
the process starts over again and again(!),
because Explorer/TSVN tries to update the view
periodically -> this totally locks up the Desktop/Explorer.
I'll deactivate icon overlay (and recommend TSVN to exclude
large files... )

-> svn dev@ team: optimization request:
     if you already -recognize-, that a file is only
     'touched' and its MD5 is still the same, why is the timestamp
     in the entries file not updated? instead the MD5 is recalculated
     on -every- status/commit/update access to the file!
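
To make the request above concrete, here is a minimal sketch of the check order I have in mind. It is NOT svn's actual code: the `recorded` dict is a hypothetical stand-in for the entries file, `base_path` stands for the pristine text-base copy, and a plain byte compare is used instead of an MD5. The only point is the last branch: once a file turns out to be merely touched, the new timestamp is remembered, so the expensive compare runs once instead of on every access.

```python
import filecmp
import os


def check_status(work_path, base_path, recorded):
    """Return "M" for a modified file, "" for an unmodified one.

    `recorded` is a plain dict standing in for the entries file
    (hypothetical layout: {"mtime_ns": int}).
    """
    st = os.stat(work_path)

    # Cheap check 1: a different size means the content must differ.
    if st.st_size != os.path.getsize(base_path):
        return "M"

    # Cheap check 2: timestamp matches the recorded one, so trust it.
    if st.st_mtime_ns == recorded["mtime_ns"]:
        return ""

    # Expensive check: read both files completely.
    if filecmp.cmp(work_path, base_path, shallow=False):
        # Only touched: remember the new timestamp so the next call
        # is answered by cheap check 2.
        recorded["mtime_ns"] = st.st_mtime_ns
        return ""
    return "M"
```

With something like this, the 2 minute compare would be paid once per touch rather than on every status call.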

-> svn dev@ team: feature idea:
     if you want to help the TSVN developers (and also the cmdline)
     maybe allow a 'timeout' option or a 'size limit' on the 'svn status'
     checks -
     if you can return the response fast (e.g. file size change),
     return it, if it will take longer only return it up to a certain
     CVS did not do MD5 verification - that's its plus in such a case.
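
A rough sketch of what such a 'size limit' could look like, for a caller like an icon overlay that cannot afford to block. This is an illustration only: `quick_status`, its `size_limit` parameter and the "unknown" answer are all made up here and are not svn options or svn output.

```python
import filecmp
import os


def quick_status(work_path, base_path, recorded_mtime_ns, size_limit):
    """Answer "modified", "unchanged" or "unknown" without long stalls.

    All names here are hypothetical; `size_limit` is in bytes.
    """
    st = os.stat(work_path)

    # Fast answers first: a size change or a matching timestamp.
    if st.st_size != os.path.getsize(base_path):
        return "modified"
    if st.st_mtime_ns == recorded_mtime_ns:
        return "unchanged"

    # The file was touched. A full compare is needed, but only do it
    # when the file is small enough to finish quickly.
    if st.st_size > size_limit:
        return "unknown"
    if filecmp.cmp(work_path, base_path, shallow=False):
        return "unchanged"
    return "modified"
```

A GUI client could draw a neutral icon for "unknown" and leave the full check to an explicit 'svn status'.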

ok, continuing after kill/restart explorer.exe ;-)

now for the timings of a 'svn status' when the file timestamp has
been 'touched' (not modified)

X:\SVNSandbox\Work>echo %time%
X:\SVNSandbox\Work>svn status
            #nothing printed
X:\SVNSandbox\Work>echo %time%

ok, you can see the same 2 minute lockup
(MD5 check/file compare actually) that happened to TSVN
on the console too.
status doesn't print anything because the file isn't modified.
but (as stated above) from now on, every svn status will take +2 minutes.
(Note: you can also see the same 'delay' summing up when touching
many smaller files, so this -is- a performance issue)

ok, lets see how we can get rid of that 2 minute delay...

>svn update
At revision 1.

woho, svn is 'too' smart -- this command completes immediately because
svn detects that the local version has the same revision as the
repository; it doesn't matter that the local file is modified.
so this doesn't help with the 2 minute delay...

ok, lets try again and see how we can get rid of that 2 minute delay...

# time 1:22:00.36
X:\SVNSandbox\Work> svn commit
# nothing printed
# time 1:24:10.26

earlier, svn status took 2 minutes and printed that nothing has changed --
now svn commit takes another 2 minutes.
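
For anyone who wants to turn the two %time% stamps into a duration, a small helper follows. It assumes the H:MM:SS.xx format shown above with a dot as the decimal separator (some locales print a comma), and the function names are my own.

```python
def dos_time_to_seconds(stamp):
    """Convert a DOS %time% string such as "1:22:00.36" to seconds."""
    hours, minutes, seconds = stamp.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def elapsed(start, end):
    """Seconds between two %time% stamps, allowing one midnight wrap."""
    delta = dos_time_to_seconds(end) - dos_time_to_seconds(start)
    if delta < 0:
        delta += 24 * 3600  # the run crossed midnight
    return delta


# The commit above: roughly 130 seconds, i.e. a bit over 2 minutes.
print(round(elapsed("1:22:00.36", "1:24:10.26"), 2))
```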

ok: after the -commit- all operations work fast again,
so svn must have updated its internal timestamp.

-> svn dev@ team: optimization request:
     like with CVS, an "update" must put the modified timestamp into
     the entries file if the file did not change. otherwise all touched files (e.g.
     script generated files) add up quite badly.

now, test a very simple real modification, like appending to a logfile:
 echo "add a few words" >>huge700.bin
( filesize changed to 790,582,480 )

ok, this time svn is smart again and detects the change in size:
 X:\SVNSandbox\Work>svn status
    M huge700.bin
as an immediate response.

lets diff...

X:\SVNSandbox\Work>svn diff
Index: huge700.bin
Cannot display: file marked as a binary type.
svn:mime-type = application/octet-stream

hmmm nothing to say here --
not even a few details, e.g. that the filesize changed.

ok, I'll skip the diff for now. maybe fiddle with the
auto props later...

lets commit...

( the repos hasn't changed, so this doesn't need an update before
the commit, hence no 'merge', but a diff against the text-base should occur. )

%time% 1:49:52.50
X:\SVNSandbox\Work>svn commit -m "appended text"
Sending huge700.bin
Transmitting file data .

uh oh.. I don't like what I see:
Repos/db/strings is currently growing to more than double its size (!!!)
what about the "supports binary diffs" ???
it's sending the whole file content up to the database, again
taking ~11 minutes to complete.
db/strings is now at 1,602,199,552 bytes - ~doubled.

Committed revision 2.
%time% 2:11:06.62

I also verified the file contents again (using another binary diff utility)
to be 100% sure the files didn't really differ - they are 100%
identical except for the last 16 bytes added.
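
If you want to repeat that verification without a separate diff utility, something like the following would do. It is only a sketch of the check described above (the old file must be an exact prefix of the new one), not the tool I actually used, and `appended_bytes` is a made-up name.

```python
def appended_bytes(old_path, new_path, chunk_size=1 << 20):
    """Return the appended tail if new == old + tail, otherwise None.

    Reads in chunks so a 700 MB file does not have to fit in memory;
    only the tail itself is read in one piece.
    """
    with open(old_path, "rb") as old, open(new_path, "rb") as new:
        while True:
            a = old.read(chunk_size)
            if not a:
                break  # reached the end of the old file
            b = new.read(len(a))
            if a != b:
                return None  # differs inside the common part
        return new.read()  # whatever is left was appended
```

For the test above, `len(appended_bytes(old, new))` should come out as 16 if the files really differ only by the appended text.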

-> svn dev@ team: binary-diff bug with large files!!? - I just appended 16 bytes


I have to stop my experiments now 'cause it's late.
And probably further investigation is of no use,
because of the detected issues :-|
I also can't do further 'huge' tests until March 23 because
I'm 'on the road with my laptop'.

svn works just fine in the common cases, but I hope
my 'large file test' will give new ideas for optimizations
and result in some bugfixes.
currently I would not recommend it for files >100MB.

P.S.: should I cc this to dev?
hmm, it's not "HACKING" style formatted,
so better leave it on users@ :-)

Received on Fri Mar 12 02:41:59 2004

This is an archived mail posted to the Subversion Users mailing list.