
Trivial merge of big text file: Dismal performance, 540x faster if binary.

From: Andreas Krüger, DV-RATIO <andreas.krueger_at_hp.com>
Date: Thu, 13 Jan 2011 11:49:55 +0000

Hello,

trivial merges of changes to big text files are dismally slow. SVN does
much better when it performs such merges as binary.

Briefly, I think it should. I suggest SVN should detect the trivial
merge situation, and use the fast binary algorithm even for text
files.

I'd like to open a bug report / improvement suggestion on this.

What do folks think?


Here are the gory details:

This starts with some branch F and a big text file F/b.xml (see end of
message for details on "big"). This file has no SVN properties
whatsoever.

This got copied, with "svn cp", to some new branch T/b.xml.

Then a major overhaul of F/b.xml was checked in.

There had been no change in T/b.xml yet. So merging the overhaul
transaction from F to T is a *trivial* merge. As the result of that
merge, the T/b.xml content should be simply replaced with the content
of the overhauled F/b.xml.
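
To make "trivial" concrete, here is a minimal sketch of the check I have in mind. It is not SVN's actual merge code; the function name merge_three_way and the slow_text_merge callback are made up for illustration. Here base is F/b.xml before the overhaul, theirs is F/b.xml after the overhaul, and mine is T/b.xml.

```python
from typing import Callable, Optional

Merger = Callable[[bytes, bytes, bytes], bytes]


def merge_three_way(base: bytes, mine: bytes, theirs: bytes,
                    slow_text_merge: Optional[Merger] = None) -> bytes:
    """Three-way merge that short-circuits the trivial cases.

    Hypothetical helper for illustration only, not an SVN API.
    """
    if mine == base:
        # Target untouched since the copy: the result is simply the
        # incoming content. No line-by-line diff is needed.
        return theirs
    if theirs == base or mine == theirs:
        # Nothing new is coming in, or both sides already agree.
        return mine
    if slow_text_merge is None:
        raise NotImplementedError("non-trivial merge needs a real text merge")
    # Only now fall back to the expensive line-based merge.
    return slow_text_merge(base, mine, theirs)


old_f = b"<a/>\n"            # F/b.xml at the time of the copy
new_f = b"<a><b/></a>\n"     # F/b.xml after the overhaul
t = old_f                    # T/b.xml, unchanged since "svn cp"
print(merge_three_way(old_f, t, new_f) == new_f)
```

The byte comparison is linear in the file size, so the trivial case costs about as much as reading the file once.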

That merge indeed worked as expected. Only it took 55:21 minutes on my
machine. During most of that time, there was very little network or
hard drive activity, but one CPU was kept 100% busy.


I found a way to speed this up considerably, by a factor of 540 in
this particular case, from 55 minutes to 6 seconds: Use binary instead
of text.

Gory details of this:

New F, new F/b.xml, with same content as before.

I lied to SVN and told it F/b.xml isn't a text file, but binary,
(setting svn:mime-type to application/octet-stream on F/b.xml).
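
For anyone who wants to repeat this, the property change can be scripted roughly as below. This is a sketch: the helper name mark_as_binary and its dry_run parameter are my own invention; only the "svn propset" command line itself is real. By default the function just returns the command without running it.

```python
import subprocess
from typing import List


def mark_as_binary(path: str, dry_run: bool = True) -> List[str]:
    """Build (and optionally run) the propset command that marks a
    working-copy file as binary. Hypothetical helper for illustration."""
    cmd = ["svn", "propset", "svn:mime-type",
           "application/octet-stream", path]
    if not dry_run:
        # Requires an svn client on PATH and a versioned file at `path`.
        subprocess.run(cmd, check=True)
    return cmd


print(" ".join(mark_as_binary("F/b.xml")))
```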

After this, again svn cp to (a new T's) T/b.xml, and again the same
overhaul to F/b.xml.

The whole time, I was careful to not tell SVN there was any connection
to the previous experiment. In particular, no svn cp from the previous
experiment, but fresh checkin from workspace.

Again, the overhaul's merge from F/b.xml to T/b.xml resulted in
replacing the old T/b.xml content with the present F/b.xml content as
expected. Only this time, the merge took a mere 6 seconds or so
instead of 55.3 minutes, a speedup by a factor of 540.
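
As a quick sanity check on these numbers, the small calculation below works backwards from the 55:21 text merge and the factor of 540 to the binary merge time they imply. The variable names are mine.

```python
# Text merge took 55 minutes 21 seconds.
text_merge_seconds = 55 * 60 + 21

# Speedup factor quoted in this message.
speedup = 540

# Binary merge time implied by those two figures.
binary_merge_seconds = text_merge_seconds / speedup

print(f"{text_merge_seconds} s / {speedup} = {binary_merge_seconds:.2f} s")
```

That is consistent with "6 seconds or so".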

I want to have that speed improvement, without needing to lie to SVN!

Regards,
and thanks to the SVN project members for providing fine software,

Andreas

P.S.:

Numbers, in case someone cares:

The original F/b.xml was 18,291,344 bytes and 456,951 lines.

The output of svn diff after the overhaul contained 676,136 lines,
(and that svn diff took quite a while to complete, which is
understandable and not part of this issue).

The overhauled F/b.xml was 18,311,873 bytes and 688,560 lines.

I have seen similar performance problems with various SVN clients.
The times quoted above were measured with Cygwin's svn command line
client 1.6.12 on Windows. The protocol was HTTPS, the server Apache
HTTPD with the svn module (also 1.6.12).

--
Dr. Andreas Krüger, Senior Developer

Tel. (+49) (211) 280 69-1132
andreas.krueger_at_hp.com

DV-RATIO NORDWEST GmbH, Habsburgerstraße 12, 40547 Düsseldorf, Germany
 

Received on 2011-01-13 12:51:51 CET

This is an archived mail posted to the Subversion Users mailing list.
