Re: ideas to make svn update faster.

From: Branko Čibej <brane_at_xbc.nu>
Date: 2005-05-07 22:45:33 CEST

Thomas Zander wrote:

>On Saturday 07 May 2005 20:47, Branko Čibej wrote:
>
>
>>Thomas Zander wrote:
>>
>>
>>>[2] Which
>>>I fully understand. Looking at strace output I notice that svn could be
>>>a lot faster (do less writes) if svn was to be more optimistic about
>>>version numbers.
>>>
>>>
>>With "more optimistic" == "wrong", unfortunately...
>>
>>
>
>No it's not; don't dig your heels in the sand just yet; please. But, please
>tell me exactly what use case I missed where things go wrong. Thanks.
>
>
I think I covered some of that later in the reply, but in general, I
think you're making assumptions about WC usage patterns that aren't
valid. A simplistic approach to speeding up "svn update" could easily
slow down other operations.

>>>kdelibs has ~8800 files and 378 dirs. At any time maybe 10 files have a
>>>different version than the rest (hell; let it be 10%). That means that
>>>around 370 .svn/entries files have been written with the only change
>>>being a new version number in the name="" entry that is equal to just
>>>about all the other dirs in the project.
>>>A simple optimization would be to remove the directory-version number
>>>(the one in the xml entry-tag with 'name=""') when it has the same one
>>>as the parent dir.
>>>
>>>
>>Have you actually measured what percentage of update time it takes to
>>write those 378 entries files, or are you simply guessing that this is
>>the bottleneck?
>>
>>
>
>What? Don't you think the amount of writes is a problem, then?
>
I've learned long ago not to "think" or "guess" about performance
bottlenecks, but to measure them.

> The work done
>on each update _is_ huge for a project like KDE (where kdelibs is just a
>subdir; a normal update will easily go to 200000 files).
>If you did the profiling; that's fine. Let's work on that; if you didn't then
>what about working on this part, now, eh?
>Statting less files etc comes later.
>
>
Look, I didn't say your analysis was wrong, I asked if you'd actually
measured the performance. If you haven't then anything you say is pure
guesswork.
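
For a first rough number on the entries-file question, something like the
following could be used. It is only a sketch under stated assumptions: it
does not run or profile svn, the function name `time_entries_rewrites` and
the payload are made up for illustration, and the write-to-temp-then-rename
pattern is an assumption about how such files might be rewritten. It times
rewriting one small file in each of 378 directories, so the result can be
compared against the wall-clock time of a whole update.

```python
import os
import tempfile
import time


def time_entries_rewrites(num_dirs=378, payload=b"<entry name='' revision='1'/>\n" * 50):
    """Time rewriting one small file per directory (a stand-in, not real svn I/O)."""
    with tempfile.TemporaryDirectory() as root:
        paths = []
        # Setup (not timed): create the directories and an initial file in each.
        for i in range(num_dirs):
            admin_dir = os.path.join(root, f"dir{i}", ".svn")
            os.makedirs(admin_dir)
            path = os.path.join(admin_dir, "entries")
            with open(path, "wb") as f:
                f.write(payload)
            paths.append(path)

        # Timed part: rewrite every file via a temporary file plus rename.
        start = time.perf_counter()
        for path in paths:
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                f.write(payload)
            os.replace(tmp, path)
        elapsed = time.perf_counter() - start
    return elapsed, len(paths)


if __name__ == "__main__":
    seconds, count = time_entries_rewrites()
    print(f"rewrote {count} files in {seconds:.4f}s")
```

A number from a toy like this says nothing about the rest of an update
(network, stat calls, text-base handling), so it can only bound one
component; the real answer still has to come from profiling svn itself.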

>>>It's probably not going to be as simple as that (since you update subdirs
>>>separately), but I'm pretty sure that a lot less xml's have to be
>>>written if you follow the route that the normal state is a dir having
>>>the same version as its parent. Only when that fails do you need to do
>>>extra work. Being optimistic about version changes; I'd call that.
>>>
>>>
>>Well, the first question that pops to mind is, how do you tell that the
>>equal-version assumption is wrong, unless you record the dir's version
>>number?
>>
>>
>Sure you record it; but only for the dirs/files that actually have a
>different version number. (and svn already does that partly)
>Don't think so black and white, here.
>As I said; you read the entries files as normal, but you don't have to
>overwrite them for each dir if only the global version changed. Since the
>resulting xml would be exactly the same.
>
>
Whatever you'd gain during update by not recording the new directory
revision (on the assumption that it's the same as the old one), you'd
lose because your working copy would have a greater mix of revision
numbers, which means that the tree report sent to the server before the
commit would be larger. Exactly what this gain/loss ratio would be, I
wouldn't venture to guess, but I'm pretty sure it doesn't scale linearly
with WC size.
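
To make the mixed-revision cost concrete, here is a deliberately simplified
model, not Subversion's actual reporter. It assumes a report needs one entry
for the root plus one for every path whose revision differs from its
parent's; the function name `report_entries` and the sample trees are
hypothetical.

```python
import posixpath


def report_entries(revisions):
    """Return the paths a simplified report would have to mention.

    `revisions` maps a working-copy path ("" is the root) to its revision.
    Assumption: a path is reported when its revision differs from its parent's.
    """
    reported = []
    for path in sorted(revisions):
        parent = posixpath.dirname(path)
        if path == "" or revisions.get(parent) != revisions[path]:
            reported.append(path)
    return reported


if __name__ == "__main__":
    uniform = {"": 10, "a": 10, "a/b": 10}
    mixed = {"": 10, "a": 9, "a/b": 10}  # "a" kept its old revision
    print(len(report_entries(uniform)), len(report_entries(mixed)))
```

In this model a single stale directory revision adds two entries: the
directory itself, and the child below it that no longer matches its parent.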

>>>Now; there is probably going to be a lot of opinions on the above
>>>subject; and I'd like to point out that svn really needs speed
>>>optimizations; I have seen a LOT of complaints about this issue in
>>>the KDE switchover. Remember that if you find the above suggestion
>>>technically less-than-ideal.
>>>
>>>
>>Certainly SVN needs speed optimizations. But I think you're approaching
>>them exactly the wrong way around. The thing to do is to measure where
>>the bottlenecks are, and strace is far from enough for that.
>>
>>
>
>Hmm. I'm afraid it's not really a secret recipe that if your process is not
>taking a lot of cpu and memory, but is reading and writing a lot of files;
>then the first thing to look into is to get it to write fewer files since
>writing files is _always_ the slow part of disk access.
>
>
Most of the time, yes, but disk access isn't the only slow part of an
update.

>But, if you did the profiling part; I'd be happy to compare notes! :)
>
>
Oh no -- that's your job, part of the task of convincing us you're right. :)

>>>The strace also showed me things like;
>>>* the .svn/format file is opened 5 times for each directory.
>>>
>>>
>>We know about that, and we already have a (tentative) plan to remove the
>>format file and put the format information into the entries file.
>>
>>
>
>Sounds great; good to hear I'm not smoking crack then :)
>
>
>
>>>I would think
>>>that with auto-upgrades only one (the root dir) should be enough.
>>>
>>>
>>That, of course, is again an oversimplification. You can't make
>>assumptions about the state of subdirectories in the working copy.
>>
>>
>
>You can only make assumptions if you wrote the things; you make assumptions on
>the format of the entries file (and other things) for the plain and simple
>reason that svn wrote the file.
>So if the upgrading routine of the format of the .svn dir makes sure he
>actually _knows_ about the format file afterwards; then yes you can make
>assumptions.
>
>
The catch is that you can't make assumptions about the order of working
copy directory accesses. It's entirely possible to have a path A/B/C,
where A and C are at version N, and B is at N+1.
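
A small sketch of that A/B/C situation, with made-up version numbers and a
hypothetical helper name. It only illustrates why reading the version at the
top directory says nothing certain about the directories below it; it is not
how svn stores or checks this information.

```python
def dirs_with_unexpected_version(versions, root):
    """List directories below `root` whose version differs from root's.

    `versions` maps a directory path to the version recorded in that
    directory (hypothetical data; in a real working copy each value
    would have to be read from that directory).
    """
    expected = versions[root]
    prefix = root + "/"
    return [
        path
        for path in sorted(versions)
        if path.startswith(prefix) and versions[path] != expected
    ]


if __name__ == "__main__":
    n = 4  # stand-in for "version N"
    versions = {"A": n, "A/B": n + 1, "A/B/C": n}
    print(dirs_with_unexpected_version(versions, "A"))
```

Note that the helper can only find B because it looks at every directory,
which is exactly the per-directory read the shortcut was meant to avoid.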

>There are lots of ways to do this; if you find an old version in a parent
>dir you upgrade it and upgrade all child dirs (which are listed in each
>entries file) at the same time; and only when everything went fine you note
>that in the format file. With this approach only one .svn/format needs to
>be read.
>
>
How do you find an old version, except by reading the format (or
equivalent) file?

>>>* .svn/lock files being created in every subdir is not needed if you
>>>check parent dirs that also have a .svn (and maybe the same root).
>>>
>>>
>>What you think of as the "root" of the working copy is a figment of the
>>imagination. It's quite valid to have two SVN processes fiddle in
>>parallel with two subtrees in the WC. A third SVN working from a common
>>root of those two subtrees could zap the WC if it didn't try to lock it
>>recursively first.
>>
>>
>
>If I type update in foo/bar then the root is bar. If I type update in
>foo/bar/baz; then the root is baz. Simply because that's already what you
>do now.
>
>
But what if you type "svn update foo/bar & svn update foo/bar/baz/qux"?
If you only create the lock file in the roots, the two updates will
likely interfere with each other somewhere along the line. (And please
note that, while this looks like nonsense from the command-line client's
point of view, a file-manager-like GUI could have other ideas.)
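
The difference can be shown with a toy model. This is a sketch with
hypothetical function names and pure set logic; it touches no lock files and
is not Subversion's locking code. It compares locking every directory in a
subtree against keeping one lock at each update's root and checking upwards
from the new update's root.

```python
def subtree(dirs, root):
    """All directories in `dirs` at or below `root`."""
    return {d for d in dirs if d == root or d.startswith(root + "/")}


def recursive_locks_conflict(dirs, root_a, root_b):
    """With a lock in every directory, overlapping subtrees share a lock."""
    return bool(subtree(dirs, root_a) & subtree(dirs, root_b))


def ancestors_and_self(path):
    """The path itself plus each of its parent directories."""
    parts = path.split("/")
    return {"/".join(parts[:i]) for i in range(1, len(parts) + 1)}


def upward_check_blocks(existing_lock_roots, new_root):
    """Root-only locks: the new update looks for a lock at or above its root."""
    return bool(ancestors_and_self(new_root) & set(existing_lock_roots))


if __name__ == "__main__":
    dirs = ["foo", "foo/bar", "foo/bar/baz", "foo/bar/baz/qux"]
    # Recursive locking notices the overlap regardless of start order.
    print(recursive_locks_conflict(dirs, "foo/bar", "foo/bar/baz/qux"))
    # Root-only locking with an upward check depends on who started first.
    print(upward_check_blocks({"foo/bar"}, "foo/bar/baz/qux"))
    print(upward_check_blocks({"foo/bar/baz/qux"}, "foo/bar"))
```

In this model the upward check catches the inner update when the outer one
started first, but not the reverse order, so the outer update would still
have to look downwards to be safe.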

>The only difference being that you create a whole lot less lock files.
>Your example;
>consider
>a/b/c
>a/b/d
>One svn is updating c, another is updating d. Effect; one lock file in
>c/.svn and another in d/.svn
>Then the user types svn update in 'a'.
>Effect now; svn: Working copy 'a/b/c' locked
>Effect with my proposed change; well, none actually, it again gives the same
>problem.
>
>
Except of course for the potential race conditions, which can zap your
working copy.

>The fact that svn could just skip that dir in the update and only print a
>warning is another point. But I won't go there just now.
>
>
>
>>>So you create one in the dir you typed 'svn up' in and if someone types
>>>svn up in a subdir it will change dir to parent and check for a lock
>>>file until it either finds it (in this case it will, and abort) or it
>>>will leave the checkout.
>>>This will save a _lot_ of file-creation and removal afterwards.
>>>
>>>
>>So, you're saying that we should check locks upwards in the working
>>copy, not downwards. Interesting idea. I'd not want to guess what
>>happens if you have symlinked working copies.
>>
>>
>This is the opposite effect of the situation we described above. Same dirs
>a/b/c. Only this time the first svn is in the dir 'a'. And while that's
>running I start one in the subdir 'c'.
>You expect it to bail out, as it does right now.
>So; read my explanation and see how that will do exactly that.
>Symlinks are a non issue since svn doesn't follow them anyway in an update.
>
>
-- Brane

Received on Sat May 7 22:46:09 2005
