
Re: Tags and scalability (was Re: Enlightenment)

From: Branko Čibej <brane_at_xbc.nu>
Date: 2002-09-25 03:12:59 CEST

Kean Johnston wrote:

>All,
>
>Ok I read through everyone's response to my request for
>enlightenment about the ever-increasing version number
>for an entire tree. My gut instinct still tells me this
>may lead to problems in the future, but for now I buy it.
>However, one thing I don't buy as being efficient is the
>way people suggest we do tags.
>
>Although it may be a "relatively cheap" operation to copy
>a directory, please consider the effect when subversion
>is asked to maintain very large trees. Let's say there are
>a quarter of a million files. That means at the very least,
>assuming a single 4-byte integer is used for each file
>as its "pointer", 1 megabyte (give or take a teeny bit)
>per tag. If you want to make weekly, intra-weekly or
>possibly even daily tags, this can get very expensive
>very quickly.
>
On the server, copies are O(1) time and space. That means that any copy
takes the same amount of space regardless of the size of the tree.
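To make the constant-cost claim concrete, here is a minimal Python sketch. It assumes a toy in-memory tree of immutable nodes and is not Subversion's actual filesystem code; the names `File`, `Dir`, `lookup`, `with_entry` and `cheap_copy` are hypothetical. A copy rebuilds only the directories on the destination path and points the new entry at the existing subtree, so its cost depends on the depth of that path, not on how many files the copied tree contains.

```python
from typing import Dict, List, Union


class File:
    """A file node; treated as immutable by convention in this sketch."""

    def __init__(self, contents: bytes) -> None:
        self.contents = contents


class Dir:
    """A directory node: a mapping from entry name to child node.

    Treated as immutable once built; the caller hands over the dict."""

    def __init__(self, entries: Dict[str, "Node"]) -> None:
        self.entries = entries


Node = Union[File, Dir]


def lookup(root: Dir, path: List[str]) -> Node:
    """Follow `path` from `root` and return the node found there."""
    node: Node = root
    for name in path:
        node = node.entries[name]
    return node


def with_entry(d: Dir, path: List[str], target: Node) -> Dir:
    """Return a new Dir like `d`, except that `path` now names `target`.

    Only the directories along `path` are rebuilt (one shallow dict copy
    each); every other node, including the whole `target` subtree, is
    shared with the old tree instead of being duplicated."""
    head, *rest = path
    entries = dict(d.entries)
    if rest:
        child = entries.get(head, Dir({}))  # create missing directories
        entries[head] = with_entry(child, rest, target)
    else:
        entries[head] = target
    return Dir(entries)


def cheap_copy(root: Dir, src: List[str], dst: List[str]) -> Dir:
    """Return a new root in which `dst` refers to the very node at `src`."""
    return with_entry(root, dst, lookup(root, src))
```

For example, copying a `trunk` of a quarter of a million files to `tags/REL-1.0` creates just two new `Dir` objects (a new `tags` and a new root); the old root still describes the previous revision, unchanged.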

>How about this. Since there is always just a single version
>that the tree is at at any given time (let's say when I
>make the tag it's at version 3261), if I were to use the
>yet-to-be-written "svn tag my_release_tag_name", it
>could use a single database record in an SVN specific file
>at the root of the tree that simply records the current
>tree version. Thus if I ever check out my_release_tag_name
>it knows I really mean release 3261. This then limits the
>data required to store the tag to 4 bytes for the revision
>number and however many bytes the symbolic tag name is.
>This also *HAS* to be a quicker operation than directory
>copying, no matter how fast a directory copy is.
>
It's quicker only by a constant factor; the time complexity is the same,
i.e., constant.
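Kean's tag-as-revision-number proposal can be sketched as a lookup table kept beside the revision history. This is a hypothetical model: `Repository`, `tag` and `checkout` are illustrative names, `svn tag` was never an existing command, and each revision is stored here as a plain snapshot purely for simplicity. Recording a tag stores one integer per name, which, like a server-side copy, costs the same regardless of tree size.

```python
from typing import Dict, List

# Hypothetical model: each revision is an immutable snapshot of the tree,
# represented here as a plain mapping from path to file contents.
Tree = Dict[str, bytes]


class Repository:
    def __init__(self) -> None:
        self.revisions: List[Tree] = [{}]  # revision 0 is the empty tree
        self.tags: Dict[str, int] = {}     # tag name -> revision number

    def youngest(self) -> int:
        """Number of the most recent revision."""
        return len(self.revisions) - 1

    def commit(self, tree: Tree) -> int:
        """Store a new snapshot and return its revision number."""
        self.revisions.append(dict(tree))
        return self.youngest()

    def tag(self, name: str) -> None:
        # The proposal: remember only the current revision number, a fixed
        # amount of data no matter how many files the tree contains.
        self.tags[name] = self.youngest()

    def checkout(self, name: str) -> Tree:
        # Checking out a tag simply means checking out the recorded revision.
        return self.revisions[self.tags[name]]
```

Either way, creating a tag is a constant-size operation; the two schemes differ only by a constant factor.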

>My other concern is with the "hidden cached copies of
>every file" scheme. For something the size of Apache,
>and subversion, maybe even something meatier like X11,
>that may be OK, but when your source tree is over 3G
>in size, you now double that to 6G. That's a huge hit.
>Can we at least open up a discussion about possibly
>rethinking why the cached copy is needed? Is it THAT
>important that you can revert a file on an aeroplane?
>Wouldn't keeping a simple CRC or even MD5 hash of the
>file to be able to *detect* changes suffice? Or at
>least give the svn repository manager the option of
>setting up his repository that way. Of course the
>problem becomes bigger when someone in the military
>decides to use Subversion one day (ain't that a pun?)
>to manage their 40G Ada repositories.
>
That, of course, is a different issue altogether. And it's a
client-side, not a server-side issue. We're well aware of this problem,
although the solution for now is "disk is cheap" :-) The _real_ solution
we've been talking about on and off is almost exactly what you propose.
Welcome to the club! :-)
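The checksum-only working copy Kean describes can be sketched with Python's standard `hashlib` module. The function names are hypothetical and this is not how the Subversion client is implemented. `is_modified` detects a change by comparing a file's MD5 with the checksum recorded at checkout; `revert` shows the price of dropping the cached copy, since the base text must then come from elsewhere, represented by a caller-supplied `fetch_pristine` callable (an assumption standing in for a request to the repository).

```python
import hashlib
from pathlib import Path
from typing import Callable


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_modified(path: Path, recorded_md5: str) -> bool:
    """Detect a local change using only the checksum recorded at checkout."""
    return md5_of(path) != recorded_md5


def revert(path: Path, recorded_md5: str,
           fetch_pristine: Callable[[], bytes]) -> None:
    """Restore `path` to its base text.

    With no pristine copy on disk, the base text has to be fetched from
    elsewhere; `fetch_pristine` is a hypothetical stand-in for a request to
    the repository. The fetched text is verified against the checksum."""
    if not is_modified(path, recorded_md5):
        return  # nothing to do, and no round trip needed
    base_text = fetch_pristine()
    if hashlib.md5(base_text).hexdigest() != recorded_md5:
        raise ValueError(f"fetched base text for {path} fails checksum")
    path.write_bytes(base_text)
```

Detection stays local and needs only a few bytes per file, but reverting (and likewise diffing against the base) now requires access to the repository, which is exactly the aeroplane case.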

>I hope this is food for thought. I don't mean to be
>a trouble maker :)
>
>
Not at all.

-- 
Brane Čibej   <brane_at_xbc.nu>   http://www.xbc.nu/brane/
Received on Wed Sep 25 03:13:40 2002

This is an archived mail posted to the Subversion Dev mailing list.
