
Tags and scalability (was Re: Enlightenment)

From: Kean Johnston <jkj_at_caldera.com>
Date: 2002-09-25 02:39:24 CEST

All,

OK, I read through everyone's responses to my request for
enlightenment about the ever-increasing version number
for an entire tree. My gut instinct still tells me this
may lead to problems in the future, but for now I buy it.
However, one thing I don't buy as being efficient is the
way people suggest we do tags.

Although it may be a "relatively cheap" operation to copy
a directory, please consider the effect when subversion
is asked to maintain very large trees. Let's say there are
a quarter of a million files. That means at the very least,
assuming a single 4-byte integer is used for each file
as its "pointer", 1 megabyte (give or take a teeny bit)
per tag. If you want to make weekly, intra-weekly or
possibly even daily tags, this can get very expensive
very quickly.
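
To make that arithmetic concrete, here is a rough sketch of
the estimate. It assumes the worst case described above (one
4-byte pointer per file per tag); it is not a description of
how Subversion actually stores a copy. The function names
are made up for illustration.

```python
# Back-of-the-envelope cost of tagging, under the assumption
# that every tag stores one fixed-size pointer per file.
FILES = 250_000        # "a quarter of a million files"
POINTER_BYTES = 4      # one 4-byte integer per file


def tag_cost_bytes(files=FILES, pointer_bytes=POINTER_BYTES):
    """Bytes needed for a single tag under the per-file-pointer assumption."""
    return files * pointer_bytes


def yearly_cost_bytes(tags_per_year, files=FILES, pointer_bytes=POINTER_BYTES):
    """Bytes accumulated over a year at the given tagging frequency."""
    return tags_per_year * tag_cost_bytes(files, pointer_bytes)


if __name__ == "__main__":
    print("one tag:     ", tag_cost_bytes(), "bytes")
    print("weekly tags: ", yearly_cost_bytes(52), "bytes per year")
    print("daily tags:  ", yearly_cost_bytes(365), "bytes per year")
```

Under these assumptions one tag is 1,000,000 bytes, so weekly
tags add about 52 MB a year and daily tags about 365 MB.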

How about this. There is always just a single version
that the tree is at at any given time (let's say that when
I make the tag it's at version 3261). If I was to use the
yet-to-be-written "svn tag my_release_tag_name", it
could use a single database record in an SVN specific file
at the root of the tree that simply records the current
tree version. Thus if I ever check out my_release_tag_name
it knows I really mean release 3261. This then limits the
data required to store the tag to 4 bytes for the revision
number and however many bytes the symbolic tag name is.
This also *HAS* to be a quicker operation than directory
copying, no matter how fast a directory copy is.
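
A minimal sketch of that idea follows. Everything in it is
hypothetical: the TagTable class, its methods and the record
layout (a 4-byte big-endian revision number followed by the
tag name) are illustrations of the proposal, not anything
that exists in Subversion.

```python
import struct


class TagTable:
    """Maps symbolic tag names to tree-wide revision numbers."""

    def __init__(self):
        self._tags = {}

    def tag(self, name, revision):
        """Record that `name` refers to `revision`; a tag is never moved."""
        if name in self._tags:
            raise ValueError("tag already exists: " + name)
        self._tags[name] = revision

    def resolve(self, name):
        """Return the revision a tag name stands for."""
        return self._tags[name]

    def encode(self, name):
        """Serialize one record: 4-byte revision, then the name bytes."""
        return struct.pack(">I", self._tags[name]) + name.encode("utf-8")


if __name__ == "__main__":
    table = TagTable()
    table.tag("my_release_tag_name", 3261)
    print(table.resolve("my_release_tag_name"))
    print(len(table.encode("my_release_tag_name")), "bytes for the record")
```

The record size depends only on the length of the tag name,
not on the number of files in the tree.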

My other concern is with the "hidden cached copies of
every file" scheme. For something the size of Apache,
and subversion, maybe even something meatier like X11,
that may be OK, but when your source tree is over 3G
in size, you now double that to 6G. That's a huge hit.
Can we at least open up a discussion about possibly
rethinking why the cached copy is needed? Is it THAT
important that you can revert a file on an aeroplane?
Wouldn't keeping a simple CRC or even MD5 hash of the
file to be able to *detect* changes suffice? Or at
least give the svn repository manager the option of
setting up his repository that way. Of course the
problem becomes bigger when someone in the military
decides to use subversion one day (ain't that a pun?)
to manage their 40G Ada repositories.
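
As a sketch of what hash-based detection could look like,
assuming the working copy stored only a digest per file
instead of a pristine copy: the helper names below are made
up, and MD5 is used only because it is mentioned above.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to bound memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def is_modified(path, recorded_digest):
    """True if the file's current contents no longer match the stored digest."""
    return file_digest(path) != recorded_digest
```

This costs a few dozen bytes per file rather than a second
copy of the tree, but it can only detect a change. Reverting
or diffing a file would then need a trip to the repository.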

I hope this is food for thought. I don't mean to be
a trouble maker :)

Kean.

Received on Wed Sep 25 02:55:22 2002

This is an archived mail posted to the Subversion Dev mailing list.