
Standardised Repository Schema

From: Kim Lester <kim_at_dfusion.com.au>
Date: 2003-04-16 17:10:04 CEST


Repository "Schema" Thoughts:

I know this topic comes up from time to time on the group,
but I've done a reasonable search on the archives and
haven't seen anyone say "This is the recommended schema".

I'd like to suggest that in the "Definitive Guide" we add
some more explicit details about repository layout guidelines.

I'd also like to suggest the recommended schema.
If you think there shouldn't be a recommended schema for
_coding use_ please at least read this first.

I'm happy to turn the relevant bits into full (and balanced viewpoint) prose
for the manual if the group agrees.

Whilst subversion (correctly) provides mechanism but not policy there is also
a risk in not giving "strong experience based guidelines".

* One commonly accepted reason X failed to take off for years was
that no guidelines were suggested. The Athena widget set was provided
as a demo set, but guidelines and matching libraries were non-existent
until roughly the time when Motif came along with some standards
and everyone started pulling together standards either for or against Motif.

* Same with Algol (!?), a reasonable algorithm language that decided
useful IO library routines like "print, read, write" etc were
'beneath it' to implement - it wanted to be 'pure'. It was so flippin'
pure that no-one could do anything useful with it without having
to obtain one of several incompatible IO libraries. It died.
</general-rant> :-)

One should never be afraid of giving strong style guidelines.

As well, subsequent tools (down the track) that work with subversion
databases would need a well defined "schema" for them to work
properly and be useful to many people.

One reason I'm bringing this up is that I work in an environment
with MANY packages and projects, whereas I'm guessing the Subversion
developers are primarily dealing with (as has been mentioned several times)
only the one project (or maybe several totally independent projects)
in their repository and may not have fully experienced all the
detailed interdependencies that typically occur in such cases.

        a) many small repositories turn into bigger ones
        b) the schema doesn't really matter for single projects or 1-2 projects
                it matters greatly for more complex setups.

Put simply:
        * most simple-usage users are probably happy to "go with the recommended flow"
        * complex-usage users probably need to know a layout that works for most cases.

Therefore I'd suggest _one_ layout be "encouraged" unless someone has a good reason to
do something special.

* There are 2 common ways to set up a repository.

(Oft drawn picture:)


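CASE A (trunk, branches and tags at the top level):

                /trunk/packageA/*       (the current working revision of project or package A)
                /branches/.../branch-1/* (actual branch name/numbering is irrelevant)
                /tags/...

CASE B (trunk, branches and tags beneath each package):

                /packageA/trunk/*
                /packageA/branches/branch-1/*
                /packageA/tags/...
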
Case A is most likely to be best where:
        multiple packages, projects etc in a repository
        automated repository management tools may be needed
        repository users are less experienced

Case B is suitable where the packages are totally independent
        projects that bear no relation to each other and possibly
        have separate administrators (is this strictly true ??)

IMHO for most users I'd propose CASE A is actually the best.
My 1st reaction on using Subversion was to go with CASE B
but I found too many general limitations.

My reason for encouraging CASE A is as follows:

        1) branches and tags are "project admin" level functions
                and keeping them well away from the working area reduces
                the likelihood of accidents.

                It may also make admin access controls easier to implement
                since only 2 top level dirs need be locked down (true !?)
        2) A checkout of current state of all "packages" can be simply
                achieved by svn co /trunk/...
                Another minor benefit is that "/trunk" could be made part
                of an $SVNROOT env var if needed and hence made transparent
                to users without creating needless wrapper scripts around
                SVN commands - no point insulating everything.
                The same can't be said when trunk comes at the end.
        3) Less experienced/able users (or new CVS users) are going to make
                the mistake (often enough) of doing an
                svn co /.../pkgname/

                "Disastrous" in CASE B... you end up with hard copies of every version ever...
                You might argue that people just have to learn and if this were
                the only issue I'd probably agree, but I feel it is one of several.

        4) CASE A better supports sub-package or sub-project dirs.
                Here subpackages have same general status as packages, we just
                drill down to a finer level of granularity without further thought.

                ie the path becomes /trunk/.../packageA/sub-packagedir/
                (quite straightforward, esp. for machine parsing)

                In CASE B we end up with a path where "trunk" - a repository
                management path component - gets stuck at an essentially
                arbitrary location within it.

                Machine parsing/generation also gets a lot harder - eg if you want
                to extract a package or subpackage - where to put the "trunk" - go figure ?
                it isn't obvious.

                Similarly mechanical branch and tag "management" becomes
                slightly simpler with case A.

        5) An export/dump/snapshot of the current state of all packages is presumably
                simpler in CASE A. One doesn't need to carefully iterate through
                all dirs looking for ones ending in trunk.
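
To make points 4 and 5 a bit more concrete, here is a rough Python sketch.
It is not part of Subversion and does not talk to a real repository: it works
on a plain list of directory paths, the function names and sample paths are
made up for illustration, and CASE B is assumed to mean /package/trunk/...

```python
def trunk_path_case_a(package, subdir=""):
    """CASE A: 'trunk' is always the first path component."""
    parts = ["trunk", package] + ([subdir] if subdir else [])
    return "/" + "/".join(parts)


def find_trunks_case_b(all_dirs):
    """CASE B: every directory must be scanned for ones ending in 'trunk'."""
    return sorted(
        d for d in all_dirs if d.rstrip("/").split("/")[-1] == "trunk"
    )


# Hypothetical directory listing of a CASE B repository.
CASE_B_DIRS = [
    "/packageA",
    "/packageA/trunk",
    "/packageA/branches",
    "/packageA/branches/branch-1",
    "/packageA/tags",
    "/packageB",
    "/packageB/trunk",
]

if __name__ == "__main__":
    print(trunk_path_case_a("packageA", "sub-packagedir"))
    print(find_trunks_case_b(CASE_B_DIRS))
```

In CASE A the path is built by simple concatenation; in CASE B the tool has
to walk the whole tree before it knows what to export.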

So *I* feel CASE A is the more general (and does no harm even if you only
have one project). Therefore I think it is the one that should become
the default, encouraged standard.

Another reason at the back of my mind for encouraging this "Standard"
is that many environments would benefit from more automated repository
management tools - eg Config Mgmt Tools, Workflow Tools etc.
These tools really need a well defined, consistent and easily automated
environment (eg repository schema) for them to work.

In theory the tools could work with CASE A or CASE B (no point arguing
over that) however they WON'T work with arbitrary user schemas C,D,E,F...
ie they need a standard, regardless of what it is.
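
As a rough illustration of why a tool needs a known schema, the Python sketch
below guesses which layout a repository uses from its directory names alone.
The function name, its arguments and the directory names "trunk", "branches"
and "tags" are assumptions made for the example, not anything Subversion
defines or enforces.

```python
def classify_layout(top_level, second_level):
    """Guess the repository layout from directory names.

    top_level    -- names of the directories directly under the repository root
    second_level -- dict mapping each top-level name to its child directory names
    Returns "A", "B" or "unknown".
    """
    admin = {"trunk", "branches", "tags"}
    top = set(top_level)
    # CASE A: only management directories at the root, and trunk is present.
    if "trunk" in top and top <= admin:
        return "A"
    # CASE B: every top-level directory is a package with its own trunk.
    if top and all("trunk" in second_level.get(name, ()) for name in top):
        return "B"
    # Anything else (schemas C, D, E, F...) cannot be handled mechanically.
    return "unknown"
```

A tool built on something like this can support CASE A and CASE B, but has to
give up on anything it classifies as "unknown".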

So if you want to encourage Subversion's use in these areas, a standardised
repository schema is necessary and whilst users can convert their repository
if they have to, why not say up front - DIY if you know your needs are different,
but otherwise use this method because it scales well etc.

Feedback appreciated. If I've got something obviously wrong I'd like
to be set straight - gently :-)


Kim Lester,
Senior Engineer,
Datafusion and Visualisation Systems


Received on Wed Apr 16 17:09:45 2003

This is an archived mail posted to the Subversion Dev mailing list.
