Eric's message concerning the lack of discussion around architecture has
prompted me to post concerning some ideas that I think are important for
Subversion. I've been holding off on posting these ideas because I didn't
want to interrupt the flow of work that has been going on, but I think he
is right in that there are a few things that need to at least be reviewed
while the work continues.
I apologize to everyone for hashing over ideas that may have been done to
death, but I am a little late for the party and missed much of what has
gone before. If nothing else, perhaps these will point the way to some
entries in a FAQ.
WARNING: The following is lengthy. Proceed at your own risk.
CONSIDERATIONS FOR DESIGN OF SUBVERSION
a) Adding a CVS proxy
There have been many source code management tools that have come and gone.
We know that Subversion will eventually catch on based on its WebDAV/DeltaV
interface, CVS feature set, and integration with Apache, but in the
changeover time there is a chicken-and-egg problem.
The problem is not on the server side. Anyone who knows how to run a CVS
server probably also knows how to drop a module into Apache, and with the
conversion scripts the process, although one-way, should be simple. The
only reluctance that server operators are going to show is over losing
all the people with CVS clients.
CVS clients have finally become somewhat ubiquitous. Most IDEs provide at
least some support for CVS built in, some of them quite sophisticated. We
are finally at the point that SCC has been at for years. Yes, the client
library will simplify porting new clients into new environments, but it
still must be admitted that this process will take some time.
My idea is to write a new server that does nothing but receive CVS pserver
protocol requests from a client, translate them into SVN requests,
and translate the responses back to the client. In Design Pattern terms, a
Mediator. Of course, the two models are fundamentally different and only a
fool would think they are going to work perfectly together, but consider
this. The number one use of CVS on the public Internet is to check out a
copy of the source code of some project anonymously. If we could support
just this one feature, we would be supporting 80% (93.7% of all stats are
made up) of the actions of users on the Internet. And I think we could go
much farther than that.
Perhaps the CVS proxy would have to store local data that was eventually
tossed out. Perhaps it would create SVN hierarchies as regular files for
the CVS client. Perhaps the Subversion server would be forced to put in
place certain types of properties in aid of the translation to CVS. In any
case, we should be able to accomplish some rough equivalence that is
suitable for some percentage of users.
So how to create this Proxy? I can think of several ways:
- Existing CVS source code could be linked into the new libraries (ugh!)
- Apache 2.0 is supposed to have better support for implementing new
  protocols, so in theory we could write a MOD_PSERVER. In practice, I've
  looked at the one non-HTTP based protocol they support (mod_echo) and it is
  unclear to me how it would be done. Perhaps it is easy, but the
  documentation wasn't there for me to know.
- We could use Avalon. It is written in Java, though, so there would be
  an issue with how the Subversion libraries would be supported.
    * Use JNI to get at the libraries
    * Rewrite the libraries in Java
    * Translate the CVS pserver request into a WebDAV/DeltaV request, and
      reissue it to the URL of your choice. This seems the most generic solution.
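To make the last option a little more concrete, here is a rough Python sketch of the translation step only. It assumes a heavily simplified view of the pserver conversation (just the Root, Argument, and co requests, with authentication and everything else ignored) and a made-up mapping from CVS root and module to a URL. DavRequest, translate_checkout, and the example host are hypothetical names for illustration, not part of CVS or Subversion.

```python
from dataclasses import dataclass, field


@dataclass
class DavRequest:
    """A description of the HTTP request the proxy would reissue."""
    method: str
    url: str
    headers: dict = field(default_factory=dict)


def translate_checkout(lines, base_url):
    """Map a simplified pserver checkout conversation to a DavRequest.

    Returns None for anything that is not a recognizable checkout.
    """
    root = None
    args = []
    is_checkout = False
    for line in lines:
        word, _, rest = line.partition(" ")
        if word == "Root":
            root = rest
        elif word == "Argument":
            args.append(rest)
        elif word == "co":
            is_checkout = True
    # Assumption: the last non-option argument names the module.
    modules = [a for a in args if not a.startswith("-")]
    if not is_checkout or root is None or not modules:
        return None
    # Hypothetical mapping: CVS root + module -> path under base_url.
    url = "/".join([base_url.rstrip("/"), root.strip("/"), modules[-1].strip("/")])
    # A PROPFIND with Depth: infinity lists the tree; the proxy would
    # still have to GET each resource and replay it to the CVS client.
    return DavRequest("PROPFIND", url, {"Depth": "infinity"})
```

The hard part, replaying the WebDAV response in a form the CVS client accepts, is deliberately left out; this only shows that the request side of an anonymous checkout is a small mapping.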
There is another possibility here if we were to support commits. Since
branching in Subversion is such a lightweight operation, clients coming in
through an anonymous CVS proxy could automatically be redirected by the
proxy to a branch, perhaps one based on their email address. That way,
changes could go into the repository and anyone, at any time, could review
them at their leisure without worrying about anything getting screwed up.
That may be too insecure for some folk, I don't know.
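As a sketch of what "a branch based on their email address" might look like, the proxy could derive a path segment from the address. The function name, the /branches/anonymous prefix, and the sanitizing rule below are all my own assumptions for illustration.

```python
import re


def branch_for(email, prefix="/branches/anonymous"):
    """Return a per-contributor branch path derived from an email address."""
    local = email.strip().lower()
    # Keep only characters that are safe in a single path segment;
    # everything else (including "@" and "/") becomes an underscore.
    safe = re.sub(r"[^a-z0-9._-]", "_", local)
    if not safe.strip("._"):
        raise ValueError("email address yields an empty branch name")
    return prefix + "/" + safe
```

Nothing here addresses the security worry: an address is trivially forged, so the branch name identifies a sandbox, not a person.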
So what do people think about that idea? Is the "marketing" slant correct?
Are there other ways of accomplishing the same task that I haven't
considered? Are the ones I have considered easier or harder than I think
they are?
I'm thinking that developing this could be my initial contribution to
Subversion, so I'd appreciate any points of view out there.
The path to Subversion is a slippery slope (or it should be).
b) Drop in replacement for CVS
There are many sysadmins with fairly sophisticated scripts for integrating
the CVS server into their code management and release strategy. It would be
awfully nice if Subversion knew how to interpret all of the standard files
in the CVSROOT directory and just worked with them. Perhaps this could be a
Subversion add-on module in the Apache style.
c) Linking to Zope
Zope is a great application framework that already supports WebDAV. It is
written in Python which has an easy mechanism for calling into C code, and
it already supports many different types of data access. Its own object
database is versioned. This suggests to me that asking the Digital
Creations people to support DeltaV and providing an alternative backend for
Zope could result in some quick wins for everyone, once the Subversion file
system becomes a little more stable.
d) Wiki
At this stage in Subversion's life, it looks to me like it would be
beneficial to offer all of the documentation up on the web in some forum
that encourages collaborative editing, such as a Wiki. Periodically, a
script could suck the material off of the Wiki and check it in (or the Wiki
could use Subversion for its file store).
We should all make a concerted effort to keep any documentation on
Subversion as up-to-date as possible. The documentation that exists now is
really lacking, but with a communal editing effort could be kept in sync
with the code.
e) Mount points in the file system
I have read in a recent message that sub-repositories were argued to death
and summarily rejected, but I'm at a bit of a loss to understand why. It
doesn't SEEM to have much impact on the file structure (though I am
probably missing something), and the existence of the APR, NEON, and
EXPAT-LITE subrepositories within this repository certainly shows its
desirability.
Imagine an entity that is not a file or a directory, but a mount. It points
(with a URL) to a particular point within another repository, and includes
the full revision information so that, no matter what is done to the remote
subrepository, the revision information in our local copy identifies the
same state of the remote repository tree.
Users checking out your repository are redirected, when they come to the
mount link, to get the subrepository information from the other repository,
using the original revision information. Files checked out through the
remote repository would be read-only and marked as non-committable. That is,
the tree of files from the remote repository would be treated in the same
way that files checked out with a fixed, non-branch tag are in CVS. No one
could check anything into the subrepository.
Updates to a mount point would happen only through a special administrative
command, to keep the library you are relying on stable until you say you
want to update to a later version.
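A minimal sketch of the mount entity described above, assuming it carries nothing more than a local path, a remote URL, and a pinned revision number. Mount, resolve, and repin are hypothetical names, and the example URL is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    """A tree entry that is neither a file nor a directory."""
    path: str      # where the mount appears in the local tree
    url: str       # location inside the remote repository
    revision: int  # pinned remote revision


def resolve(entry_path, mounts):
    """Return (url, revision, read_only) for a path under a mount, else None."""
    for m in mounts:
        if entry_path == m.path or entry_path.startswith(m.path + "/"):
            suffix = entry_path[len(m.path):]
            # Everything reached through a mount is read-only.
            return (m.url + suffix, m.revision, True)
    return None


def repin(mount, new_revision):
    """The explicit administrative update: a new mount at a later revision."""
    return Mount(mount.path, mount.url, new_revision)
```

Because the Mount is immutable, the only way to move to a later version of the library is to commit a new mount entry, which is exactly the "stable until you say so" behaviour I am after.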
Now, I can see that there might be authentication issues here, but I would
think that they could be worked around (to the point of saying it just
doesn't work unless there is coordination between the repository owners).
This is simply too important an issue to be flushed without making
extremely sure it would be more hassle than it is worth, in my opinion.
f) Staying flexible enough for alternate uses
I am a little leery about forcing SVN clients to implement a certain
behaviour. It can shut down uses on you if you aren't careful.
For example, consider a client that runs every night and backs up all
changes on my hard drive to an SVN server. That way, I can at any time
upgrade my hardware and restore my new machine to the same state as my old
machine in a relatively short time. In this scenario, I really don't want a
mirror of my hard drive littered across many SVN directories. My tradeoffs
here are much different from the standard source code control system. I am
probably not restoring that often, and don't mind a major speed hit when it
happens. I probably want to keep my own custom table of file sizes and
dates to speed the backup, though, along with perhaps a checksum.
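The custom table for the backup case could be as simple as the following sketch: a dictionary of size, modification time, and checksum per file, where the checksum is only recomputed when the size or date has changed. The names scan and file_digest and the choice of SHA-256 are my assumptions; deleted files are not handled.

```python
import hashlib
import os


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(root, previous):
    """Compare the tree under root with a previous table.

    Returns (table, changed): the new table of
    {relative_path: (size, mtime_ns, digest)} and the sorted list of
    paths whose content is new or different.
    """
    table, changed = {}, []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            st = os.stat(full)
            old = previous.get(rel)
            if old and old[0] == st.st_size and old[1] == st.st_mtime_ns:
                table[rel] = old  # cheap path: size and date both match
                continue
            digest = file_digest(full)
            table[rel] = (st.st_size, st.st_mtime_ns, digest)
            if old is None or old[2] != digest:
                changed.append(rel)
    return table, sorted(changed)
```

The nightly client would persist the table wherever it likes and send only the changed paths to the server, with no per-directory administrative area on the disk being backed up.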
Another example could be a Linux file system implemented on top of the SVN
file system. New tools could allow you to switch revisions of a file or
tree, or switch to another branch. Every time you closed a file, it would
be checked for whether it needed to be checked in. In fact, one could
imagine a Red Hat distribution and a Debian distribution on the same system.
If properly done, it could understand that files such as /etc/smb.conf on a
Red Hat system are the same as /etc/samba/smb.conf on a Debian system, and
even allow you to keep the two branches in sync for certain files. Such a
file system would be transactional, versioned, and property based (allowing
maximum flexibility in ACL control). And slow. But for some people, it
might be the right tradeoff.
My last example would be an NTFS-based file system that kept a remote SVN
server in sync. Imagine periodically deleting all of your files from your
hard drive. Any attempt to open a file would first check for a local copy,
and failing that perform a check out from the SVN server before opening
it. This would allow you to clean out all the gunk in your hard drive while
remaining confident that the material was still readily available to you
(with a slight initial delay for each new file). Of course, you couldn't
run a Virus Checker on the client but otherwise it could allow you to delay
upgrading your hard drive for quite a long time.
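Stripped of the file system driver, the open path in that last example reduces to something like this sketch. The fetch callable stands in for whatever would actually retrieve the file from the SVN server; it and open_lazily are hypothetical.

```python
import os


def open_lazily(path, fetch, mode="rb"):
    """Open path, first calling fetch(path) if no local copy exists.

    fetch is assumed to return the file content as bytes.
    """
    if not os.path.exists(path):
        data = fetch(path)
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "wb") as fh:
            fh.write(data)
    return open(path, mode)
```

Only the first open of a purged file pays the round trip to the server; after that the local copy is used.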
Ok, just a few pie-in-the-sky ideas but hopefully you get the point. There
are times that storing too much information client-side is a really bad
trade-off for the application. While Subversion is initially intended for
source code control only, is it possible to keep our options open?
g) Modular form for Subversion server
One of the great strengths of Apache is in its modular form, with support
for adding new modules fairly simply. So what is the plan for modular
support in Subversion? Subversion is already a mod of Apache, but shouldn't
it support its own plug-in API specifically for adding on features to
source code management? I'm thinking of different LOD models, approval
systems, release states, other tools, etc.
Integrating scripting languages would of course be an important step.
Perhaps Subversion mods that themselves required Apache mod_perl or
mod_python to be installed, and that extended those mods to supply data
structures specific to Subversion.
I once wrote (http://groups.yahoo.com/group/info-cvs/message/15036) a
series of steps for a CVS server that mimicked the steps of the Apache web
server. I've excerpted the relevant part here:
> Here is an example of the steps the CVS server might go through, and that a
> module could register itself for. Of course, some commands would only go
> through a subset of the steps:
>
> a) Authorize user (pserver, kerberos, GSSAPI, etc)
> b) Identify command (standard commands, grouped commands, add your own)
> c) Translate namespace to source namespace, destination namespace (& filter)
> d) Verify user can access source &/or destination namespaces
> e) For each file/directory
>    1) Retrieve from source
>    2) Determine file properties (mime type, binary, revision, tags, etc)
>    3) Translate file (keywords, promotion level, line-ends, diff, etc)
>    4) Apply command
>    5) Generate server-side files (CVSRoot, CVS directory)
>    6) Send results to Destination
>    7) Log any file-specific info
> f) Log command info
Of course, Apache does many of these things for us, and the design of
Subversion undoubtedly makes others irrelevant, but perhaps consideration
could be given as to which steps are still relevant, and how modules could
hook into them.
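As an illustration of how modules might hook into steps like those, here is a small Python sketch of a phase registry. The phase names are my own shorthand for the steps in the excerpt, and HookRegistry is hypothetical; it is not an existing Apache or Subversion API.

```python
from collections import defaultdict

# One name per step in the excerpt above (a through f, with e1 to e7).
PHASES = (
    "authorize", "identify_command", "translate_namespace", "check_access",
    "retrieve", "file_properties", "translate_file", "apply_command",
    "server_files", "send_results", "log_file", "log_command",
)


class HookRegistry:
    """Lets modules register callables for named processing phases."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def register(self, phase, func):
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        self._hooks[phase].append(func)

    def run(self, phase, request):
        """Call each hook in registration order.

        A hook that returns False declines the request and stops the phase.
        """
        for func in self._hooks[phase]:
            if func(request) is False:
                return False
        return True
```

An approval-system module, for example, would register a check_access hook that returns False until the change has been signed off.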
h) Rewriting libs in other languages
Once the API has stabilized, it is important that the ability to access the
Subversion libraries be available in as many languages as possible. These
tasks are probably outside the effort on Subversion itself, but hopefully
volunteers from the other communities can be found to make that effort. For
a tool like this that will require clients in all kinds of environments,
language neutrality is a very important feature to strive for from the
beginning, I think.
i) Server controlled UI for conforming clients
It would be awfully nice if, right from the beginning, support was provided
for servers to have some control over the user interfaces of the clients.
This way, as each group customized their source code management system to
support their particular view of the world, no matter how controlled or how
lax, the clients could automatically adapt to provide the right user
interface. This might mean that people filling different roles would see
completely different user interfaces.
The standard way to do this kind of server-controlled user
interface right now is through HTML Forms, but of course they are not
nearly rich enough to provide a complete user interface. I see three
possible contenders, each with their own flaws:
1) XUL - seems to be oriented almost exclusively at web browsers.
Still, once DeltaV becomes integrated into Mozilla, XUL may allow Mozilla to
become a versioning client.
2) UIML - the most general purpose solution, but it seems to be so
abstract as to provide no actual user interface elements that are usable at
this time. Perhaps that will change over time.
3) XForms - this is the option that seems to show the most promise,
although the standard is still developing.
Anyway, those are my thoughts for now. I'd appreciate hearing from anyone
who has any reaction at all to what I've written, even if it is a belief that
I should just go away with these highfalutin ideas for now while the
serious work is done. I'd particularly like to hear from someone who has
information or ideas about the CVS proxy idea. Thanks.
Received on Sat Oct 21 14:36:23 2006