Hello everybody!
[DISCLAIMER: I read the mailing lists, and I looked into the issues. But of
course something could have escaped me]
Some event I haven't been able to isolate yet trashed some of my files, so I
once again rethought my backup strategy. I came to the conclusion that instead
of copying my /home to another disk from time to time I can do better: I should
put my /home under version control, so that I can do a commit on every shutdown
and "nothing ever will be lost" (TM).
Then I thought about my favourite version control system (svn :-) and found
that some improvements would be needed first.
(Please read further, the interesting part comes later)
- First of all, the .svn-directories are a bit distracting. Furthermore, in
one of my projects my wc has about 100 000 inodes, of which 78 800 are in .svn
directories (ok, I've got properties on nearly every file). So that's a bit
inefficient. Of course, there are already some issues.
rethink .svn/ area read-only files and dirs strategy
http://subversion.tigris.org/issues/show_bug.cgi?id=1294
Need support for opaque collections/"document bundles" in wo
http://subversion.tigris.org/issues/show_bug.cgi?id=707
- Next, the text-base files are unnecessary in some cases. If svn is used as a
backup medium, a diff is seldom required. So the space could be saved.
Store text-base compressed
http://subversion.tigris.org/issues/show_bug.cgi?id=908
- Furthermore, the performance is sometimes not as it could be.
performance bad in "svn mv" with whole directories
http://subversion.tigris.org/issues/show_bug.cgi?id=1284
So what's the problem, what's the answer?
Of course I'll choose the easy target - libsvn_wc.
Part of the problem is that there's a directory tree to be remembered, to know
what the base of the wc is. This is currently done in the .svn-directories,
in files like entries, dir-props, and some directories.
But wait - we already do that! Yes, in an ingenious piece of software that is
already in svn! It is called libsvn_fs.
So - answer: Why don't we use the already perfectly working part for the
repository on the wc side too?
It stores the tree-structure, has multiple versions (base of wc, current wc
state), can store the properties inside, and even the textbase!
So I propose to *allow* for something like this (discussion below)
- On checkout there's a parameter "--with-metadata-to DIR" which creates a
structure similar to the repository in the given directory, to save the
wc-meta-information, and makes an entry in a ~/.subversion-file which says
"everything below this checked-out directory belongs to the meta-data at DIR"
That gives us 707, 1294, and (possibly) 1284, as there is no .svn/entries-file
to be opened several hundred times for a bigger "svn mv".
- Another parameter for checkout is "--save-text-base-for N", where N can be a
number of days or similar.
On diff, all files diffed are cached in db/strings in DIR; on commit the
changed files are stored there, too (so the deltification can be done
locally), and on every use of a cached file a timestamp for this file is set
to the current date.
In some crontab or similar the user can add an entry like "svnadmin purgewc
DIR", which deletes "old", unused entries from db/strings.
That gives us 908.
- When invoked, svn reads ~/.subversion/working-copies and looks for the longest
matching path, and also looks in the current directory for a subdirectory .svn.
If only one is present, no problem.
If both are, then **MAKE A POLICY DECISION HERE**. (hehe :-)
- locks are done by setting a lock bit for the current wc dir in the stored
directory tree, and for each leaf directory "downwards", like the current
locks.
<speculative>I'm not sure about bdb, but updates of single bits are possibly
faster than creation of files. Possibly there should be a single bitmap for
these lock-entries, mmapped or so for maximum performance. </>
- (A bit off-topic:) To achieve a (nearly) un-supervised execution of backups
(as in 707) the current Perl script svn_load_dirs.pl could be extended with
Manber hashing, as already discussed here (see
http://marc.theaimsgroup.com/?l=subversion-dev&m=106810408730390&w=2)
And the various checksums could be stored in the "local" repository.
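
To make the "longest matching path" lookup from the list above a bit more
concrete, here is a minimal Python sketch. Everything in it is an assumption
for illustration only: the function name find_metadata_dir, the example paths,
and the idea of passing the registry in as a plain dict (the format of
~/.subversion/working-copies is not defined in this proposal). It also leaves
out the policy decision for the case where a .svn directory is present too.

```python
import posixpath


def find_metadata_dir(path, registry):
    """Return (wc_root, metadata_dir) for the longest registered working-copy
    root that contains `path`, or None if no registered root matches.

    `registry` is a hypothetical mapping {wc_root: metadata_dir}; how it
    would be read from ~/.subversion/working-copies is left open here.
    """
    path = posixpath.normpath(path)
    best = None
    for wc_root, meta_dir in registry.items():
        root = posixpath.normpath(wc_root)
        # Compare whole path components, so that /home/phil does not
        # accidentally match /home/philip.
        prefix = root.rstrip("/") + "/"
        if path == root or path.startswith(prefix):
            if best is None or len(root) > len(best[0]):
                best = (root, meta_dir)
    return best


# Example registry with made-up paths: a versioned /home plus a nested
# project that keeps its own metadata directory.
registry = {
    "/home/phil": "/var/svn-meta/home",
    "/home/phil/projects/big": "/var/svn-meta/big",
}
match = find_metadata_dir("/home/phil/projects/big/src", registry)
```

With nested working copies the deeper root wins, which is what "longest
matching path" implies; a path outside every registered root yields None, and
svn would then fall back to looking for a .svn directory as it does today.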
The more I thought about using a Berkeley (or whatever) database for the wc
(*with the same or a very similar codebase as libsvn_fs*) the more
similarities I found, which would make that easy. (at least to the naive mind
of a non-developer)
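
The "--save-text-base-for N" idea above (set a timestamp on every use, purge
what has not been used for N days) can be sketched in a few lines of Python.
This is only an illustration under assumptions: the cache is modelled as a
dict from a key to a last-used time, and the names mark_used and purge_unused
are made up; nothing here is existing svn or svnadmin code.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60


def mark_used(last_used, key, now=None):
    """Record that the cached text-base `key` was just used."""
    last_used[key] = time.time() if now is None else now


def purge_unused(last_used, max_age_days, now=None):
    """Drop entries not used within `max_age_days` and return their keys.

    `last_used` is a hypothetical in-memory stand-in for the per-file
    timestamps that would live next to db/strings in DIR.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = sorted(key for key, stamp in last_used.items() if stamp < cutoff)
    for key in stale:
        del last_used[key]
    return stale
```

A cron job would then amount to calling purge_unused with the N given at
checkout; files that are diffed or committed regularly keep getting their
timestamp refreshed and so stay in the cache.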
Pro & Con:
+ reuses existing code from libsvn_fs, with minor modifications.
+ possibly allows for restructuring of libsvn_wc, which is a fragile piece of
work (at least that's my impression reading the mailing list)
+ That solves some issues, at least the four above.
+ It may be a performance improvement.
+ It should save some space. On a 4k filesystem the
~ the current way could be kept, as it's much easier for moving wc's around
(although I don't really buy the point - whoever can move wc's can update a
pointer in a ~/.subversion/-file).
- changes to current architecture are needed.
Let the discussion arise :-)
Regards,
Phil
Received on Tue Feb 10 08:44:37 2004