
svn:// protocol efficiency and Cascade

From: Matt Craighead <matt.craighead_at_conifersystems.com>
Date: Wed, 12 Nov 2008 13:58:22 -0600

I was just looking into what it would take to add support for the svn://
protocol to Cascade. Currently Cascade File System and Cascade Proxy only
support http:// and https:// repository access.

Background: Cascade does not make use of libsvn_ra_*. I wrote my own
DAV client because I was unable to get acceptable performance out of
libsvn_ra_dav -- too many round trips to the repository server. (A
secondary problem was how to ship a binary distribution of an application
using libsvn on Linux -- it did not appear to be possible to ship a single
binary that would run on all Linux distros, even assuming that libsvn* were
installed. For example, I had to know at link time whether the target
system was using apr0 or apr1.)

There were several reasons why libsvn_ra_dav generated too many network
round trips, but the biggest one by far was that it didn't allow me to get
the MD5 and other properties of each directory entry as part of
svn_ra_get_dir(2). The MD5s are actually sent across the wire as part of
the PROPFIND response, but the client library would throw them away rather
than allowing me to see them. The same goes for properties such as
svn:executable. The only way to get these properties through the API was to
do a get_file request for each file in the directory. Worse still, to get
the MD5 (even if I didn't want the file contents -- just to check if I
already had the file cached), I had to obtain the entire file over the wire
and MD5 it myself.
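
To illustrate why a per-entry MD5 matters for caching, here is a minimal
sketch of a content-addressed cache. It is not Cascade's actual code:
ContentCache, fetch_file and the download callback are hypothetical names
made up for this example. The point is only that when a listing supplies the
checksum, a cache hit needs no file transfer at all; when it does not
(md5_from_listing is None), the download cannot be avoided.

```python
import hashlib


class ContentCache:
    """Hypothetical content-addressed cache: file bodies keyed by MD5 hex digest."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Hash locally and store the body under its digest.
        digest = hashlib.md5(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, md5_hex: str):
        # Returns None on a miss.
        return self._blobs.get(md5_hex)


def fetch_file(cache, path, md5_from_listing, download):
    """Return file contents, calling download(path) only when needed.

    md5_from_listing is the checksum a directory listing would ideally
    supply. If it is None, the file has to be downloaded before its MD5
    can even be computed.
    """
    if md5_from_listing is not None:
        cached = cache.get(md5_from_listing)
        if cached is not None:
            return cached
    data = download(path)  # hypothetical callback that hits the network
    cache.put(data)
    return data
```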

Looking at the svn:// protocol, it appears that I may have the same problems
all over again. Ignoring any limitations of libsvn_ra_svn and looking
straight at the protocol -- I can send a get-dir request, but this provides
very limited information about each directory entry. It appears that in
order to implement my caching subsystem's internal equivalent of
svn_ra_get_dir(), I may have to send quite a few queries for each directory
entry. For example, the only way to get the MD5 appears to be to send a
get-file request.

I suppose I can attempt to pipeline my requests, so that I'm not paying the
cost of one or more network round trips per directory entry, but has there
been any consideration given to adding more information to the get-dir query
in the svn:// protocol?
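
A rough sketch of what pipelining could look like on the write side:
encode one get-file request per directory entry and send them all in a
single buffer before reading any responses. This assumes the usual svn://
item encoding (numbers, length-prefixed strings, words, parenthesized lists)
and a get-file parameter layout of ( path [ rev ] want-props want-contents );
both are my reading of the protocol notes and should be checked against
them. The function names are made up, and response parsing (including any
per-command authentication exchange) is left out.

```python
def encode_item(item) -> bytes:
    """Encode a Python value as one protocol item followed by a space."""
    if isinstance(item, bool):      # bools travel as the words true/false
        return b"true " if item else b"false "
    if isinstance(item, int):       # numbers are plain decimal digits
        return b"%d " % item
    if isinstance(item, bytes):     # strings are length-prefixed
        return b"%d:%s " % (len(item), item)
    if isinstance(item, str):       # str is used here for protocol words
        return item.encode("ascii") + b" "
    if isinstance(item, (list, tuple)):
        return b"( " + b"".join(encode_item(i) for i in item) + b") "
    raise TypeError(f"cannot encode {item!r}")


def get_file_request(path: bytes, rev: int) -> bytes:
    # Assumed layout: ( get-file ( path ( rev ) want-props want-contents ) )
    # want-contents is false: only the checksum and properties are wanted.
    return encode_item(["get-file", [path, [rev], True, False]])


def pipelined_requests(dir_path: bytes, names, rev: int) -> bytes:
    """Concatenate one get-file request per entry into a single write buffer."""
    return b"".join(get_file_request(dir_path + b"/" + n, rev) for n in names)
```

For example, get_file_request(b"trunk/a.c", 5) produces
b"( get-file ( 9:trunk/a.c ( 5 ) true false ) ) ". The cost is then one
round trip for the whole batch rather than one per entry, though the server
still does per-entry work.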

Ideally, Cascade would like to be able to obtain all of the following
information for a directory (at a particular revision number) in a single
query that is always just a single network round trip:
- list of directory entries
- range of revisions (start and end revision) where these directory entries
  are valid
- for each directory entry:
  - mode flags: is it a directory? is it a symlink? is it executable? is
    it text (based on eol-style)?
  - timestamp of the last commit, to use as the last-modified time
  - if it's not a directory: size of file in bytes
  - if it's not a directory: MD5 or SHA1 of file contents
  - range of revisions (start and end revision) where the above information
    for this directory entry is valid
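
The wish list above could be modelled roughly as the following data
structure. This is only a sketch to make the shape of the desired response
concrete; the type and field names (RevRange, DirEntry, DirListing) are
invented here and do not correspond to anything in libsvn or in either wire
protocol.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RevRange:
    start: int  # first revision where the data is valid
    end: int    # last revision where the data is valid (inclusive)

    def contains(self, rev: int) -> bool:
        return self.start <= rev <= self.end


@dataclass
class DirEntry:
    name: str
    is_dir: bool
    is_symlink: bool
    is_executable: bool
    is_text: bool                    # derived from eol-style
    last_commit_time: str            # timestamp of the last commit
    valid: RevRange                  # where this entry's info is valid
    size: Optional[int] = None       # bytes; None for directories
    checksum: Optional[str] = None   # MD5 or SHA1 hex; None for directories


@dataclass
class DirListing:
    path: str
    valid: RevRange                  # where the list of entries is valid
    entries: List[DirEntry] = field(default_factory=list)
```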

Note that even in the DAV protocol I am not able to do all of the above as
well as I'd like. For example, I can get "start revisions" where the data
obtained is valid via the DAV:version-name property, but I cannot get "end
revisions" (the revision of the next commit to that file, minus one). Also,
for the revision range where the list of directory entries is valid, I have
to do a separate REPORT query to get the last commit to that tree... not
only an extra query, but this results in a very conservative revision range
because the probability that that commit modified the list of directory
entries (by adding or deleting a file or directory *in that directory*, not
just in a subdirectory) is pretty low.

Any thoughts on enhancing the svn:// protocol to return more information in
get-dir? Any thoughts on how to efficiently implement the query I just
described using the existing svn:// protocol? Also, any thoughts on
getting better revision ranges for the DAV protocol? The larger the
revision ranges I can get, the more effective my caching will be -- the
fewer unnecessary queries to the SVN server.
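
To make the effect of range size concrete, here is a small sketch of a
cache keyed by path and revision range. RevRangeCache is a hypothetical
class for illustration, not Cascade's implementation. When the server
provides no end revision, one conservative choice (assumed here) is to use
the revision that was actually queried as the end of the range.

```python
class RevRangeCache:
    """Hypothetical cache of per-path data, each item valid over [start, end]."""

    def __init__(self):
        self._by_path = {}  # path -> list of (start, end, value)
        self.hits = 0
        self.misses = 0

    def store(self, path, start, end, value):
        # Record that value describes path for every revision in [start, end].
        self._by_path.setdefault(path, []).append((start, end, value))

    def lookup(self, path, rev):
        # A hit needs no server query; a miss means another round trip.
        for start, end, value in self._by_path.get(path, []):
            if start <= rev <= end:
                self.hits += 1
                return value
        self.misses += 1
        return None


cache = RevRangeCache()
# Narrow range: data known valid only at the revision that was queried.
cache.store("/trunk", 10, 10, "listing@10")
narrow = cache.lookup("/trunk", 12)   # miss: revision 12 is outside 10..10
# Wider range: the server reported validity through revision 20.
cache.store("/trunk", 10, 20, "listing@10")
wide = cache.lookup("/trunk", 12)     # hit: revision 12 is inside 10..20
```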

-- 
Matt Craighead
Founder/CEO, Conifer Systems LLC
http://www.conifersystems.com
512-772-1834
Received on 2008-11-12 21:46:29 CET

This is an archived mail posted to the Subversion Dev mailing list.
