
Re: Robots and Spiders, oh my!

From: Edmund Horner <chrysophylax_at_chrysophylax.cjb.net>
Date: 2004-03-12 06:07:34 CET

Brian W. Fitzpatrick wrote:
> What would happen if a robot crawled a big repository with a whole
> lotta tags and branches?
>
> Shouldn't we have a big bold warning box in the book telling people to
> create a robots.txt file in their DOCUMENT_ROOT that contains:
>
> User-agent: *
> Disallow: /
>
> We've had this on svn.collab.net for ages, and I'm thinking we should
> really let people know about it.

Depending on the number of directories, it might be desirable to allow
crawling of the directory listings, but not of the file contents. For
example, I have often searched Google for "somefile.c webcvs" in order to
have a look at that source file. But can that be done with /robots.txt?
I believe not :-(
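As far as I understand it, the standard robots.txt syntax only does prefix
matching on URL paths, so something like the sketch below (using a
hypothetical /repos/ path) is about all you can express; it cannot separate
file contents from directory listings when both live under the same URLs,
and the Allow directive is a later, non-standard extension anyway:

    User-agent: *
    # Blocks everything under /repos/ -- directory listings and file
    # contents alike; there is no standard way to say "listings yes,
    # file contents no".
    Disallow: /repos/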

> And while I'm at it, how about advising people who use Subversion
> working copies as websites to put something like this in their
> httpd.conf files:
>
> # Disallow browsing of Subversion working copy administrative
> # directories.
> <DirectoryMatch "^/.*/\.svn/">
> Order deny,allow
> Deny from all
> </DirectoryMatch>
>
> Thoughts?

Sounds like a good idea. IIRC .svn can contain things that should often
be kept private in cases like this?
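One concrete case (the host name and file name here are only hypothetical
examples): the working copy keeps a pristine copy of every file under
.svn/text-base/, so if that area is browsable, a request like

    http://www.example.com/.svn/text-base/config.php.svn-base

would hand back the raw source of config.php, bypassing whatever
processing the server would normally apply to it.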

Of course, not everyone who uses a WC for their web documents will be
running Apache, but (hopefully) the idea translates easily to other servers.
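For instance, a rough equivalent for nginx (untested, assuming an ordinary
server block) might look like:

    # Deny access to Subversion working copy administrative directories.
    location ~ /\.svn/ {
        deny all;
    }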

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Fri Mar 12 06:08:14 2004
