
Re: stopping webcrawlers using robots.txt

From: Ryan Schmidt <subversion-2006c_at_ryandesign.com>
Date: 2006-07-10 14:32:19 CEST

On Jul 10, 2006, at 00:07, Evert|Rooftop wrote:

> I'm guessing you could still create an alias for robots.txt.. but I'm
> not 100% sure..
>
> We simply use authentication everywhere.. I can't really understand why
> you would want to open your repository for everyone, except robots..

The motivation might not be to exclude or include any particular
robots. (Well-written) robots will request /robots.txt on hosts they
crawl, just like (many modern) browsers will request /favicon.ico. If
these files are not present, Apache will log a 404 to the error log.
For a sysadmin trying to use the error log to see if there are any
real problems on the site, these "false positives" quickly become
very irritating, and the sysadmin will look for a way to shut them off.

If you were using a single repository with SVNPath, you could use
Bob's suggestion:

On Jul 9, 2006, at 20:42, Bob Proulx wrote:

> I suppose you could check in robots.txt into the top level of your
> repository. But then it would be part of your project and so forth.

But since you're using SVNParentPath and multiple repositories, that
option is not available.

I tried this suggestion from Todd, which sounded promising:

On Jul 10, 2006, at 01:44, Todd D. Esposito wrote:

> Alias /robots.txt /some/non/svn/path/robots.txt
> <Location /robots.txt>
> SetHandler default-handler
> </Location>

But it doesn't seem to be working.

This may not be of much help to you, but when I was unable to solve
this within Apache (and since I was experimenting with the lighttpd
web server anyway), I arranged for web access to the repository to go
through lighttpd, which proxied every request to Apache running on a
different port, except requests for favicon.ico, robots.txt, and the
CSS and XSLT stylesheets, which lighttpd served itself. Working copies
accessed the Apache port directly, because although lighttpd is
supposed to support proxying to Apache / Subversion, that seems to be
broken at the moment.
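The path-based split described above can be sketched roughly as follows. This is not the lighttpd configuration I used: it swaps in Python's standard-library `http.server` purely to illustrate the routing rule. The Apache address `127.0.0.1:8080`, the `static` directory and the stylesheet names `svnindex.css` / `svnindex.xsl` are all hypothetical. Only browser `GET` requests are handled; as above, Subversion clients would still talk to the Apache port directly.

```python
"""Hypothetical front-end proxy illustrating the split: a few static files
are answered locally, every other browser GET is forwarded to Apache.

All names here (ports, STATIC_DIR, stylesheet file names) are assumptions
for illustration, not taken from the original lighttpd setup.
"""
import mimetypes
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path

APACHE_BASE = "http://127.0.0.1:8080"  # assumed Apache/mod_dav_svn address
STATIC_DIR = Path("static")            # assumed directory holding the local files
# Paths the front end answers itself; the stylesheet names are hypothetical.
LOCAL_FILES = {"/favicon.ico", "/robots.txt", "/svnindex.css", "/svnindex.xsl"}


def is_local(path: str) -> bool:
    """True if this request path is served locally instead of proxied."""
    return path.split("?", 1)[0] in LOCAL_FILES


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Relay Apache's 3xx responses to the browser instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx response


OPENER = urllib.request.build_opener(NoRedirect)


class FrontEnd(BaseHTTPRequestHandler):
    def do_GET(self):
        if is_local(self.path):
            self.serve_local()
        else:
            self.forward()

    def serve_local(self):
        # Only paths listed in LOCAL_FILES reach here, so no traversal is possible.
        name = self.path.split("?", 1)[0].lstrip("/")
        try:
            body = (STATIC_DIR / name).read_bytes()
        except OSError:
            self.send_error(404)
            return
        ctype = mimetypes.guess_type(name)[0] or "application/octet-stream"
        self.reply(200, {"Content-Type": ctype}, body)

    def forward(self):
        req = urllib.request.Request(APACHE_BASE + self.path)
        if self.headers.get("Host"):
            # Keep the public host name so Apache builds correct redirect URLs.
            req.add_header("Host", self.headers["Host"])
        try:
            with OPENER.open(req) as resp:
                status, headers, body = resp.status, resp.headers, resp.read()
        except urllib.error.HTTPError as err:
            # 3xx/4xx/5xx from Apache: relay them unchanged.
            status, headers, body = err.code, err.headers, err.read()
        except urllib.error.URLError:
            self.send_error(502, "Apache back end unreachable")
            return
        self.reply(status, headers, body)

    def reply(self, status, headers, body):
        self.send_response(status)
        # Copy only a few headers the browser needs; a real proxy would copy more.
        for name in ("Content-Type", "Location", "Last-Modified"):
            if headers.get(name):
                self.send_header(name, headers[name])
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 8000) -> None:
    """Serve browser traffic on the given port until interrupted."""
    ThreadingHTTPServer(("", port), FrontEnd).serve_forever()
```

Calling `run(8000)` would serve browsers on port 8000 while Apache keeps its own port for working copies. Relaying Apache's redirects instead of following them keeps any redirect visible to the browser, so relative links in the directory listings resolve against the right URL.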

Received on Mon Jul 10 14:34:51 2006

This is an archived mail posted to the Subversion Users mailing list.
