On Fri, 2004-03-12 at 00:33, Justin Erenkrantz wrote:
> --On Thursday, March 11, 2004 10:01 PM -0600 "Brian W. Fitzpatrick"
> <fitz@red-bean.com> wrote:
>
> > Shouldn't we have a big bold warning box in the book telling people to
> > create a robots.txt file in their DOCUMENT_ROOT that contains:
> >
> > User-agent: *
> > Disallow: /
> >
> > We've had this on svn.collab.net for ages, and I'm thinking we should
> > really let people know about it.
>
> For apache.org, I don't think we would do this. I don't see how excluding
> robots is possibly a good idea.
Here's an example:
httpd-2.0's repository contains 3,603 files, weighing in at about 41MB.
It also has 62 tags and branches.
Let's say that we convert this repository to Subversion, preserving
history.
If some dumb crawler comes along and decides to crawl
httpd-2.0/[trunk|tags|branches], it's going to suck somewhere in the
neighborhood of 2.5GB of bandwidth as it grabs every tag and branch
revision on its way to trunk.
See the problem?
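The 2.5GB figure follows straight from the numbers above; a quick sanity check (treating each of the 62 tags/branches as a full ~41MB copy, which is roughly what a naive crawler fetches):

```shell
# 62 tagged/branch copies x ~41MB each, before even counting trunk:
echo "$((62 * 41)) MB"   # prints "2542 MB", i.e. ~2.5GB
```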
> > # Disallow browsing of Subversion working copy administrative
> > # directories.
> > <DirectoryMatch "^/.*/\.svn/">
> > Order deny,allow
> > Deny from all
> > </DirectoryMatch>
> >
> > Thoughts?
>
> Neither for this. What's possibly sensitive here? The auth info isn't stored
> there any more. -- justin
Not for auth info... I'm thinking of the cases where people leave
Indexes on but suppress directory listings by dropping index.html
files into their directories. If they forget about .svn with Indexes
on, people will be able to browse their text-base. And even with
Indexes off, someone could just grab the .svn/entries file and have
a nice listing of what's in the directory.
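To make the exposure concrete, here's a minimal sketch. The file contents and paths are placeholders (the real entries format varies by Subversion release), but the point is the same: if the server will hand out anything under .svn/, one request enumerates the directory.

```shell
#!/bin/sh
# Simulate a working copy's administrative area with placeholder data.
wc=$(mktemp -d)
mkdir -p "$wc/.svn"
printf 'secret-report.html\nold-prices.txt\n' > "$wc/.svn/entries"

# In a live deployment the attacker would do the equivalent of:
#   curl http://www.example.com/site/.svn/entries   (hypothetical URL)
# and, absent a Deny rule like the DirectoryMatch above, get back
# the same listing we read here:
cat "$wc/.svn/entries"
rm -rf "$wc"
```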
-Fitz
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Fri Mar 12 14:53:43 2004