
Re: Robots and Spiders, oh my!

From: Brian W. Fitzpatrick <fitz_at_red-bean.com>
Date: 2004-03-15 17:06:35 CET

On Mon, 2004-03-15 at 09:06, Mark Benedetto King wrote:
> On Fri, Mar 12, 2004 at 07:47:46AM -0600, Brian W. Fitzpatrick wrote:
> >
> > If some dumb crawler comes along and decides to crawl
> > httpd-2.0/[trunk|tags|branches], it's going to suck somewhere in the
> > neighborhood of 2.5GB of bandwidth as it grabs every tagged and branch
> > revision on its way to trunk.
> >
> > See the problem?
> >
>
> Yes, but wouldn't you *want* to, for example, be able to search for
> a particular error message that is no longer on trunk?

I dunno. I guess I don't see that as something useful to me, but I
could see how others might like it.

> Especially since svn doesn't have "svn grep -r X:Y"[*], this seems like
> a small price to pay (a little bandwidth every now and again) for
> such a feature.

Considering the number of (dumb) crawlers out there, I suspect that it
would be a non-trivial amount of bandwidth.
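
(As an aside on the missing "svn grep": a rough client-side stand-in, just to
show why a searchable index of old revisions is attractive, could loop over a
revision range with "svn cat". This is only a sketch; the URL, revision range,
and search string below are made-up examples, not values from this thread.)

    #!/usr/bin/env python3
    # Rough stand-in for a hypothetical "svn grep -r X:Y": fetch each old
    # revision of one file and search it. URL, revision range, and PATTERN
    # are placeholder examples.
    import subprocess

    URL = "http://svn.example.org/repos/httpd-2.0/trunk/server/core.c"
    PATTERN = "some old error message"

    for rev in range(900, 1000):
        result = subprocess.run(["svn", "cat", "-r", str(rev), URL],
                                capture_output=True, text=True)
        if result.returncode != 0:
            continue  # the file may not exist at this revision
        for lineno, line in enumerate(result.stdout.splitlines(), 1):
            if PATTERN in line:
                print("r%d:%d: %s" % (rev, lineno, line.strip()))

(Every iteration is a full server round trip, which is exactly the kind of
traffic an indexed crawl would be paying for up front.)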

> Also, while that would be true for a dumb crawler, many crawlers these days
> are anything but. They should be smart enough to notice the ETags/Modified
> Times/etc and not need to refetch/reindex everything. If they aren't smart
> enough, they'll be penalized (they'll have to eat the bandwidth hit on their
> end too) and the market economy will kick in and force them to become
> smarter.

That's a good point.
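
(For reference, the conditional-request behaviour described above, re-fetching
only when the ETag has changed, looks roughly like the sketch below. The URL
and cached ETag value are made-up examples.)

    #!/usr/bin/env python3
    # Sketch of a polite crawler's conditional GET: send the ETag remembered
    # from the last crawl and accept a 304 instead of re-downloading.
    # The URL and ETag below are placeholder examples.
    import urllib.error
    import urllib.request

    URL = "http://svn.example.org/repos/httpd-2.0/trunk/README"
    cached_etag = '"abc123"'  # value saved from a previous fetch

    req = urllib.request.Request(URL, headers={"If-None-Match": cached_etag})
    try:
        with urllib.request.urlopen(req) as resp:
            body = resp.read()  # 200: content changed, re-index it
            print("changed, %d bytes, new ETag %s"
                  % (len(body), resp.headers.get("ETag")))
    except urllib.error.HTTPError as err:
        if err.code == 304:
            print("not modified; nothing to re-fetch or re-index")
        else:
            raise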

-Fitz

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Mon Mar 15 17:07:18 2004
