
Re: Robots and Spiders, oh my!

From: Mark Benedetto King <mbk_at_lowlatency.com>
Date: 2004-03-15 16:06:59 CET

On Fri, Mar 12, 2004 at 07:47:46AM -0600, Brian W. Fitzpatrick wrote:
>
> If some dumb crawler comes along and decides to crawl
> httpd-2.0/[trunk|tags|branches], it's going to suck somewhere in the
> neighborhood of 2.5GB of bandwidth as it grabs every tagged and branch
> revision on its way to trunk.
>
> See the problem?
>

Yes, but wouldn't you *want* to, for example, be able to search for
a particular error message that is no longer on trunk?

Especially since svn doesn't have "svn grep -r X:Y"[*], this seems like
a small price to pay (a little bandwidth every now and again) for
such a feature.
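
For what it's worth, something along these lines could fake it today
with plain "svn cat" in a loop (untested sketch; the URL, revision
range, and pattern are all made up for illustration):

    URL=http://svn.example.org/repos/httpd-2.0/trunk/server/core.c
    # Walk a revision range and grep each historical version of the
    # file, labeling matches with the revision they came from.
    for REV in `seq 100 110`; do
        svn cat -r $REV "$URL" 2>/dev/null \
            | grep -n 'some error message' \
            | sed "s/^/r$REV:/"
    done

Of course, that costs one server round trip per revision, which is
exactly why a server-side "svn grep" would be neat.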

Also, while that would be true for a dumb crawler, many crawlers these
days are anything but. They should be smart enough to honor the ETag
and Last-Modified headers on a recrawl and skip refetching/reindexing
anything that hasn't changed. If they aren't smart enough, they'll be
penalized (they have to eat the bandwidth hit on their end too), and
the market economy will kick in and force them to become smarter.
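
To make that concrete, here's roughly what the revalidation dance
looks like at the HTTP level (hypothetical URL and validator values;
what the server actually emits depends on the mod_dav_svn setup):

    # 1. Initial fetch: the crawler stores the body plus whatever
    #    validators the server hands back (ETag, Last-Modified).
    curl -sD - -o /dev/null http://svn.example.org/repos/trunk/README
    #    HTTP/1.1 200 OK
    #    ETag: "xyz"
    #    Last-Modified: Mon, 15 Mar 2004 12:00:00 GMT

    # 2. Recrawl: it presents the validators back; an unchanged
    #    resource costs a 304 and a few header bytes, not the body.
    curl -sD - -o /dev/null \
        -H 'If-None-Match: "xyz"' \
        -H 'If-Modified-Since: Mon, 15 Mar 2004 12:00:00 GMT' \
        http://svn.example.org/repos/trunk/README
    #    HTTP/1.1 304 Not Modified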

[*] Wouldn't it be neat, though?

--ben

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Mon Mar 15 16:45:13 2004
