
Re: Current subversion books not being indexed by google? (was Re: --dry-run)

From: Andy Levy <andy.levy_at_gmail.com>
Date: Thu, 24 Jul 2008 07:17:37 -0400

On Thu, Jul 24, 2008 at 06:33, Jan Hendrik <list.jan.hendrik_at_gmail.com> wrote:
> Concerning Current subversion books not being
> Benjamin Smith-Mannschott wrote on 24 Jul 2008, 11:55, at least in part:
>
>> Google seems to like the old versions of the book overly much. When I
>> do a search for "subversion cheap copies" (as I did a while back in
>> answering a question on the list) I get a deep link into the 1.1 book.
>>
>> I don't suppose there's much that can be done about this though, is
>> there?
>
> Well, there's nothing that is guaranteed to make Google or other
> search engines index pages, skip them, or remove pages already
> indexed, but there are a few things which could be done:
>
> A) at the root of svnbook.red-bean.com create a file "robots.txt"
> with the following content:
>
> User-agent: *
> Disallow: /en/1.0/
>
> (or whichever directory should not be crawled; multiple
> "Disallow:" lines are allowed)
>
> Instead of the *, the Google bot's name can be entered (search for
> the bot names or check the server log files!)
>
> B) in the <head> of every page which *should not* be indexed, add
> the line
>
> <meta name="robots" content="noindex,follow">
>
> or even
>
> <meta name="robots" content="noindex,nofollow">
>
> C) add a Google sitemap and notify Google by using their
> sitemap_gen.py script (from Google Webmaster Tools)
>
> The script can be patched to also notify some other search engines
> that accept sitemaps:
>
> NOTIFICATION_SITES = [
> ('http', 'www.google.com', 'webmasters/sitemaps/ping', {}, '', 'sitemap'),
> ('http', 'search.yahooapis.com', 'SiteExplorerService/V1/ping', {}, '', 'sitemap'),
> ('http', 'api.moreover.com', 'ping', {}, '', 'sitemap'),
> ('http', 'submissions.ask.com', 'ping', {}, '', 'sitemap')
> ]
>
> and a line in robots.txt can also be added so even more bots
> become aware of the sitemap:
>
> Sitemap: http://svnbook.red-bean.com/sitemap.xml.gz
>
>
> Be aware that none of these is any guarantee whatsoever. Also, at least
> A & B would exclude the old book versions from being indexed at
> all, which may or may not be in the interest of the owners. For C
> the configuration for sitemap_gen.py would allow more options, but
> again these are just hints for the search engine, not definitive orders.
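
Before publishing rules like the ones in (A), it may help to check
locally which URLs they would actually block. The following is only a
rough sketch using Python's standard urllib.robotparser. The Disallow
entry for /en/1.1/ and the page name index.html are assumptions made
for illustration, not the real layout of svnbook.red-bean.com, and
is_crawlable is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Example rules modeled on suggestion (A). The exact directories are
# assumptions; adjust them to whatever should not be crawled.
ROBOTS_TXT = """\
User-agent: *
Disallow: /en/1.0/
Disallow: /en/1.1/
Sitemap: http://svnbook.red-bean.com/sitemap.xml.gz
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for version in ("1.0", "1.1", "1.5"):
        # index.html is a hypothetical page name used only for this check.
        url = f"http://svnbook.red-bean.com/en/{version}/index.html"
        print(url, "->", "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked")
```

This only shows how a well-behaved crawler would read the file. As
noted above, it is no guarantee of what any search engine actually does.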

D) Modify the manual pages to indicate what version of the manual
you're looking at. With the really old ones (1.0 through 1.3), perhaps
a big red banner on the top reading "this content is probably out of
date for the version of Subversion you're running".

Really, any hint of which version of a random page you're looking at
other than just the URL.
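
One way (D) could be done without touching the book sources is a small
post-processing step over the generated HTML. This is a minimal sketch
under assumptions: the function name add_version_banner, the inline
styling, and the set of "old" versions are all hypothetical, and it
assumes each page has an ordinary <body> tag.

```python
import re

# Versions treated as "really old"; the 1.0 through 1.3 range follows the
# suggestion above, but the exact set is an assumption.
OLD_VERSIONS = {"1.0", "1.1", "1.2", "1.3"}

# Hypothetical banner markup; the styling is purely illustrative.
BANNER = (
    '<div style="background:#c00;color:#fff;padding:0.5em;text-align:center">'
    "This page documents Subversion {version}. This content is probably out "
    "of date for the version of Subversion you're running.</div>"
)

# Matches the opening <body> tag, with or without attributes.
BODY_TAG = re.compile(r"<body\b[^>]*>", re.IGNORECASE)


def add_version_banner(html: str, version: str) -> str:
    """Insert a warning banner right after <body> for old book versions."""
    if version not in OLD_VERSIONS:
        return html
    banner = BANNER.format(version=version)
    # A function replacement avoids any backslash/group interpretation
    # of the banner text; count=1 touches only the first <body> tag.
    return BODY_TAG.sub(lambda m: m.group(0) + banner, html, count=1)


if __name__ == "__main__":
    page = "<html><body><h1>Cheap Copies</h1></body></html>"
    print(add_version_banner(page, "1.1"))
```

Pages for newer versions pass through unchanged, so the same step could
run over every copy of the book.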

Received on 2008-07-24 13:18:14 CEST

This is an archived mail posted to the Subversion Users mailing list.
