
Re: Current subversion books not being indexed by google? (was Re: --dry-run)

From: Jan Hendrik <list.jan.hendrik_at_gmail.com>
Date: Thu, 24 Jul 2008 12:33:49 +0200

Concerning "Current subversion books not being indexed by google?",
Benjamin Smith-Mannschott wrote on 24 Jul 2008, 11:55, at least in part:

> Google seems to like the old versions of the book overly much. When I
> do a search for "subversion cheap copies" (as I did a while back in
> answering a question on the list) I get a deep link into the 1.1 book.
> I don't suppose there's much that can be done about this though, is
> there?

Well, nothing is guaranteed to make Google or other search engines
index a page, skip it, or drop it from their index, but there are a
few things that could be done:

A) At the root of svnbook.red-bean.com, create a file "robots.txt"
with the following content:

User-agent: *
Disallow: /en/1.0/    # or whichever directory should not be crawled;
                      # multiple "Disallow:" lines are allowed

Instead of the *, the name of Google's crawler ("Googlebot") can be
entered (search for the bot names or check the server log files!)
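Such rules can be checked locally before the file is uploaded. A minimal sketch using Python's standard urllib.robotparser; the /en/1.1/ path for the 1.1 book and the helper name can_crawl are assumptions made for illustration:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: /en/1.1/ is an assumed location of the 1.1 book.
ROBOTS_TXT = """\
User-agent: *
Disallow: /en/1.0/
Disallow: /en/1.1/
"""

def can_crawl(robots_txt, agent, url):
    """Hypothetical helper: True if `agent` may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    base = "http://svnbook.red-bean.com"
    for path in ("/en/1.0/index.html", "/en/1.1/index.html", "/en/1.4/index.html"):
        print(path, can_crawl(ROBOTS_TXT, "Googlebot", base + path))
```

Pages under a disallowed prefix report False; everything else falls through to "allowed".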

B) In the <head> section of every page that *should not* be indexed,
add the line

<meta name="robots" content="noindex,follow">

or, to also keep the bot from following the links on those pages,

<meta name="robots" content="noindex,nofollow">
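Adding the tag by hand to every page of an old book version is tedious. A minimal sketch that patches static HTML files in place, assuming each page has an ordinary <head> tag; the helper names and the example directory "en/1.0" are hypothetical and say nothing about how the svnbook site is actually built:

```python
import re
from pathlib import Path

NOINDEX_TAG = '<meta name="robots" content="noindex,follow">'

# Opening <head> tag, with or without attributes; does not match <header>.
HEAD_RE = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)
ROBOTS_META_RE = re.compile(r'<meta\s+name="robots"', re.IGNORECASE)

def add_noindex(html):
    """Hypothetical helper: insert the robots meta tag right after <head>.

    Pages that already have a robots meta tag, or have no <head>, are
    returned unchanged, so running it twice is harmless.
    """
    if ROBOTS_META_RE.search(html):
        return html
    match = HEAD_RE.search(html)
    if match is None:
        return html
    return html[:match.end()] + "\n" + NOINDEX_TAG + html[match.end():]

def patch_directory(root):
    """Hypothetical helper: patch every .html file below `root`, e.g. "en/1.0"."""
    for page in Path(root).rglob("*.html"):
        text = page.read_text(encoding="utf-8")
        patched = add_noindex(text)
        if patched != text:
            page.write_text(patched, encoding="utf-8")
```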

C) Add a Google sitemap and notify Google using their
sitemap_gen.py script (from Google Webmaster Tools).
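The sitemap itself is a small XML document in the sitemaps.org 0.9 format, so for a handful of pages it can also be produced without sitemap_gen.py. A minimal sketch with Python's standard library; the helper names, page URLs and priority values are hypothetical, chosen only to show how a newer book version could be weighted above an older one:

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Hypothetical helper: sitemap XML (bytes) for (url, priority) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, priority in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # <priority> is only a hint, relative to other pages of the same site.
        ET.SubElement(entry, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)

def write_sitemap_gz(pages, path="sitemap.xml.gz"):
    """Hypothetical helper: write the sitemap gzip-compressed to `path`."""
    with gzip.open(path, "wb") as fh:
        fh.write(build_sitemap(pages))

if __name__ == "__main__":
    write_sitemap_gz([
        ("http://svnbook.red-bean.com/en/1.4/index.html", 0.8),  # illustrative
        ("http://svnbook.red-bean.com/en/1.1/index.html", 0.2),  # illustrative
    ])
```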

The sitemap_gen.py script can be patched to notify some other
search engines that accept sitemaps as well:

  ('http', 'www.google.com', 'webmasters/sitemaps/ping', {}, '', 'sitemap'),
  ('http', 'search.yahooapis.com', 'SiteExplorerService/V1/ping', {}, '', 'sitemap'),
  ('http', 'api.moreover.com', 'ping', {}, '', 'sitemap'),
  ('http', 'submissions.ask.com', 'ping', {}, '', 'sitemap')
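Each tuple appears to describe one ping endpoint as (scheme, host, path, query parameters, fragment, name of the query parameter that carries the sitemap URL); that reading of the fields is an assumption based on their values. A minimal sketch of turning such an entry into a ping URL with the standard library; build_ping_url is a hypothetical helper, not a function taken from sitemap_gen.py:

```python
from urllib.parse import urlencode, urlunsplit

def build_ping_url(entry, sitemap_url):
    """Hypothetical helper: ping URL for an assumed
    (scheme, netloc, path, query, fragment, query_attr) tuple."""
    scheme, netloc, path, query, fragment, query_attr = entry
    params = dict(query)              # copy, so the {} inside the tuple stays untouched
    params[query_attr] = sitemap_url  # e.g. ?sitemap=<URL-encoded sitemap location>
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))

SITES = [
    ('http', 'www.google.com', 'webmasters/sitemaps/ping', {}, '', 'sitemap'),
    ('http', 'submissions.ask.com', 'ping', {}, '', 'sitemap'),
]

if __name__ == "__main__":
    for site in SITES:
        print(build_ping_url(site, "http://svnbook.red-bean.com/sitemap.xml.gz"))
```

The notification itself would then be a plain HTTP GET of each URL (for instance with urllib.request.urlopen); whether a given endpoint still answers is up to the search engine.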

A line can also be added to robots.txt so that even more bots
become aware of the sitemap:

Sitemap: http://svnbook.red-bean.com/sitemap.xml.gz
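Python's urllib.robotparser (3.8 and later) also reads "Sitemap:" lines, which gives a quick way to confirm the line is picked up next to the "Disallow:" rules. A minimal sketch; the file content simply combines the example lines from this mail:

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt combining the Disallow rule and the Sitemap line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /en/1.0/

Sitemap: http://svnbook.red-bean.com/sitemap.xml.gz
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
print(parser.site_maps())
```

"Sitemap:" is independent of the "User-agent:" groups, so its position in the file does not matter.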

Be aware that none of these is a guarantee whatsoever. Also, at least
A and B would keep the old book versions out of the index entirely,
which may or may not be in the owners' interest. For C, the
sitemap_gen.py configuration offers more options, but again these
are only hints to the search engine, not binding instructions.

Jan Hendrik
Freedom quote:

     The greatest Glory of a free-born People,
     is to transmit that Freedom to their Children.
               -- William Havard

Received on 2008-07-24 12:33:01 CEST

This is an archived mail posted to the Subversion Users mailing list.
