On Fri, Dec 25, 2020 at 11:17 AM Daniel Shahaf <d.s_at_daniel.shahaf.name>
wrote:
>...
> > I'll figure out a way to have the mboxes downloadable. If I understand
> > Google's documentation of robots.txt they don't care about robots.txt if a
> > specific URL is linked from somewhere indexable, they will index it anyway.
> > Maybe just make one big tarball of everything?
>
> One big tarball would be wasteful to consume (would have to download
> everything) and to produce (would need to, basically, «cp everything.tgz
> tmp.tgz; tar -zcf - $new >> tmp.tgz; mv tmp.tgz everything.tgz», and you can
> see that's O(#everything) rather than O(appended stuff)). Would rather avoid
> it if possible.
>
> Not sure what to do about robots. I suppose we could send a rel="canonical"
> Link header in the HTTP response when serving the rfc822 files (example
> in <https://en.wikipedia.org/wiki/Canonical_link_element#HTTP>)?
>
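For reference, the HTTP form of the canonical hint is a Link response header
rather than a <link> element. A minimal sketch of building that header value
follows; the URL path is hypothetical, and how the header would be attached
depends on whatever ends up serving the rfc822 files.

```python
def canonical_link_header(canonical_url: str) -> tuple[str, str]:
    """Return a (name, value) pair for an HTTP Link header
    that marks canonical_url as the canonical location."""
    return ("Link", f'<{canonical_url}>; rel="canonical"')


# Hypothetical URL of the HTML rendering of one message.
name, value = canonical_link_header(
    "https://svn-haxx.apache.org/archive/example-message.html"
)
print(f"{name}: {value}")
```

The header would be sent with the raw rfc822 response and would point at the
page that should be indexed instead.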
I thought robots.txt could exclude subdirectories, so we could just cut off
(say) svn-haxx.apache.org/mbox/.
I'm not too worried about Google crawling the mboxes, as they'll likely do
it just once and never again (by keeping the etag and/or mtime).
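To illustrate the subdirectory exclusion, here is a small sketch that checks
a candidate rule with Python's standard urllib.robotparser. The /mbox/ prefix
comes from the suggestion above; the rule text, the file names, and the
is_crawlable helper are assumptions for illustration only. Note that this
only models crawling: as mentioned earlier in the thread, a disallowed URL
may still get indexed if it is linked from elsewhere.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: block everything under /mbox/ for all agents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mbox/
"""


def is_crawlable(url: str, robots_txt: str = ROBOTS_TXT,
                 agent: str = "Googlebot") -> bool:
    """Return True if robots_txt lets the given agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical file names under the real host name.
print(is_crawlable("https://svn-haxx.apache.org/mbox/2020-12.mbox"))
print(is_crawlable("https://svn-haxx.apache.org/archive/index.html"))
```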
>...
> > I couldn't figure out puppet; the link was a 404 for me. I've created a
> > request in Jira and I hope someone will take a look:
> > https://issues.apache.org/jira/browse/INFRA-21230
>
> I think the github repository is restricted to Apache committers only, so
> you'll need to enter your github username on id.apache.org in order to get
> access to that URL. If you don't have a github account, there ought to be
> a mirror of the repository on *.apache.org somewhere (at least, if Infra's
> following the same policy PMCs do).
>
Correct: committers only. And only after linking accounts via
https://gitbox.apache.org/setup/ as Nathan noted (and we forgot to mention
to DSahlberg).
If you do not have a GitHub account, or do not want one (say, because you
don't want to accept their T&Cs), then you can use the repository via
gitbox.apache.org (ask on Slack for the link; I prefer not to post it here).
Cheers,
-g
Received on 2020-12-25 23:54:26 CET