Daniel Sahlberg wrote on Thu, Dec 24, 2020 at 20:38:17 +0100:
> Den tis 22 dec. 2020 kl 02:08 skrev Greg Stein <gstein_at_gmail.com>:
> > On Mon, Dec 21, 2020 at 4:03 AM Daniel Shahaf <d.s_at_daniel.shahaf.name>
> > wrote:
> >> Daniel Sahlberg wrote on Mon, 21 Dec 2020 08:55 +0100:
> >> > Den fre 27 nov. 2020 kl 19:26 skrev Daniel Shahaf <
> >> d.s_at_daniel.shahaf.name>:
> >> >
> >> > > Sounds good. Nathan, Daniel Sahlberg — could you work with Infra on
> >> > > getting the data over to ASF hardware?
> >> >
> >> > I have been given access to svn-qavm and uploaded a tarball of the
> >> website
> >> > (including mboxes). I'm a bit reluctant to unpack it since it takes
> >> almost
> >> 7 GB, and there is only 14 GB of disk space remaining. Is it ok to unpack or
> >> > should we ask Infra for more disk space?
> >> I vote to ask for more disk space, especially considering that some
> >> percentage is reserved for uid=0's use.
> > DSahlberg hit up Infra on #asfinfra on the-asf.slack.com, and asked for
> > more space. That's been provisioned now.
> I've unpacked in /home/dsahlberg/svnhaxx
> > >...
> >> > The mboxes will be preserved but I don't plan to make them available for
> >> > download (since they are not available from lists.a.o or
> >> mail-archives.a.o).
> >> Please do make them available for download. Being able to download the
> >> raw data is useful for both backup and perusal purposes, and I doubt
> >> the bandwidth requirements would be a problem. (Might want
> >> a robots.txt entry, though?)
> > Bandwidth should not be a problem for the mboxes, but yes: a robots.txt
> > would be nice. I think search engines spidering the static email pages
> > might be useful to the community, but the spiders really shouldn't need/use
> > the mboxes.
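For what it's worth, a robots.txt along these lines might cover the common case, assuming the raw mboxes end up under a directory of their own (the path is illustrative, not the actual layout):

```
# Hypothetical layout: let spiders index the static HTML pages,
# but keep them away from the raw mbox downloads.
User-agent: *
Disallow: /mboxes/
```

Note the caveat below, though: robots.txt controls crawling, not indexing.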
> I'll figure out a way to make the mboxes downloadable. If I understand
> Google's documentation correctly, robots.txt alone won't prevent indexing:
> if a specific URL is linked from somewhere indexable, it may be indexed
> anyway.
> Maybe just make one big tarball of everything?
One big tarball would be wasteful both to consume (one would have to download
everything just to get the new messages) and to produce (updating it would
amount to «cp everything.tgz tmp.tgz; tar -zcf - $new >> tmp.tgz; mv tmp.tgz
everything.tgz», which is O(#everything) rather than O(appended stuff)). Would
rather avoid it if possible.
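By contrast, per-period tarballs make adding new mail O(new): only the tarball for the period that changed gets regenerated. A rough sketch, with made-up directory names and naming scheme (not the actual layout on svn-qavm):

```shell
#!/bin/sh
# Illustrative only: one tarball per month, so appending December's
# mail touches only December's tarball.  Paths are hypothetical.
set -eu
demo=$(mktemp -d)
mkdir -p "$demo/archive" "$demo/tarballs"
printf 'From example Thu Dec 24 20:38:17 2020\n' > "$demo/archive/2020-12.mbox"

# Regenerate only the tarball whose mbox changed:
for mbox in "$demo/archive"/*.mbox; do
  period=$(basename "$mbox" .mbox)
  tar -czf "$demo/tarballs/$period.tgz" -C "$demo/archive" "$period.mbox"
done
ls "$demo/tarballs"
```

A consumer then downloads only the tarballs newer than what they already have.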
Not sure what to do about robots. I suppose we could set <link
rel="canonical"> in the HTTP headers when serving the rfc822 files (example
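For the record, rel="canonical" can be sent as an HTTP Link header, which Google does honour. A hypothetical httpd snippet (the filename and URL are made up):

```apache
# Illustrative httpd configuration: when serving a raw rfc822 file,
# advertise the corresponding HTML page as the canonical version.
<Files "msg-0001.eml">
  Header set Link '<https://example.org/dev/msg-0001.html>; rel="canonical"'
</Files>
```

This steers indexers toward the HTML rendering without having to block the raw files outright.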
> > I think the first thing is to get httpd up and running with the desired
> > configuration. Then step two will be to memorialize that into puppet. Infra
> > can assist with the latter. I saw on Slack that Humbedooh gave you a link
> > to explore.
> Since I haven't got root, I can't get any further with installing httpd on my own.
Post a list of packages you'd like installed?
> I couldn't figure out puppet; the links were 404 for me. I've created a
> request in Jira and I hope someone will take a look:
I think the github repository is restricted to Apache committers only, so
you'll need to enter your github username on id.apache.org in order to get
access to that URL. If you don't have a github account, there ought to be
a mirror of the repository on *.apache.org somewhere (at least, if Infra's
following the same policy PMCs do).
Received on 2020-12-25 18:17:45 CET