Re: svn.haxx.se is going away

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Mon, 21 Dec 2020 10:03:45 +0000

Daniel Sahlberg wrote on Mon, 21 Dec 2020 08:55 +0100:
> On Fri, 27 Nov 2020 at 19:26, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
>
> > Sounds good. Nathan, Daniel Sahlberg — could you work with Infra on
> > getting the data over to ASF hardware?
> >
>
> I have been given access to svn-qavm and uploaded a tarball of the website
> (including mboxes). I'm a bit reluctant to unpack it since it takes almost
> 7 GB, and there is only 14 GB of disk space remaining. Is it OK to unpack,
> or should we ask Infra for more disk space?
>

I vote to ask for more disk space, especially considering that some
percentage is reserved for uid=0's use.
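
For whoever ends up checking this on the box, here is a rough sketch of how
the root-reserved share could be measured before unpacking. It assumes
a POSIX system (os.statvfs is not available elsewhere); the function names
and the headroom figure are made up for illustration, not anything Infra
prescribes.

```python
import os

GIB = 1024 ** 3


def free_space_report(path="/"):
    """Return (bytes free for ordinary users, bytes free only to root)."""
    st = os.statvfs(path)
    free_total = st.f_bfree * st.f_frsize   # all free blocks, reserved ones included
    free_user = st.f_bavail * st.f_frsize   # free blocks an unprivileged user may use
    return free_user, free_total - free_user


def fits_after_unpack(required, free_user, headroom):
    """True if `required` bytes fit and at least `headroom` bytes stay free."""
    return free_user - required >= headroom


if __name__ == "__main__":
    user, reserved = free_space_report("/")
    print(f"user-available: {user / GIB:.1f} GiB, root-only: {reserved / GIB:.1f} GiB")
    # Hypothetical policy: keep 5 GiB spare after unpacking a ~7 GiB tarball.
    print("ok to unpack:", fits_after_unpack(7 * GIB, user, 5 * GIB))
```

With the numbers quoted above (about 7 GB needed, 14 GB remaining), the
answer depends entirely on how much headroom one wants to keep and on how
much of the 14 GB is actually usable by a non-root account.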

> > Note that svn-org@ doesn't have an equivalent @s.a.o list, and that, as
> > mentioned upthread, the post-migration (from tigris.org to apache.org)
> > mboxes may be in a different order than the official ones, and shouldn't
> > be "deduplicated".
> >
>
> The mboxes will be preserved but I don't plan to make them available for
> download (since they are not available from lists.a.o or mail-archives.a.o).
>

Please do make them available for download. Being able to download the
raw data is useful for both backup and perusal purposes, and I doubt
the bandwidth requirements would be a problem. (Might want
a robots.txt entry, though?)
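
A quick way to sanity-check such a robots.txt entry before deploying it,
sketched with Python's standard urllib.robotparser. The /mboxes/ path and
the user-agent name are placeholders I made up; the real layout on svn-qavm
may well differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep crawlers away from the raw mbox downloads only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mboxes/
"""


def crawler_may_fetch(robots_txt, url_path, agent="ExampleBot"):
    """Return True if `agent` is allowed to fetch `url_path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url_path)


if __name__ == "__main__":
    for path in ("/mboxes/dev/202012.mbox", "/dev/index.shtml"):
        print(path, crawler_may_fetch(ROBOTS_TXT, path))
```

The point is only that well-behaved crawlers skip the bulk downloads while
the rendered pages stay indexable; humans can still fetch the mboxes.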

Regarding the behaviour of the existing archives, see
<https://mail-archives.apache.org/mod_mbox/subversion-dev/202012.mbox>
(which used to also be available via
https://subversion.apache.org/mail/, but nowadays that just redirects
to a landing page ☹). I don't know whether lists.a.o has equivalent
functionality, but then again, lists.a.o has had vendor lock-in baked
into it from day one, so a lack of a "download raw rfc822 data" feature
might simply be another form of that.
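
To illustrate the "perusal" use case: once a raw mbox has been downloaded,
the Python standard library can walk it directly. This is a minimal sketch;
the local filename is an assumption, and the function name is mine.

```python
import mailbox


def list_message_ids(mbox_path):
    """Yield (Message-ID, Subject) for every message in a local mbox file."""
    # create=False: fail loudly instead of creating an empty file on a typo.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            yield msg.get("Message-ID"), msg.get("Subject")
    finally:
        box.close()


if __name__ == "__main__":
    # Hypothetical local copy of a monthly mbox saved from the archive.
    for message_id, subject in list_message_ids("202012.mbox"):
        print(message_id, subject)
```

None of this is possible without access to the raw rfc822 data, which is
the argument for offering the mboxes for download.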

The mod_mbox product is owned by dev_at_httpd.

> > You indicate a desire to maintain URLs. Do you have some ideas on that?
> >
> > Each individual message .shtml file contains the message-id in
> > a comment. We can extract the comments and build a redirector around
> > them. (By the way, this is basically the same exercise that Infra must
> > have solved back when Sebb received that CSV file from the lists.a.o
> > vendor, so there may be an opportunity for code reuse.) Of course, the
> > full rsync likely has the same info available less scrapily.
> >
> > Or, as mentioned above, the .shtml files could just be preserved
> > statically (plus or minus an appropriate message in the list of years on
> > the /${listname}/ page). In fact, I'm having trouble coming up with
> > a reason _not_ to serve a static snapshot of the pages, even if we do
> > build a redirector.
> >
>
> No redirector as of now, only the static [s]html pages.
>
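
Should a redirector be wanted later, the extraction step described above
could look roughly like this. It assumes the comment has the form
<!-- id="..." -->, which needs checking against the real .shtml files; the
function names are hypothetical, and turning a message-id into a target URL
is deliberately left out.

```python
import re
from pathlib import Path

# ASSUMPTION: each message page carries a comment like <!-- id="msgid@host" -->.
MSGID_COMMENT = re.compile(r'<!--\s*id="([^"]+)"\s*-->')


def extract_message_id(html):
    """Return the message-id found in the page's comment, or None."""
    match = MSGID_COMMENT.search(html)
    return match.group(1) if match else None


def build_redirect_map(root):
    """Map each archive-relative .shtml path under `root` to its message-id."""
    root = Path(root)
    mapping = {}
    for page in sorted(root.rglob("*.shtml")):
        text = page.read_text(encoding="utf-8", errors="replace")
        message_id = extract_message_id(text)
        if message_id:  # index pages and the like carry no message-id
            mapping[page.relative_to(root).as_posix()] = message_id
    return mapping
```

The resulting path-to-message-id table is all a redirector would need as
input, and it can be built from the static snapshot at any time.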

> I will need some help from root to:

Not me, I'm afraid; ENOTIME.

> 1. Install a web server. nginx? (just kidding)

Apache HTTP Server would probably be a better choice since more dev_at_svn
and Infra people are familiar with it, but it's a fair question to ask.
(Cf. INFRA-7524)

> 2. Set up httpd.conf
> 3. Configure a DocumentRoot where I can put the files. Doesn't seem right
> to store them in /home

Hmm. These things should all be done via puppet. I'm not sure what's
best practice nowadays regarding writing puppet PRs and testing them,
though.

Cheers,

Daniel
Received on 2020-12-21 11:04:00 CET
