
Re: Lame mailing list software (was: svn commit: r36404 ...)

From: Jack Repenning <jrepenning_at_collab.net>
Date: Thu, 12 Mar 2009 18:09:24 -0700

On Mar 7, 2009, at 7:55 PM, Greg Stein wrote:

> Jack,
>
> Months ago, when the mailing list software was "upgraded" into this
> broken state, you said there were people working on it with the
> highest priority and non-stop. What is the status of that work?

(Sorry for the delayed reply -- mail filter squirreled this somewhere
unexpected!)

There's lots of work still going on. For example, on March 10, we
committed what we believe will be the complete fix for the "large
message guillotine" problem. Here's the change log:

> With this patch, when an email body is larger than 50 KB, it is cut:
> the first 50 KB are stored in the DB, while the entire body is stored
> on the file system. (This and the rest of the operations are done
> with streams, so the entire body is never loaded into memory.) When
> the emails are sent to subscribers, the message is reconstructed
> correctly from the body stored on the file system. The UI, on the
> other hand, shows only the first 50 KB stored in the DB; if the full
> body is on the FS, a link is presented to indicate this. When the
> user clicks it, the body is downloaded to the user's computer and
> opened automatically by the browser.

Maybe that's not so clear, being a mix of internal and external
descriptions. Allow me to translate a bit.

This whole guillotine effect arose because large messages were being
read into memory and parsed into Java objects reflecting the entire
MIME structure of the message. There's no reason to read whole
messages into memory, let alone parse their little brains out; we
should stream them. Furthermore, this ultimately caused the entire
message to be stored in the database tables, which is an inefficient
and fairly pointless way to store large bodies of opaque data (which
is what mail message bodies are, at least to the mail transport).
Under stress testing, this "read 'em and weep" approach had the
predictable effects of sucking up all the memory in the site,
exploding the database, bogging everything down, and OOMEing the Java
engine. All of which is, of course, inarguably bad, and needed to be
fixed.
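
To make the streaming idea concrete, here is a minimal sketch of the
"keep a 50 KB preview, spool the whole body to disk" approach the change
log describes. It is written in Python rather than the Java the real
system uses, and it is not the actual patch: the names store_body,
send_body, PREVIEW_LIMIT, and CHUNK_SIZE are hypothetical, and a local
temp directory stands in for both the database and the file store.

```python
import io
import os
import tempfile

PREVIEW_LIMIT = 50 * 1024  # the 50 KB cut-off named in the change log
CHUNK_SIZE = 8 * 1024      # hypothetical read size; any small value works


def store_body(stream, spool_dir):
    """Copy a message body from `stream` to a file, one chunk at a time.

    Returns (preview, path, total). `preview` is what would go in the DB.
    `path` is None when the whole body fits in the preview. At most one
    chunk plus the preview is ever held in memory.
    """
    preview = bytearray()
    total = 0
    fd, path = tempfile.mkstemp(dir=spool_dir, suffix=".body")
    with os.fdopen(fd, "wb") as out:
        while True:
            chunk = stream.read(CHUNK_SIZE)
            if not chunk:
                break
            out.write(chunk)
            total += len(chunk)
            room = PREVIEW_LIMIT - len(preview)
            if room > 0:
                preview += chunk[:room]
    if total <= PREVIEW_LIMIT:
        # Small body: the preview is the whole thing, so no file is needed.
        os.remove(path)
        path = None
    return bytes(preview), path, total


def send_body(path, out):
    """Stream the full stored body to `out` without loading it whole."""
    with open(path, "rb") as src:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            out.write(chunk)


# Usage: a body 1000 bytes over the limit is cut for the preview,
# but the copy streamed back out is byte-for-byte the original.
spool = tempfile.mkdtemp()
body = b"x" * (PREVIEW_LIMIT + 1000)
preview, path, total = store_body(io.BytesIO(body), spool)
delivered = io.BytesIO()
send_body(path, delivered)
print(len(preview), total, delivered.getvalue() == body)
# prints: 51200 52200 True
```

The point of the sketch is only that memory use is bounded by the chunk
size and the preview size, no matter how large the message is.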

The fix chosen, however, wasn't so hot: large messages were turned
into attachments, because the message handling system in use, James,
was already reasonably clever about handling attachments in a streamy
way. This is what the change log above means by storing the body on
the file system: by making things into attachments, James was clued to
save and stream the data. As a cheesy hack to get some performance out
of an external library, that might be OK, but it needed to be
implemented in a way
that didn't change the structure of the messages when we send 'em back
out.

It wasn't.

Now, it is.

I'm working on getting a schedule for when this fix can be deployed to
Tigris.

-==-
Jack Repenning
Chief Technology Officer
CollabNet, Inc.
8000 Marina Boulevard, Suite 600
Brisbane, California 94005
office: +1 650.228.2562
mobile: +1 408.835.8090
raindance: +1 877.326.2337, x844.7461
aim: jackrepenning
skype: jackrepenning
twitter: http://twitter.com/jrep

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1315076
Received on 2009-03-13 02:09:40 CET

This is an archived mail posted to the Subversion Dev mailing list.
