Arved Sandstrom <Arved_37@chebucto.ns.ca>:
> Couldn't help throwing in my 1.2 cents Canadian on this one,
> though. To start with, exactly what is the magical (transparent
> non-binary) format that will make it _easy_ to detect corruption and
> recover from same? I can throw out a few guesses, but I won't.
I don't know what the format should be. If I were designing it, it
would probably be some sort of XML with auxiliary binary
name-to-offset indices that get regenerated on the fly whenever the
text data part is newer than the index. That way you get both the
speed advantages of a binary format and the transparency advantages of
text.
> Number two, a team that's prone to writing code that garbages up a DB is
> going to be prone to writing code that garbages up a text (non-binary)
True, but not the point. The point is that it's a lot easier for human
eyeballs to grok patterns in text than in binary. So it's easier to spot,
diagnose, and correct corruption bugs.
> In the final analysis, though, why mention a putative "major"
> problem without explicitly mentioning a solution? I'm curious.
I didn't mention a solution because I don't have one. That doesn't mean
I can't see a big whacking problem when it stares me in the face -- in fact,
I'm embarrassed that it took Larry McVoy to bring my attention to it.
As Donald Knuth said, "Premature optimization is the root of all evil."
Binary formats are almost always premature optimization -- they sacrifice
debuggability (and, hence, development time) for performance gains that
are usually marginal. They should be used sparingly, and usually only
as automatically-regenerated caches or fast indexes for text masters.
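For concreteness, here is a minimal Python sketch of the text-master-plus-regenerated-index arrangement described above. It rests on several assumptions that are not part of the post. The post suggests XML, but to keep the sketch short the text master here is a hypothetical line-oriented file with one `name<TAB>value` record per line. The binary index layout and the helper names `build_index`, `load_index`, `ensure_index` and `lookup` are likewise invented for illustration. The one idea taken directly from the post is the staleness rule: rebuild the index whenever the text is newer than it, which the sketch checks by comparing file modification times.

```python
import os
import struct
from typing import Optional

# Sketch of a "text master + automatically regenerated binary index" store.
# ASSUMPTION: hypothetical text format, one "name<TAB>value" record per line
# (the post suggests XML instead).
# ASSUMPTION: invented index layout, repeated once per record:
#   uint16 name length | name bytes (UTF-8) | uint64 byte offset into the text
_NAME_LEN = struct.Struct("<H")
_OFFSET = struct.Struct("<Q")


def build_index(text_path: str, index_path: str) -> None:
    """Scan the text master and write a binary name-to-offset index."""
    entries = []
    offset = 0
    with open(text_path, "rb") as f:
        for line in f:
            name = line.split(b"\t", 1)[0].strip()
            if name:  # skip blank lines
                entries.append((name, offset))
            offset += len(line)
    # Write to a temp file and rename it into place, so a crash mid-write
    # never leaves a half-built index that looks newer than the text.
    tmp_path = index_path + ".tmp"
    with open(tmp_path, "wb") as out:
        for name, off in entries:
            out.write(_NAME_LEN.pack(len(name)))
            out.write(name)
            out.write(_OFFSET.pack(off))
    os.replace(tmp_path, index_path)


def load_index(index_path: str) -> dict:
    """Read the binary index back into a {name: offset} dict."""
    with open(index_path, "rb") as f:
        data = f.read()
    index, pos = {}, 0
    while pos < len(data):
        (n,) = _NAME_LEN.unpack_from(data, pos)
        pos += _NAME_LEN.size
        name = data[pos:pos + n].decode("utf-8")
        pos += n
        (off,) = _OFFSET.unpack_from(data, pos)
        pos += _OFFSET.size
        index[name] = off  # if a name repeats, the later record wins
    return index


def ensure_index(text_path: str, index_path: str) -> dict:
    """Regenerate the index whenever the text master is newer than it."""
    stale = (not os.path.exists(index_path)
             or os.stat(text_path).st_mtime_ns > os.stat(index_path).st_mtime_ns)
    if stale:
        build_index(text_path, index_path)
    return load_index(index_path)


def lookup(text_path: str, index_path: str, name: str) -> Optional[str]:
    """Seek straight to a record via the index; the text stays authoritative."""
    off = ensure_index(text_path, index_path).get(name)
    if off is None:
        return None
    with open(text_path, "rb") as f:
        f.seek(off)
        fields = f.readline().rstrip(b"\r\n").split(b"\t", 1)
    return fields[1].decode("utf-8") if len(fields) > 1 else ""
```

In this arrangement the binary file is purely a cache. If it is ever damaged, deleting it is enough, because the next lookup rebuilds it from the text master. The text master remains the only copy a human ever has to read or debug.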
Eric S. Raymond
Non-cooperation with evil is as much a duty as cooperation with good.
-- Mohandas Gandhi
Received on Sat Oct 21 14:36:27 2006