On Wed, Aug 22, 2001 at 02:59:23PM -0500, Eric W. Sink wrote:
>
> Pre-message summary: I'm responding to a fragment of an email
> from Greg Stein from eight months ago. This topic is completely
> non-urgent, and I know you're all busy with M3. If responses are
> slow to come, I will certainly understand.
But we're interrupt-based. The mail arrives, and we take an interrupt to
read it... thus consuming time away from M3.
;-)
>...
> Subversion's use of DB appears to be a very, very simple model. There
> is no apparent notion of columns within the tables, as one might expect
> in a SQL database.
Right. Since DB is not table-based, but stores simple key/value pairs, we had
to come up with a design that could store our complex structures (e.g. nodes
or property lists or whatever) in this key/value design. Being the Lisp
aficionado that Jim Blandy is ;-) ... he used the skel_t as a way to put
Lisp into SVN :-)
It is actually a reasonable approach to marshalling the structures. We could
write individual structure packing/unpacking functions to some kind of a
stream/buffer format, but the skel_t does a pretty reasonable job.
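
To make the marshalling idea concrete, here is a minimal sketch of a
skel-like encoding: nested lists of byte-string atoms, flattened into one
byte string that fits in a key/value record. It is an illustration only, not
the skel_t code in libsvn_fs; the function names unparse and parse are
made up for this example, and it handles only length-prefixed atoms.

```python
def unparse(node):
    """Serialize a nested list of bytes atoms into one byte string."""
    if isinstance(node, bytes):
        # Length-prefixed atom: decimal length, one space, then the raw bytes.
        return str(len(node)).encode("ascii") + b" " + node
    # A list is its elements, space-separated, wrapped in parentheses.
    return b"(" + b" ".join(unparse(child) for child in node) + b")"


def parse(data, pos=0):
    """Parse one element starting at data[pos]; return (node, next_pos)."""
    if data[pos:pos + 1] == b"(":
        pos += 1
        items = []
        while True:
            # Skip the separators between list elements.
            while data[pos:pos + 1] == b" ":
                pos += 1
            if data[pos:pos + 1] == b")":
                return items, pos + 1
            item, pos = parse(data, pos)
            items.append(item)
    # Atom: read the decimal length, then exactly that many bytes.
    space = data.index(b" ", pos)
    length = int(data[pos:space])
    start = space + 1
    return data[start:start + length], start + length
```

Because each atom carries its own length, atoms may contain spaces or
parentheses without any escaping, which is what lets arbitrary property
values live inside a single record.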
> Rather, each "record" is a key and a bunch of bytes.
> More specifically, the "bunch of bytes" is a skel.
>
> An alternative implementation to replace DB could simply provide an
> alternate way of storing data of identical complexity: Just keep
> track of keys and skels.
Yup. But if we're targeting a SQL database, the skel approach loses all of
SQL's query capabilities...
> Slightly more complex of course is the fact that there are five tables.
> But this is pretty simple as well.
Yah.
> So, ignoring transactions for a moment, I could implement a replacement
> for DB as follows:
>
> create five directories, one for each of the tables
>
> store every record as a file in the proper directory.
> the filename is the key, and the contents of the file are
> the skel
>
> everywhere I see a DB call I would replace it with a simple
> file IO call
>
> Like I said, I'm ignoring transactions for the moment, to see if I
> understand the data model. So now I'll ask: Am I missing something
> here?
Not at all. For example, we could use the apr_dbm functions in APR-UTIL
which would give us access to a number of other DBMs, such as gdbm.
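
The directory-per-table scheme Eric describes could be sketched roughly as
follows. This is a toy with no transactions, as in his description; the
class name FileStore, the five table names, and the filename encoding are
all assumptions made for the example, not anything in the Subversion tree.

```python
import os
from urllib.parse import quote

# Placeholder names for the five tables; purely illustrative.
TABLES = ("nodes", "revisions", "transactions", "representations", "strings")


class FileStore:
    """One directory per table, one file per record (no transactions)."""

    def __init__(self, root):
        self.root = root
        for table in TABLES:
            os.makedirs(os.path.join(root, table), exist_ok=True)

    def _path(self, table, key):
        if table not in TABLES:
            raise KeyError(table)
        # Percent-encode the key and add a prefix so that keys such as
        # "a/b" or ".." still map to a safe, unique filename.
        return os.path.join(self.root, table, "k-" + quote(key, safe=""))

    def put(self, table, key, skel):
        """Write the record; the file contents are the skel bytes."""
        with open(self._path(table, key), "wb") as f:
            f.write(skel)

    def get(self, table, key):
        """Read the record back; raises FileNotFoundError if absent."""
        with open(self._path(table, key), "rb") as f:
            return f.read()
```

Every DB call then becomes a put or get on a file, which is the whole of
the data model once transactions are set aside.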
> If the above is true, then the same thing can be done for a SQL
> backend:
>
> create five SQL tables
>
> each table will have two columns:
>
> a key
> a blob
>
> And this would work. However, it would not allow us to take
> advantage of the querying capabilities of a SQL db. We would be
> using a SQL db in exactly the same fashion as we use Berkeley db.
Right. The queries are the coolest thing... "what files were changed by
<this> author?"
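
A small sketch of the key/blob layout shows why the queries are lost. It
uses Python's sqlite3 module as a stand-in SQL database, and JSON as a
stand-in for the skel encoding just to keep the example short; the table
name, the record fields, and the helper names are all hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Two columns only: the database sees a key and an opaque blob.
conn.execute("CREATE TABLE revisions (key TEXT PRIMARY KEY, skel BLOB)")


def put(key, record):
    """Marshal the record to bytes and store it under the key."""
    blob = json.dumps(record).encode("utf-8")
    conn.execute("INSERT OR REPLACE INTO revisions VALUES (?, ?)", (key, blob))


def get(key):
    """Fetch and unmarshal one record, or return None if absent."""
    row = conn.execute(
        "SELECT skel FROM revisions WHERE key = ?", (key,)
    ).fetchone()
    return None if row is None else json.loads(bytes(row[0]))


def keys_by_author(author):
    """The database cannot filter on author: scan and decode every row."""
    rows = conn.execute("SELECT key, skel FROM revisions ORDER BY key")
    return [key for key, blob in rows
            if json.loads(bytes(blob))["author"] == author]
```

The only thing SQL can do here is look a record up by key. Anything like
"changed by this author" has to be answered in application code after
pulling every blob out of the table.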
> In fact, we could get pluggable DB-replacements by creating a very
> simple abstraction API. Something like the following would be
> approximately sufficient (hand-wave, hand-wave):
> create/delete a db
> begin/abort/commit txn
> put(table, key, skel)
> skel = get(table, key)
Anything skel-based isn't going to have the right kind of semantics. We need
to map things to specific columns, and the skels lose that semantic
information.
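
For reference, the hand-waved interface above might look something like
this. It is a toy in-memory backend with a single open transaction at a
time; the class and method names are invented for the example, and
create/delete of a db is reduced to constructing or dropping the object.
Note that put and get only ever see opaque bytes, which is exactly the
limitation described above.

```python
class MemoryBackend:
    """Toy pluggable backend: tables of key -> opaque skel bytes."""

    def __init__(self, tables):
        self._committed = {table: {} for table in tables}
        self._pending = None  # working copy while a txn is open

    def _require_txn(self):
        if self._pending is None:
            raise RuntimeError("no transaction in progress")

    def begin(self):
        if self._pending is not None:
            raise RuntimeError("transaction already open")
        # Copy each table so writes stay invisible until commit.
        self._pending = {t: dict(rows) for t, rows in self._committed.items()}

    def commit(self):
        self._require_txn()
        self._committed, self._pending = self._pending, None

    def abort(self):
        self._require_txn()
        self._pending = None

    def put(self, table, key, skel):
        self._require_txn()
        self._pending[table][key] = skel

    def get(self, table, key):
        # Inside a txn, reads see that txn's own uncommitted writes.
        view = self._pending if self._pending is not None else self._committed
        return view[table].get(key)
```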
> Obviously, another way of doing this would be to replace all of the
> implementation in libsvn_fs. This might allow us to make better use
> of the storage facilities of the underlying DB. Instead of storing
> pairs of keys and skels, each key would correspond to a collection
> of columns. The individual atoms in the skels would be placed in
> individual columns. This would allow us to query against those
> columns, but that's about the only difference. ( I admit that it's
> a big difference. )
Skels aren't the right way to do this... instead, we would have db-specific
mappings from internal structures to the appropriate mechanism for the
database at hand.
Some databases may choose to marshal the structures as skels, but anything
with facilities richer than a plain key/value will (most likely) map the
structures into input/output binding buffers for the database.
Note that our memory consumption will be *much* better if we can avoid skels
and map directly to/from database binding buffers.
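
As a rough sketch of what such a db-specific mapping could look like, here
is an internal structure bound straight to columns, again with sqlite3
standing in for the SQL database. The Change structure, the changes table,
and its columns are assumptions for illustration and not a proposed schema.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical internal structure: one changed path in one revision."""
    revision: int
    author: str
    path: str


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE changes (revision INTEGER, author TEXT, path TEXT)")


def store(change):
    """Bind each field to its own column; no intermediate skel is built."""
    conn.execute(
        "INSERT INTO changes (revision, author, path) VALUES (?, ?, ?)",
        (change.revision, change.author, change.path),
    )


def paths_changed_by(author):
    """'What files were changed by <this> author?' as a plain SQL query."""
    rows = conn.execute(
        "SELECT DISTINCT path FROM changes WHERE author = ? ORDER BY path",
        (author,),
    )
    return [path for (path,) in rows]
```

Here the database does the filtering itself, and the fields go directly
into the statement's bound parameters rather than through a marshalled
copy of the whole structure.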
> It looks to me like some implementations of libsvn_fs will want
> to use skels. DB is obviously an example. I suspect that an
> implementation on plain text files would be another example. Why
> reinvent this particular wheel?
Yes, we could keep the skels around as a "db utility" for use by various FS
backends. No problem there.
> However, a SQL-based store is an example where it would *probably*
> make more sense to design a mapping which does not use skels.
A lot more than probably. It *definitely* makes more sense to avoid skels.
>...
> Am I capturing the issues which prompted you to make the remark
> about having skels be DB-specific?
Yup :-) They just don't belong in other DBs.
Cheers,
-g
--
Greg Stein, http://www.lyra.org/
Received on Sat Oct 21 14:36:36 2006