Re: Are skels a db-specific thing?

From: Greg Stein <gstein_at_lyra.org>
Date: 2001-08-23 07:02:28 CEST

On Wed, Aug 22, 2001 at 02:59:23PM -0500, Eric W. Sink wrote:
>
> Pre-message summary: I'm responding to a fragment of an email
> from Greg Stein from eight months ago. This topic is completely
> non-urgent, and I know you're all busy with M3. If responses are
> slow to come, I will certainly understand.

But we're interrupt-based. The mail arrives, and we take an interrupt to
read it... thus taking time away from M3.

;-)

>...
> Subversion's use of DB appears to be a very, very simple model. There
> is no apparent notion of columns within the tables, as one might expect
> in a SQL database.

Right. Since DB is not table-based, but simple key/value pairs, we had to
come up with a design that could store our complex structures (e.g. nodes or
property lists or whatever) into this key/value design. Being the Lisp
aficionado that Jim Blandy is ;-) ... he used the skel_t as a way to put
Lisp into SVN :-)

It is actually a reasonable approach to marshalling the structures. We could
write individual structure packing/unpacking functions to some kind of a
stream/buffer format, but the skel_t does a pretty reasonable job.
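
For readers who have not looked at skels, here is a rough sketch of the idea in
Python. It is a simplified, skel-like encoding and not Subversion's actual
skel_t code: every atom is written as "<length> <bytes>" and every list is
wrapped in parentheses. The function names unparse and parse are made up for
this illustration.

```python
# Simplified skel-like marshalling (illustrative only, not Subversion's code).
# An atom is a bytes object, a list is a Python list of atoms and lists.

def unparse(node):
    """Serialize atoms and nested lists into a single byte string."""
    if isinstance(node, bytes):
        # Explicit-length atom: decimal length, one space, then the bytes.
        return str(len(node)).encode("ascii") + b" " + node
    return b"(" + b" ".join(unparse(child) for child in node) + b")"


def parse(data, pos=0):
    """Parse one element starting at pos and return (node, next_pos)."""
    if data[pos:pos + 1] == b"(":
        pos += 1
        items = []
        while data[pos:pos + 1] != b")":
            if data[pos:pos + 1] == b" ":
                pos += 1  # skip the separator between elements
                continue
            item, pos = parse(data, pos)
            items.append(item)
        return items, pos + 1  # step over the closing parenthesis
    # Atom: read the length prefix, then take exactly that many bytes.
    space = data.index(b" ", pos)
    length = int(data[pos:space])
    start = space + 1
    return data[start:start + length], start + length


# A nested structure round-trips through one flat byte string.
record = [b"dir", [b"name", b"a b"]]
encoded = unparse(record)
decoded, _ = parse(encoded)
```

Because atoms carry their length, the value may contain spaces or parentheses
without any escaping, which is what makes the flat byte string safe to store as
a key/value record.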

> Rather, each "record" is a key and a bunch of bytes.
> More specifically, the "bunch of bytes" is a skel.
>
> An alternative implementation to replace DB could simple provide an
> alternate way of storing data of identical complexity: Just keep
> track of keys and skels.

Yup. But if we're targeting a SQL database, the skel approach loses all of
SQL's query capabilities...
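
To make that concrete, here is a minimal sketch using Python's sqlite3 module
as a stand-in for "a SQL database". The table names, the column names rec_key
and skel, and the helper functions are all assumptions for illustration. Each
table holds only a key and an opaque blob, so SQL can look a record up by key
but has no view of what is inside the skel.

```python
import sqlite3

# Hypothetical table names, used only for this illustration.
TABLES = ("table_a", "table_b")


def open_store(path=":memory:"):
    """Create one two-column (key, blob) table per logical table."""
    conn = sqlite3.connect(path)
    for table in TABLES:
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(rec_key TEXT PRIMARY KEY, skel BLOB NOT NULL)"
        )
    return conn


def put(conn, table, key, skel):
    """Store the skel bytes under key, replacing any existing record."""
    if table not in TABLES:
        raise ValueError(f"unknown table: {table}")
    conn.execute(
        f"INSERT OR REPLACE INTO {table} (rec_key, skel) VALUES (?, ?)",
        (key, skel),
    )


def get(conn, table, key):
    """Return the skel bytes for key, or None if there is no such record."""
    if table not in TABLES:
        raise ValueError(f"unknown table: {table}")
    row = conn.execute(
        f"SELECT skel FROM {table} WHERE rec_key = ?", (key,)
    ).fetchone()
    return None if row is None else row[0]


conn = open_store()
put(conn, "table_a", "k1", b"(3 dir)")
# SQL can filter on rec_key, but the contents of skel are opaque to it.
value = get(conn, "table_a", "k1")
```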

> Slightly more complex of course is the fact that there are five tables.
> But this is pretty simple as well.

Yah.

> So, ignoring transactions for a moment, I could implement a replacement
> for DB as follows:
>
> create five directories, one for each of the tables
>
> store every record as a file in the proper directory.
> the filename is the key, and the contents of the file are
> the skel
>
> everywhere I see a DB call I would replace it with a simple
> file IO call
>
> Like I said, I'm ignoring transactions for the moment, to see if I
> understand the data model. So now I'll ask: Am I missing something
> here?

Not at all. For example, we could use the apr_dbm functions in APR-UTIL
which would give us access to a number of other DBMs, such as gdbm.
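
The directory-per-table layout described in the quoted text could be sketched
roughly as follows, in Python. The class name FileStore, the "k-" filename
prefix, and the table names in the usage lines are assumptions made for this
illustration. Transactions are ignored, as in the quoted proposal.

```python
import os
import tempfile
from urllib.parse import quote


class FileStore:
    """One directory per table, one file per record (no transactions)."""

    def __init__(self, root, tables):
        self.root = root
        for table in tables:
            os.makedirs(os.path.join(root, table), exist_ok=True)

    def _path(self, table, key):
        # Percent-encode the key so "/" cannot escape the table directory.
        # The "k-" prefix keeps keys such as "" or ".." from becoming
        # special filenames.
        return os.path.join(self.root, table, "k-" + quote(key, safe=""))

    def put(self, table, key, skel):
        # The file contents are the skel bytes, unchanged.
        with open(self._path(table, key), "wb") as f:
            f.write(skel)

    def get(self, table, key):
        with open(self._path(table, key), "rb") as f:
            return f.read()


# Usage with hypothetical table names and a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    store = FileStore(root, ["table_a", "table_b"])
    store.put("table_a", "a/b", b"(3 dir)")
    fetched = store.get("table_a", "a/b")
```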

> If the above is true, then the same thing can be done for a SQL
> backend:
>
> create five SQL tables
>
> each table will have two columns:
>
> a key
> a blob
>
> And this would work. However, it would not allow us to take
> advantage of the querying capabilities of a SQL db. We would be
> using a SQL db in exactly the same fashion as we use Berkeley db.

Right. The queries are the coolest thing... "what files were changed by
<this> author?"

> In fact, we could get pluggable DB-replacements by creating a very
> simple abstraction API. Something like the following would be
> approximately sufficient (hand-wave, hand-wave):
> create/delete a db
> begin/abort/commit txn
> put(table, key, skel)
> skel = get(table, key)

Anything skel-based isn't going to have the right kind of semantics. We need
to map things to specific columns, and the skels lose that semantic
information.
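
For reference, the hand-waved API in the quoted text might look something like
the Python sketch below. The names SkelStore and MemoryStore are hypothetical,
and the in-memory backend only buffers writes until commit. Note that the
interface passes skels around as opaque bytes, which is exactly why a backend
behind it cannot see individual fields.

```python
from abc import ABC, abstractmethod


class SkelStore(ABC):
    """Hypothetical pluggable-backend interface: tables of key -> skel."""

    @abstractmethod
    def begin(self): ...

    @abstractmethod
    def commit(self): ...

    @abstractmethod
    def abort(self): ...

    @abstractmethod
    def put(self, table, key, skel): ...

    @abstractmethod
    def get(self, table, key): ...


class MemoryStore(SkelStore):
    """Toy backend: dictionaries, with writes buffered until commit."""

    def __init__(self, tables):
        self._data = {table: {} for table in tables}
        self._pending = None  # None means no transaction is open

    def begin(self):
        self._pending = {}

    def commit(self):
        # Apply every buffered write, then close the transaction.
        for (table, key), skel in self._pending.items():
            self._data[table][key] = skel
        self._pending = None

    def abort(self):
        # Drop the buffered writes without applying them.
        self._pending = None

    def put(self, table, key, skel):
        if self._pending is None:
            raise RuntimeError("put() called outside a transaction")
        self._pending[(table, key)] = skel

    def get(self, table, key):
        # A transaction sees its own uncommitted writes first.
        if self._pending is not None and (table, key) in self._pending:
            return self._pending[(table, key)]
        return self._data[table][key]
```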

> Obviously, another way of doing this would be to replace all of the
> implementation in libsvn_fs. This might allow us to make better use
> of the storage facilities of the underlying DB. Instead of storing
> pairs of keys and skels, each key would correspond to a collection
> of columns. The individual atoms in the skels would be placed in
> individual columns. This would allow us to query against those
> columns, but that's about the only difference. ( I admit that it's
> a big difference. )

Skels aren't the right way to do this... instead, we would have db-specific
mappings from internal structures to the appropriate mechanism for the
database at hand.

Some databases may choose to marshal the structures as skels, but anything
with facilities richer than a plain key/value will (most likely) map the
structures into input/output binding buffers for the database.

Note that our memory consumption will be *much* better if we can avoid skels
and map directly to/from database binding buffers.
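
As a rough illustration of mapping a structure straight to columns, here is a
Python sketch with sqlite3 standing in for the SQL backend. The Change
structure, the changes table, and its columns are invented for this example
and are not Subversion's schema. The fields go to the database as bound
parameters with no skel in between, and the "what files were changed by this
author?" query becomes a plain SELECT.

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass
class Change:
    """Hypothetical internal structure: one changed path in one revision."""
    revision: int
    author: str
    path: str


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE changes ("
        "revision INTEGER NOT NULL, author TEXT NOT NULL, path TEXT NOT NULL)"
    )
    return conn


def record(conn, change):
    # Each field is bound to its own column; nothing is packed into a blob.
    conn.execute(
        "INSERT INTO changes (revision, author, path) VALUES (?, ?, ?)",
        astuple(change),
    )


def paths_changed_by(conn, author):
    """Return the distinct paths changed by author, sorted by path."""
    rows = conn.execute(
        "SELECT DISTINCT path FROM changes WHERE author = ? ORDER BY path",
        (author,),
    )
    return [path for (path,) in rows]


conn = open_db()
record(conn, Change(1, "alice", "trunk/README"))
record(conn, Change(2, "bob", "trunk/Makefile"))
record(conn, Change(3, "alice", "trunk/src/main.c"))
alice_paths = paths_changed_by(conn, "alice")
```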

> It looks to me like some implementations of libsvn_fs will want
> to use skels. DB is obviously an example. I suspect that an
> implementation on plain text files would be another example. Why
> reinvent this particular wheel?

Yes, we could keep the skels around as a "db utility" for use by various FS
backends. No problem there.

> However, a SQL-based store is an example where it would *probably*
> make more sense to design a mapping which does not use skels.

A lot more than probably. It *definitely* makes more sense to avoid skels.

>...
> Am I capturing the issues which prompted you to make the remark
> about having skels be DB-specific?

Yup :-) They just don't belong in other DBs.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Received on Sat Oct 21 14:36:36 2006

In reply to: Eric W. Sink: "Are skels a db-specific thing?"
