Re: FSFS: Plan of attack

From: Glenn A. Thompson <gthompson_at_cdr.net>
Date: 2004-04-13 19:31:13 CEST

[...]

>I often had trouble telling when it was talking about the API-level
>vtable or about the FSP-level vtable within the baseline FSAP, and there
>were a lot of personal comments which tended to draw my attention
>astray.
>
I can see both of these points. I'll take another stab at clarifying
some sections. I wasn't happy with my discussion of the construction
and initialization of the three primary objects. It's crucial to
understanding the flexibility I'm trying to provide.

Just to clarify a few things:
1. File System Providers (FSPs) should always be concrete. An FSP
writer may want a vtable for certain virtual methods, but they should
attempt to push abstractions into a File System Abstract Provider
(FSAP). Any virtual methods that FSPs introduce should be resolved
during initialization.

2. FSPs can be hybrid, multi-level implementations. An FSP writer may
choose to override methods at the "Big Three" API level on down. I'm
trying to encourage overriding of methods at an appropriate level within
the FS layer. I'd like to get rid of the layer (1), (2), (3) mindset
and instead think of it like so: The "XYZ" FSP is a concrete
implementation of the "DEF" FSAP which extends the "ABC" FSAP. The XYZ
FSP may implement and/or override methods of all FSAPs in its
ancestry. Certain methods could be made "final" by convention.

3. FSPs do not have to derive from an FSAP. They can be written to the
FS API level. At a minimum they have to use the Subversion name
protection scheme and implement static vtable initializers and a few
simple constructors. They can be as flat as they wish.

4. The approach I'm proposing is a *little* like the IOC containers
that are gaining popularity in the Java community.
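
Points 1 through 3 can be sketched in C. This is only an illustration
of the idea, not code from the proposal: the fs_vtable_t type, its
three slots, and every abc_/def_/xyz_ name are made up for the example.
It assumes a single method table that each ancestor fills in turn, so
that the concrete FSP leaves no slot unresolved after initialization.

```c
#include <stddef.h>

/* Hypothetical method table. The slot names are illustrative only and
   are not taken from Subversion or from the proposal document. */
typedef struct fs_vtable_t {
  const char *(*open_fs)(void);   /* an API-level method */
  const char *(*get_node)(void);  /* a method introduced by an FSAP */
  const char *(*read_rep)(void);  /* a method left abstract by the FSAPs */
} fs_vtable_t;

/* Each function returns the name of the layer that implements it. */
static const char *abc_open_fs(void)  { return "abc"; }
static const char *abc_get_node(void) { return "abc"; }
static const char *def_get_node(void) { return "def"; }
static const char *xyz_read_rep(void) { return "xyz"; }

/* "ABC" FSAP: supplies defaults and leaves read_rep abstract (NULL). */
void abc_fsap_init(fs_vtable_t *vt)
{
  vt->open_fs = abc_open_fs;
  vt->get_node = abc_get_node;
  vt->read_rep = NULL;
}

/* "DEF" FSAP extends "ABC": inherits open_fs, overrides get_node. */
void def_fsap_init(fs_vtable_t *vt)
{
  abc_fsap_init(vt);
  vt->get_node = def_get_node;
}

/* "XYZ" FSP: concrete implementation of "DEF". It fills the remaining
   slot and reports whether every method is resolved (1) or not (0). */
int xyz_fsp_init(fs_vtable_t *vt)
{
  def_fsap_init(vt);
  vt->read_rep = xyz_read_rep;
  return vt->open_fs != NULL && vt->get_node != NULL
         && vt->read_rep != NULL;
}
```

A flat FSP (point 3) would skip the chained calls and assign all three
slots itself in its own initializer.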

>
>At any rate, I feel I've absorbed it now. My comments:
>
> * Assuming we have an API-level vtable, the general method presented
>in the document (with separate vtables for FS objects, transaction
>objects, and root objects) seems fine.
>
Okay.

>
> * I'm having trouble fitting the FSP abstraction together with the
>"Don't fall in love with a physical schema" sentiment expressed in
><http://www.contactor.se/~dast/svn/archive-2004-03/1664.shtml>. The
>FSP-level abstraction allows some flexibility in the physical
>representation, but does assume a particular set of tables. (This is
>not an area I'm heavily invested in, since I'm not interested in working
>on alternate DB implementations, but I'm still curious.)
>
Yes. I can see where I left this too lean. The plan was to use the
examples (which I never put to words) to make this more clear. I have
always felt there would be more than one SQL FS implementation. They
will most likely have a fair amount of overlap. This overlap will not
necessarily be along nice clean "level 1,2,3" boundaries.

I was trying to avoid any SQL bias in my document, but since you opened
the door: IMO, collections (directories and properties) are the biggest
area of concern, and provide the most potential for divergence, in a SQL
implementation. In fact, I've already discussed changes to the "inner"
baseline FSAP vtable which should help in this area.

Putting aside how the tables would be keyed, a typical implementation
would have one member per row in some table. Another solution would be
to plunk a skel equivalent into a blob field. This would be badness in
SQL land as you lose selectivity (syntactically speaking). So using the
first method, how do we handle directory revisions? I believe Sander
was planning on replicating rows for every new revision.
Programmatically, this is certainly the most straightforward solution.
But consider the situation where at rev 10 we have a directory with 1000
entries. For rev 11 we add 1 new entry. Now we have 2001 rows (1000 +
1001) for just two revisions. This impacts performance in more than one
way. Besides the obvious rapid increase in table rows, it can also
reduce the selectivity (key uniqueness) of a table. Plus it generates a
considerable number of inserts. While most modern DBs can handle this
reasonably well, it's worth exploring alternatives.
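
The row growth is easy to check with a few lines of C. This is just
arithmetic, assuming one row per directory entry per revision; the
function name and parameters are invented for the illustration.

```c
/* Total rows stored when every revision replicates the whole directory.
   first_entries: entries in the first stored revision
   adds_per_rev:  entries added by each later revision
   revisions:     number of stored revisions */
unsigned long rows_full_copy(unsigned long first_entries,
                             unsigned long adds_per_rev,
                             unsigned long revisions)
{
  unsigned long total = 0;
  unsigned long size = first_entries;
  unsigned long i;

  for (i = 0; i < revisions; i++) {
    total += size;         /* one row per entry in this revision */
    size += adds_per_rev;  /* the next revision is this much larger */
  }
  return total;
}
```

For the example above, rows_full_copy(1000, 1, 2) gives 2001. Ten such
revisions would already cost 10045 rows for a directory that only ever
grew by nine entries.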

An approach I want to explore involves storing only the changes between
revs with complete representations sprinkled throughout revision
history. So rev 11 mentioned above would have a single "add x" row.
Seem familiar? This can be viewed as a form of "in DB" deltification.
It creates *ALL SORTS* of query challenges. I offer this as an extreme
example of "physical schema" variations. I'd rather not debate this at
this time. It's both risky and challenging. Frankly, I don't know if I
can make it work until I can make it work. Implementations like this
will make good use of the hierarchy mentioned in item 2 above.
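
To make the idea concrete, here is a small in-memory sketch of
rebuilding a directory from a full representation plus change rows. It
is not a schema proposal and says nothing about the query side: the
types, the fixed MAX_ENTRIES limit, and the function names are all
assumptions made for the example, with plain C arrays standing in for
table rows.

```c
#include <string.h>

#define MAX_ENTRIES 16  /* arbitrary limit to keep the sketch small */

typedef enum { OP_ADD, OP_DELETE } change_op_t;

/* One stored change row, e.g. "rev 11: add x". */
typedef struct {
  long rev;
  change_op_t op;
  const char *name;
} change_row_t;

/* A complete directory representation. */
typedef struct {
  const char *names[MAX_ENTRIES];
  int count;
} dir_t;

/* Return the index of name in d, or -1 if it is not present. */
int dir_find(const dir_t *d, const char *name)
{
  int i;
  for (i = 0; i < d->count; i++)
    if (strcmp(d->names[i], name) == 0)
      return i;
  return -1;
}

/* Rebuild the directory as of target_rev: copy the full representation
   stored at snapshot_rev, then replay the change rows whose revision is
   in (snapshot_rev, target_rev]. Rows are assumed to be ordered by rev.
   Returns the entry count, or -1 if MAX_ENTRIES would be exceeded. */
int dir_at_rev(const dir_t *snapshot, long snapshot_rev,
               const change_row_t *rows, int nrows,
               long target_rev, dir_t *out)
{
  int i;

  *out = *snapshot;
  for (i = 0; i < nrows; i++) {
    int pos;

    if (rows[i].rev <= snapshot_rev || rows[i].rev > target_rev)
      continue;
    pos = dir_find(out, rows[i].name);
    if (rows[i].op == OP_ADD && pos < 0) {
      if (out->count == MAX_ENTRIES)
        return -1;
      out->names[out->count++] = rows[i].name;
    } else if (rows[i].op == OP_DELETE && pos >= 0) {
      /* Fill the hole with the last entry; order is not preserved. */
      out->names[pos] = out->names[--out->count];
    }
  }
  return out->count;
}
```

With a full representation at rev 10 and a single "add x" row for rev
11, reading rev 11 costs one copy plus one replayed row, and storage is
1001 rows rather than 2001. The price is that a read is no longer a
simple select, which is where the query challenges come from.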

>Also, the FSP
>abstraction does not appear to have pools in its vtable calls. That's
>consistent with the current FS code, but seems like a good thing to fix.
>
I'll take a look at this.

>
> * I'm a little concerned about the long-term implications of this
>paragraph:
>
> Roundtrips kill performance. [...] In a SQL DBMS every call to
> the DB creates considerable overhead. I'm not picking on those
> methods. I'm just pointing out that many things being done
> procedurally in the current FS will be faster if they can be
> done in a stored procedure or in "mega" queries. Also, if a lot
> of interim data is being processed, even temp tables solutions
> beat application based procedural solutions in many cases.
>
>The implication here is that the ideal point of divergence for an SQL
>implementation might be at a *higher* level than the ideal point of
>divergence for libsvn_fs_fs; for while fs_fs wants to reuse all of the
>DAG logic in tree.c, an SQL implementation might want to use stored
>procedures or caching tables to minimize round trips.
>
>
Yes, I believe this is exactly gstein's concern. But again, I view this
more on a "per method" or "per FSAP" basis, not "per FSP". This is why
I talk about reviewing the FS API. The idea is to nail down very
specific contracts for each and every method in the FS API. My "Nits"
section was an attempt to get that process started.

I hope this helps.

Thanks,
gat

Received on Tue Apr 13 19:36:11 2004

In reply to: Greg Hudson: "Re: FSFS: Plan of attack"