fs dump/restore proposal

From: Ben Collins-Sussman <sussman_at_collab.net>
Date: 2002-04-23 23:43:08 CEST

A proposal for an svn filesystem dump/restore format. Questions are
at the bottom.

Two problems we want to solve
=============================

 1. When we change our node-id schema, we need to migrate all of our
     data (by dumping and restoring).

 2. We need a backup format, ideally one that other software tools
     could read someday.

Design Goals
============

 A. Dump and restore are written as two new public functions in
     svn_fs.h, to be invoked by new 'svnadmin' subcommands.

 B. Format uses only timeless fs concepts.

     The dump format needs to reference concepts that we *know* are
     general enough to never change. These concepts must exist
     independently of any internal node-id schema, or any DB storage
     backend. In other words, we're talking about the basic ideas in
     our original "design spec" from May 2000.

Format Semantics
================

Here are the timeless semantics of our fs design -- the things that
would be stored in our dump format.

  - A filesystem is an array of trees.
    Each tree is called a "revision" and has unversioned properties attached.

  - A revision has a tree of "nodes" hanging off of it.
 
  - The majority of a tree's nodes are hard-links (references) to
    nodes that were created in earlier trees.

  - A node contains

        - versioned text
        - versioned properties
        - predecessor history: "which node am I a variant of?"
        - copy history: "which node am I a copy of?"

    The history values can be non-existent (meaning the node is
    completely new), or can have a value of {revision, path}.
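
    As a rough illustration of the semantics above, here is a minimal
    Python sketch. All names in it (Node, Revision, Filesystem,
    HistoryRef, commit) are hypothetical and do not correspond to
    anything in libsvn_fs, and the flat path-to-node map is a
    simplification of a real directory tree. The point is only that
    history is expressed as {revision, path}, never as a node-id, and
    that unchanged nodes are shared between trees.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A history reference is {revision, path}; None means "completely new".
HistoryRef = Optional[Tuple[int, str]]


@dataclass
class Node:
    text: bytes = b""                                      # versioned text
    props: Dict[str, str] = field(default_factory=dict)    # versioned properties
    predecessor: HistoryRef = None   # "which node am I a variant of?"
    copied_from: HistoryRef = None   # "which node am I a copy of?"


@dataclass
class Revision:
    revprops: Dict[str, str] = field(default_factory=dict)  # unversioned properties
    nodes: Dict[str, Node] = field(default_factory=dict)    # path -> node (simplified tree)


@dataclass
class Filesystem:
    revisions: List[Revision] = field(default_factory=list)  # the array of trees

    def commit(self, changes, revprops):
        """Append a new tree; paths not in `changes` reuse the older Node objects."""
        prev = self.revisions[-1].nodes if self.revisions else {}
        nodes = dict(prev)      # shallow copy: entries still reference the old nodes
        nodes.update(changes)   # only changed paths get new nodes
        self.revisions.append(Revision(revprops=dict(revprops), nodes=nodes))
        return len(self.revisions) - 1


fs = Filesystem()
r0 = fs.commit({"/a.c": Node(text=b"one"), "/b.c": Node(text=b"two")},
               {"svn:log": "initial import"})
r1 = fs.commit({"/a.c": Node(text=b"one!", predecessor=(r0, "/a.c"))},
               {"svn:log": "edit a.c"})
```

    In this sketch "/b.c" in the second tree is the very same object as
    in the first tree, which mirrors the hard-link idea, while "/a.c"
    is a new node whose predecessor points back at {0, "/a.c"}.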

Implementation (Questions)
==========================

  * file format
   
    Although it's tempting to use XML (easy to output, easy to write a
    parser), gstein pointed out that it may create more problems in
    the long run. Storing binary data (and escaping) in XML can be
    painful; scanning for the escape characters can really slow down
    an import; just imagine trying to store an XML file! Even though
    XML may be more convenient at the outset, we'll probably end up
    burning lots of time trying to work around these other issues
    later on.

    For this reason, we're thinking of some kind of simple binary format.
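
    To make the escaping argument concrete, here is a minimal sketch
    of one possible "simple binary format": length-prefixed records.
    This is an assumption made for illustration only, not a format
    anyone has agreed on; the record layout and the names
    write_record and read_record are hypothetical. Because the reader
    is told how many bytes to consume, the payload is never scanned or
    escaped, so an XML file or arbitrary binary data is stored as-is.

```python
import io
import struct


def write_record(out, key: bytes, value: bytes) -> None:
    """Write two 4-byte big-endian lengths, then the raw key and value bytes."""
    out.write(struct.pack(">II", len(key), len(value)))
    out.write(key)
    out.write(value)


def read_record(inp):
    """Return (key, value), or None at a clean end of stream."""
    header = inp.read(8)
    if not header:
        return None
    if len(header) != 8:
        raise ValueError("truncated record header")
    key_len, value_len = struct.unpack(">II", header)
    key = inp.read(key_len)
    value = inp.read(value_len)
    if len(key) != key_len or len(value) != value_len:
        raise ValueError("truncated record body")
    return key, value


# Payload that would need escaping inside XML: markup, "]]>", NUL, high bytes.
xml_text = b'<?xml version="1.0"?><a>&amp; ]]> \x00\xff</a>'

buf = io.BytesIO()
write_record(buf, b"text", xml_text)
buf.seek(0)
key, value = read_record(buf)
end = read_record(buf)
```

    The value read back is byte-for-byte identical to what was written,
    and the reader never inspects the payload bytes themselves.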

  * Should we bother to implement 'diffy' storage of texts in our
    format? My instinct is "no". Dumping and restoring a filesystem
    is a rare operation, so we don't need to be so paranoid about disk
    space usage. It would be extra work to implement diffy storage,
    and imports would probably be safer (and faster) if we had nothing
    but fulltexts in our dump.

  * Reading through our 'libsvn_fs/structure' document, it seems that
    the only data we're not saving in the dump is a node's "Created
    Revision" (CR). It's not clear to me that this is a timeless
    concept. It certainly has no relevance in the new, impending fs
    schema. It seems more like an optimization for our current
    node-id schema. Do others agree?

    Let me be clear here: the CR is still a useful concept. For
    example, I like very much that this value is cached in my working
    copy and shows up in 'svn status -v'. I will always want to know
    "in what revision did foo.c last change?" When we switch to the
    new schema, we'll still want to keep this concept around -- it
    will simply have a different implementation under the hood.

    But still, I don't think it needs to be saved in a dump format.
    When we re-import a filesystem, the information can be *derived*
    by whatever schema exists under the hood -- possibly on-the-fly,
    as we're importing. Am I making sense?
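
    Here is a rough sketch of what "derived on-the-fly" could look
    like. It assumes, purely for illustration, that the dump tells the
    importer which paths were changed or deleted in each revision; the
    input layout and the name derive_created_revs are hypothetical.
    The CR of a path is then just the number of the most recent
    revision that touched it, and untouched paths carry their CR
    forward.

```python
def derive_created_revs(dumped_revisions):
    """Return one {path: created_revision} map per imported revision.

    dumped_revisions is a list whose entry N is a dict with optional keys
    'changed' (paths added or modified in revision N) and
    'deleted' (paths removed in revision N).
    """
    result = []
    current = {}
    for revnum, rev in enumerate(dumped_revisions):
        current = dict(current)              # untouched paths keep their old CR
        for path in rev.get("deleted", ()):
            current.pop(path, None)          # deleted paths vanish from this tree
        for path in rev.get("changed", ()):
            current[path] = revnum           # changed paths were "created" here
        result.append(current)
    return result


dump = [
    {"changed": ["/foo.c", "/bar.c"]},
    {"changed": ["/foo.c"]},
    {"changed": ["/baz.c"], "deleted": ["/bar.c"]},
]
crs = derive_created_revs(dump)
```

    After the import, crs[2]["/foo.c"] is 1, which answers "in what
    revision did foo.c last change?" without the dump ever having
    stored a CR.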

Received on Tue Apr 23 23:46:19 2002

This is an archived mail posted to the Subversion Dev mailing list.
