RFC: Revision indexes for 1.1

From: Branko Čibej <brane_at_xbc.nu>
Date: 2004-04-19 08:39:18 CEST

The idea of labels has been floating around the SVN lists for some time.
In this post I propose a generic mechanism that could be used to
implement one of the label flavours, but would also be useful for other
things.

1. The label proposals

There were several different proposals for label semantics, but they all
boil down to two types:

    * A "label" is a symbolic name for (a set of) revision(s)
    * A "label" is a symbolic collection of specific versions of
      specific files

The mechanism I propose here can be used to implement the first type of
label. The reason for this is simple: we already have a way to uniquely
identify a set of specific versions of specific files; it's called a
branch or a tag. We can even create mixed-revision tags with a WC-to-WC
or WC-to-URL copy. Ease-of-use and other UI considerations aren't
central to this proposal, so I'll just note that all the mechanics
necessary for implementing label manipulations of the second kind in SVN
clients are already available in our libraries.

2. The svn_repos_dated_revision problem

Another issue that needs to be solved is the requirement that the values
of svn:date revprops rise with increasing revision numbers. Apart from the
fact that we don't have any checks in the code that prevent changing the
commit date, this also limits the ways in which "svnadmin load" can be
used for combining repositories (e.g., for replication) or repository
conversion (e.g., cvs2svn).

3. Revision indexes

Last but not least, searching for particular values of revprops is very
slow. All of these problems can be solved by introducing a revision
property index. This index would map a (propname, value) pair to the
list of revision numbers where this key appears. For example, to find
all the commits by one kfogel, you could do a single search for
(svn:author, kfogel) instead of having to traverse all revisions and
read their revprops.
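
To make the mapping concrete, here is a minimal in-memory sketch in
Python. It is only an illustration of the (propname, value) -> revnums
idea; the class name RevPropIndex and its methods are hypothetical and
are not part of any Subversion API.

```python
from collections import defaultdict


class RevPropIndex:
    """Hypothetical in-memory stand-in for the proposed index."""

    def __init__(self):
        # (propname, value) -> set of revision numbers
        self._index = defaultdict(set)

    def add(self, propname, value, revnum):
        self._index[(propname, value)].add(revnum)

    def search(self, propname, value):
        # .get() avoids creating empty entries for unknown keys.
        return sorted(self._index.get((propname, value), ()))


index = RevPropIndex()
index.add("svn:author", "kfogel", 3)
index.add("svn:author", "brane", 4)
index.add("svn:author", "kfogel", 7)

# One lookup instead of a walk over every revision's revprops.
kfogel_revs = index.search("svn:author", "kfogel")
```

A search for a pair that was never indexed simply yields an empty list.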

This indexing would happen automatically (probably implemented at the
repos layer, rather than the filesystem layer), and which props are
index sources would be controlled by a repository-specific configuration
file. Two parameters control the indexing behaviour for each
index-enabled property type:

    * Uniqueness: Controls whether a single (propname, value) pair can
      map to more than one revision number
    * Multiplicity: Controls whether, for the same propname, more than
      one (propname, value) pair can map to a single revision.
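
As a rough sketch of how these two parameters might be checked before a
revprop change is accepted (Python, illustration only): the function
name, the exception type, the "bug:id" property and the decision to
reject violations with an error are all my assumptions; the proposal
does not yet say what happens on a violation.

```python
class IndexConstraintError(Exception):
    """Raised when a revprop value would violate its index settings."""


def check_constraints(index, propname, values, revnum, unique, multiple):
    """index maps (propname, value) -> set of revnums (assumed layout)."""
    # Multiplicity: may one revision carry several values for propname?
    if not multiple and len(values) > 1:
        raise IndexConstraintError(
            "%s allows only one value per revision" % propname)
    # Uniqueness: may one (propname, value) pair map to several revisions?
    if unique:
        for value in values:
            other_revs = index.get((propname, value), set()) - {revnum}
            if other_revs:
                raise IndexConstraintError(
                    "(%s, %s) is already used by r%d"
                    % (propname, value, min(other_revs)))


index = {("bug:id", "1234"): {10}}
# Re-setting the same value on the same revision is not a conflict.
check_constraints(index, "bug:id", ["1234"], 10, unique=True, multiple=False)
```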

3.1 DB schema changes

The filesystem grows a new table, "revpropindex", with the following schema:

    (PROP-NAME PROP-VALUE) -> REVNUM

Non-unique indexes are allowed.

No other changes are necessary. For forward compatibility, servers that
do not implement revision indexes will ignore this table; for backward
compatibility, if the table does not exist in the repository, revision
indexing and search operations are disabled. The dumpfile format does
not change, as the contents of the revision index can be reconstructed
from revprop data.

3.2 Multiple indexes per revision

The values of properties that allow multiple keys per single revision
are represented in a newline-terminated list, one value per line (like
the svn:ignore property on directories). Each value is added as a
separate key to the index.
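
A small Python sketch of that splitting step, for illustration. The
helper names are hypothetical, and skipping empty lines is my
assumption rather than something specified above.

```python
def split_multi_value(raw):
    """Split a newline-terminated property value into its entries."""
    # Assumption: blank lines carry no value and are skipped.
    return [line for line in raw.splitlines() if line]


def index_keys(propname, raw, multiple):
    """Return the (propname, value) keys one revprop contributes."""
    values = split_multi_value(raw) if multiple else [raw]
    return [(propname, value) for value in values]


keys = index_keys("svn:name", "rc1\nrelease-1.0\n", multiple=True)
```

For a single-valued property the whole value becomes one key.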

3.3 Indexing configuration

The repository grows a new configuration file, conf/revpropindex. The
format of the file is as follows:

    [propname]
    unique = [TRUE/false]
    multiple = [FALSE/true]

Revprops that do not appear in the config file are not indexed. The
default contents of this file are:

    [svn:date]
    unique = false
    multiple = false
    [svn:name]
    unique = false
    multiple = true

Indexing of svn:date and svn:name cannot be turned off, nor can their
indexing parameters be changed (in effect, we may as well not actually
enable these in the revpropindex config file).
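
For illustration, the file format above happens to be readable with
Python's standard configparser module. The sketch below assumes that the
capitalised alternative in "[TRUE/false]" is the default, and it forces
the svn:date and svn:name settings regardless of what the file says.
The function name and the "bug:id" property are hypothetical.

```python
import configparser

# Settings that cannot be changed by the config file.
BUILTIN = {
    "svn:date": {"unique": False, "multiple": False},
    "svn:name": {"unique": False, "multiple": True},
}


def load_index_config(text):
    """Parse conf/revpropindex contents into {propname: settings}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    config = {}
    for propname in parser.sections():
        section = parser[propname]
        config[propname] = {
            # Assumed defaults: unique = true, multiple = false.
            "unique": section.getboolean("unique", fallback=True),
            "multiple": section.getboolean("multiple", fallback=False),
        }
    # The built-in indexes always win over the file contents.
    config.update(BUILTIN)
    return config


config = load_index_config("[bug:id]\nmultiple = true\n[svn:name]\nunique = true\n")
```

Properties that have no section never show up in the result, which
matches "not indexed".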

3.4 FS/Repos API changes

When opening an existing repository, the FS layer must not error out if
the revpropindex table does not exist.

The repos layer grows a new function,

    svn_repos_revision_search(propname, propvalue)

which returns a list of revision numbers. The list can be empty. No
error is returned if a property is not indexed or revision indexing is
not enabled in the repository (i.e., if the repository schema version is
older than the server version).

The propset, propchange and propdel repos-level wrappers must maintain
the revpropindex table (optimization hint: when changing multi-value
properties, only values deleted from or added to the list need to be
processed).
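
The optimization hint amounts to a set difference. A minimal Python
sketch, assuming the index is a plain dict from (propname, value) to a
set of revision numbers; the function name is hypothetical.

```python
def update_index(index, propname, revnum, old_values, new_values):
    """Apply a revprop change, touching only the values that differ."""
    old, new = set(old_values), set(new_values)
    removed = old - new
    added = new - old
    for value in removed:
        revs = index.get((propname, value))
        if revs is not None:
            revs.discard(revnum)
            if not revs:
                # Drop keys that no longer point at any revision.
                del index[(propname, value)]
    for value in added:
        index.setdefault((propname, value), set()).add(revnum)
    return removed, added


index = {("svn:name", "rc1"): {5}, ("svn:name", "beta"): {5}}
removed, added = update_index(
    index, "svn:name", 5, ["rc1", "beta"], ["rc1", "release-1.0"])
```

The unchanged value "rc1" is never touched; propset and propdel are the
special cases where the old or the new list is empty.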

The function svn_repos_dated_revision changes: first, it calls
svn_repos_revision_search("svn:date", timestamp). If this returns a
non-empty list, it returns the oldest revision from this list. Otherwise
it performs the current binary search. (The binary-search implementation
must stay for backward compatibility. It can be removed in 2.0.)
svn_repos_committed_info and svn_repos_history get similar changes.
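
A sketch of the proposed lookup order in Python, not the real C
implementation. Assumptions: timestamps are strings in one fixed format
that sorts chronologically, "oldest" means the lowest revision number,
rev_dates[r] holds the date of revision r, and the fallback returns the
youngest revision dated at or before the timestamp, clamped to r0.

```python
import bisect


def dated_revision(index, rev_dates, timestamp):
    """Index lookup first, binary search as the fallback."""
    hits = index.get(("svn:date", timestamp))
    if hits:
        # Exact match in the index: take the lowest revision number.
        return min(hits)
    # Fallback: only valid while rev_dates is sorted, which is exactly
    # the requirement the index is meant to relax.
    pos = bisect.bisect_right(rev_dates, timestamp)
    return max(pos - 1, 0)


rev_dates = [
    "2004-04-01T00:00:00Z",  # r0
    "2004-04-10T00:00:00Z",  # r1
    "2004-04-19T00:00:00Z",  # r2
]
index = {("svn:date", "2004-04-10T00:00:00Z"): {1, 5}}

exact = dated_revision(index, rev_dates, "2004-04-10T00:00:00Z")
between = dated_revision(index, rev_dates, "2004-04-15T00:00:00Z")
```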

4. Implementing revision names

Using the mechanism described above, we can add symbolic names to a
revision or a set of revisions. To do this we introduce a new revision
property, "svn:name", that contains a newline-separated list of symbolic
names assigned to a revision. The values are non-unique: that is, a
single symbolic name can group several distinct revisions.

While the existing "prop(get|set|edit) --revprop" functionality is
sufficient for setting and maintaining revision names, it is not really
useful. I propose the following changes to the UI:

4.1 Extend the format of the "-r" command-line option

Currently the -r command-line option accepts a revision number or a date
(range):

    -r revnum|{date}[:revnum|{date}]

The {date} specifier is internally converted to a revision number. We
add another specifier, [labelname], that is also converted to a revision
number.

Note: Since label values are non-unique, a [label] specifier can refer
to a list of revision numbers. Such lists are useless for "svn update" or
"svn export"; however, "svn merge" could be extended to handle
multi-revision merges (cherry-picking, right?). We should support an
analogous format, "-r revnum,revnum,..." for specifying an explicit list
of revision numbers; this is also needed for defining multi-revision labels.
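
To show that the extended syntax can be parsed unambiguously, here is a
rough Python sketch of a parser for the forms discussed in this section:
revnum, {date}, [label], an optional ":" range, and a comma-separated
list of revision numbers. The function names and the returned tuples are
hypothetical, and other revision keywords are left out.

```python
import re

# One specifier: a number, a {date} or a [label].
TOKEN = r"\d+|\{[^{}]+\}|\[[^\[\]]+\]"
RANGE_RE = re.compile("(" + TOKEN + ")(?::(" + TOKEN + "))?")
LIST_RE = re.compile(r"\d+(?:,\d+)+")


def classify(token):
    """Turn one specifier into a (kind, value) pair."""
    if token.startswith("{"):
        return ("date", token[1:-1])
    if token.startswith("["):
        return ("label", token[1:-1])
    return ("number", int(token))


def parse_revision_option(arg):
    """Parse the argument of -r into a tagged tuple."""
    if LIST_RE.fullmatch(arg):
        return ("list", [int(part) for part in arg.split(",")])
    match = RANGE_RE.fullmatch(arg)
    if match is None:
        raise ValueError("invalid revision specifier: %r" % arg)
    start, end = match.group(1), match.group(2)
    if end is None:
        return ("single", classify(start))
    return ("range", classify(start), classify(end))


parsed = parse_revision_option("{2004-04-19 08:39}:[release-1.0]")
```

Matching the braces and brackets as whole tokens keeps a colon inside a
date from being mistaken for the range separator.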

4.2 svn label [-r revnum/range/list] label-name

Adds a label to the specified revision(s). All forms of the -r option
are supported (including label specifiers, of course). The default is to
label HEAD.

4.3 svn labeldel [-r revnum/range/list] label-name

Remove a label from the specified revision(s). If -r is not specified,
remove all instances of the label.

All these functions need equivalents in the client library; the RA layer
only has to expose svn_repos_revision_search. "svn label" and "svn
labeldel" can be implemented as simple revprop manipulations, although
implementing them on the server would make multi-revision labeling faster.
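
The "simple revprop manipulations" could look roughly like this Python
sketch. It models revprops as a dict of dicts and stores svn:name as a
newline-terminated list, as in section 3.2; the function names are
hypothetical, and a real client would go through the revprop API and
resolve the -r argument to revision numbers first.

```python
def add_label(revprops, revnums, label):
    """Append label to svn:name on each of the given revisions."""
    for rev in revnums:
        props = revprops.setdefault(rev, {})
        names = props.get("svn:name", "").splitlines()
        if label not in names:
            names.append(label)
        props["svn:name"] = "".join(name + "\n" for name in names)


def delete_label(revprops, label, revnums=None):
    """Remove label from the given revisions, or from all if None."""
    targets = list(revprops) if revnums is None else list(revnums)
    for rev in targets:
        props = revprops.get(rev, {})
        names = props.get("svn:name", "").splitlines()
        if label not in names:
            continue
        remaining = [name for name in names if name != label]
        if remaining:
            props["svn:name"] = "".join(name + "\n" for name in remaining)
        else:
            # No names left: drop the property entirely.
            del props["svn:name"]


revprops = {5: {"svn:author": "brane"}, 6: {}}
add_label(revprops, [5, 6], "rc1")
add_label(revprops, [5], "release-1.0")
delete_label(revprops, "rc1", revnums=[6])
```

Done this way, removing a label from every revision touches each
revision's props in turn, which is where a server-side implementation
backed by the index would be faster.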

5. Future notes

Currently no history is recorded about revprop changes. This is an
oversight that makes Subversion behave slightly at cross-purposes with
configuration management philosophy. Unfortunately, recording historical
changes to revprops requires a more drastic change than just a schema and
API update, because these changes would have to be recorded in a new kind
of transaction. Thus this kind of history
tracking cannot be implemented before 2.0.

-- Brane

Received on Mon Apr 19 08:39:56 2004

This is an archived mail posted to the Subversion Dev mailing list.