
FSFS instance-id and on-disk representation

From: Julian Foad <julianfoad_at_apache.org>
Date: Tue, 14 Feb 2017 15:53:16 +0000

TL;DR: the repository "instance-id" introduced in FSFS f7 doesn't make
any difference to the on-disk representation of FSFS; can we please
affirm that this will continue to be so?

== What is this instance-id? ==

(A brief summary for those who, like me, didn't know what this is.)

In Subversion 1.9 we introduced a repository instance-id in FSFS f7. It
is stored as a second line in the "db/uuid" file. The log message tries
to explain why:

   https://svn.apache.org/r1618138

Basically it is to disambiguate some potentially shared data in two
svn_fs_t objects opened to repositories that have the same (primary)
repository UUID. I am still not clear exactly what shared data it is
used for and among which processes that data can be shared.

The log message also mentions some scenarios where having different
instance-ids is important (if they have the same primary UUID). Three of
these that I would like to mention here are:

   * during "svnadmin hotcopy repo1 repo2"
   * during "svnadmin freeze repo1 (svnadmin freeze repo2 (...))"
   * serving repo1 and repo2 from the same Apache httpd instance
     (in some configurations)

The second email thread linked from that log message contains most of
the interesting discussion:

   http://svn.haxx.se/dev/archive-2014-08/0093.shtml

== Why it matters ==

WANdisco's Svn Multisite Plus (MSP) replicates and synchronizes
Subversion repositories, using rsync initially, then through their own
synchronization software. Until now those replicas have been
bit-for-bit identical, and consistency checking has included checking
that repositories remain bit-for-bit identical.

I'm aware that we don't guarantee a repository will be bitwise
predictable (and thus two instances remain bit-for-bit identical) when
written to. But it has been, under these conditions, and this has been
useful.

Replicas are generally kept on physically separate servers, and served
by separate Apache httpd instances. Two replicas are never accessed
together by the same process, in normal use. It is unlikely but
conceivable that an administrator might encounter one of the scenarios
where the instance-id matters.

WANdisco asked me to advise on what to do. It seems the correct thing to
do with "instance-id" is to make that field deliberately different on
each instance of the repository. In consequence the consistency check
will have to be made to ignore differences in the instance-id line in
the "db/uuid" file.
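
As a rough illustration of such a check (a sketch only, not WANdisco's
actual implementation; the function names are hypothetical), something
like the following Python would compare two replicas byte for byte
while ignoring only the second line of "db/uuid". It assumes both
repositories are quiescent and makes no attempt to skip lock files or
other transient state:

```python
import os

# Relative path of the file holding the UUID (line 1) and, in FSFS f7,
# the instance-id (line 2).
UUID_FILE = os.path.join("db", "uuid")


def read_uuid_file(repo_root):
    """Return (uuid, instance_id); instance_id is None if there is no second line."""
    with open(os.path.join(repo_root, UUID_FILE), "r", encoding="ascii") as f:
        lines = f.read().splitlines()
    uuid = lines[0] if lines else None
    instance_id = lines[1] if len(lines) > 1 else None
    return uuid, instance_id


def list_files(repo_root):
    """Return the set of file paths under repo_root, relative to it."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), repo_root))
    return found


def same_bytes(path_a, path_b, chunk_size=1 << 16):
    """Return True if the two files have identical contents."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files reached end of file together
                return True


def compare_replicas(repo_a, repo_b):
    """Return the sorted relative paths that differ between two replicas.

    For db/uuid only the first line (the primary UUID) is compared, so a
    deliberately different instance-id is not reported as a difference.
    """
    files_a, files_b = list_files(repo_a), list_files(repo_b)
    differing = sorted(files_a ^ files_b)  # present in only one replica
    for rel in sorted(files_a & files_b):
        if rel == UUID_FILE:
            if read_uuid_file(repo_a)[0] != read_uuid_file(repo_b)[0]:
                differing.append(rel)
        elif not same_bytes(os.path.join(repo_a, rel),
                            os.path.join(repo_b, rel)):
            differing.append(rel)
    return sorted(differing)
```

An empty list from compare_replicas("repo1", "repo2") would then mean
the replicas are identical apart from the instance-id.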

== Question ==

WANdisco would like to know that there will not be differences in the
repository on-disk data due to differences in instance-id (other than
the "db/uuid" itself, of course). I suggest we are talking about the
lifetime of FSFS format 7; of course the features of a future format are
unknown.

Can I tell them that that is the expectation, and we won't change that
situation without a good reason?

- Julian
Received on 2017-02-14 16:54:39 CET

This is an archived mail posted to the Subversion Dev mailing list.
