Re: [PATCH] Introduce per-instance filesystem UUIDs

From: Evgeny Kotkov <evgeny.kotkov_at_visualsvn.com>
Date: Sun, 10 Aug 2014 17:56:43 +0400

Hi Stefan,

> Thanks for having a the (missing) instance ID issue.
> From initial review, I have 1 objection and 2 issues
> that your patch does not address, yet.

Thank you for reviewing this patch. I did my best, but I cannot really tell,
which of these points are objections, and which of them are issues with the
proposed patch.

This patch solves a well-defined problem that was particularly stated in
the comments prior to r1589262:
[[[
  Using the uuid to obtain the lock creates a corner case if a
  caller uses svn_fs_set_uuid on the repository in a process where
  other threads might be using the same repository through another
  FS object. The only real-world consumer of svn_fs_set_uuid is
  "svnadmin load", so this is a low-priority problem, and we don't
  know of a better way of associating such data with the
  repository.
]]]

Just to mention, this comment was not restored in r1589518, although the
changeset was claimed to be a "revert". And (just to mention #2) without a
change like this we *do* have a regression in the incremental hotcopy behavior
compared to 1.8, because now the destination is allowed to be packed while
hotcopy is in progress, and that's where real "ripple effects" become visible.

> * Ideally, we would store the instance UUID as a separate
> file next to uuid and friends. Presumably trying to not
> increase the fs_open latency low, you merged that info
> into the format file - causing various ripple effects.
> I think it should be put into the UUID file (or a separate
> file altogether) and update places where we rewrite
> this file (e.g. svnadmin load).

Honestly, I do not get this â€” what exactly are the "ripple effects" you are
talking about? Don't you think that adding another file would not have its own
"ripple effects", at least considering the amount of places where you have to
change the code? And again, I still do not see any real problems with the
approach I chose.

From my point of view, it is not even a matter of taste. We require the
'instance-uuid' option as a solution to our *internal* problem with the shared
data clashes (and, possibly, with outdated cache contents after the repository
replacement). Putting a thing like this into the 'db/format' file has at least
a couple advantages:

- You do not have to open yet-another-file within fs_open() calls. These are
  frequent â€” think about how often Apache HTTP Server does it. So, why don't
  we entirely avoid this overhead, considering the fact that it does not make
  any real difference?

- This actually makes sense. We have the filesystem UUID stored in a separate
  'db/uuid' file and it can be changed by the end user (svnadmin setuuid). We
  also have the instance UUID, which is invisible for the user and solves *our*
  problems. Whenever we use it appropriately in our caches / shared data, the
  end user should not *ever* want to change the instance UUID, and, to my mind,
  we should not introduce another user-visible thing and a corresponding set of
  commands for the administrator to remember.

Finally, I do not really understand the 'svnadmin load' point. You do not have
to somehow interact with an instance UUID whenever you load something into a
repository, do you?

> * The instance ID must become part of the cache key.
> This is for the hot-restore-from-backup use-case where
> a repository gets replaced with an older version of itself
> while the server process is kept alive (caches are still hot).

This is irrelevant to this patch. As a matter of fact, this patch also does not
solve the move tracking problem. Using this new UUID in our caches seems like a
logical next step and I actually had that in mind for implementing. However,
we can solve all these problems one-by-one, considering the fact that this patch
is an atomic change solving *one* exact problem.

> * svnadmin should have a means to bump the instance ID.
> Again, the restore-from-backup use-case.

See above. Again, I cannot tell whether this is an issue or an objection, but
"should" looks unjustified. As I understand, you are talking about a triple
edge case:

1. An administrator backups the repository by copying them (making disk images,
etc.) instead of using the 'hotcopy' command we provide.

2. Something goes wrong, and he/she restores the backup by hot swapping the
repository *without* restarting the server.

3. Things start to go wrong with the cached data left from the previous
repository before the swap happened.

There are two ways of solving this â€” you can restart the server (which is a
better choice) or give the repository a brand new UUID with 'svnadmin setuuid'
prior to the swap. Hence, I do not see a necessity in another command that
would allow us to change the instance UUIDs. Administrators would have to
learn about the difference between filesystem and instance UUIDs, which is
pretty much unnecessary, because they *do not* have to know or use these
knowledge in order to solve the problems with UUID clashes; they already
have a way of fixing them.

Regards,
Evgeny Kotkov
Received on 2014-08-10 15:57:31 CEST

This message: [ Message body ]
Next message: Ben Reser: "Re: 1.7.18 up for testing/signing"
Previous message: Ivan Zhakov: "Re: [VOTE] Merge svn-auth-x509 branch to trunk?"
In reply to: Stefan Fuhrmann: "Re: [PATCH] Introduce per-instance filesystem UUIDs"
Next in thread: Stefan Fuhrmann: "Re: [PATCH] Introduce per-instance filesystem UUIDs"
Reply: Stefan Fuhrmann: "Re: [PATCH] Introduce per-instance filesystem UUIDs"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]