[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Introduce per-instance filesystem UUIDs

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Tue, 12 Aug 2014 23:29:58 +0200

On Sun, Aug 10, 2014 at 3:56 PM, Evgeny Kotkov <evgeny.kotkov_at_visualsvn.com>
wrote:

> Hi Stefan,
>
> > Thanks for having a the (missing) instance ID issue.
> > From initial review, I have 1 objection and 2 issues
> > that your patch does not address, yet.
>
> Thank you for reviewing this patch. I did my best, but I cannot really
> tell,
> which of these points are objections, and which of them are issues with the
> proposed patch.
>

No problem, I usually answer questions - eventually ;)
(It all depends on whether an the answers require
thought, what my TODO list or travel schedule is etc.)

> This patch solves a well-defined problem that was particularly stated in
> the comments prior to r1589262:
> [[[
> Using the uuid to obtain the lock creates a corner case if a
> caller uses svn_fs_set_uuid on the repository in a process where
> other threads might be using the same repository through another
> FS object. The only real-world consumer of svn_fs_set_uuid is
> "svnadmin load", so this is a low-priority problem, and we don't
> know of a better way of associating such data with the
> repository.
> ]]]
>

It's perfectly fine to fix that issue. Doing so introduces
a new feature ("instance ID") and I think the code should
cover related issues as well. It should take only a few
extra lines to fix the whole class of related problems.

Just to mention, this comment was not restored in r1589518, although the
> changeset was claimed to be a "revert".

Thanks for spotting it! r1589518 was not a strict revert.
That's how the old comment fell through the cracks.
Fixed in r1617586.

> And (just to mention #2) without a
> change like this we *do* have a regression in the incremental hotcopy
> behavior
> compared to 1.8, because now the destination is allowed to be packed while
> hotcopy is in progress, and that's where real "ripple effects" become
> visible.
>

That sounds like a separate issue. Hotcopy should take
out the pack lock on the target as well. Feel free to fix it.

> > * Ideally, we would store the instance UUID as a separate
> > file next to uuid and friends. Presumably trying to not
> > increase the fs_open latency low, you merged that info
> > into the format file - causing various ripple effects.
> > I think it should be put into the UUID file (or a separate
> > file altogether) and update places where we rewrite
> > this file (e.g. svnadmin load).
>
> Honestly, I do not get this — what exactly are the "ripple effects" you are
> talking about? Don't you think that adding another file would not have
> its own
> "ripple effects", at least considering the amount of places where you have
> to
> change the code? And again, I still do not see any real problems with the
> approach I chose.
>

The instance ID has nothing to do with format info. Agree?
Putting it there is just plain wrong.

Putting it into the same file means that you have to filter it
whenever you want to change or check one but not the other.
All of your changes in the fs-fs-pack-test were only needed
for that reason. But these "ripple effects" are only a symptom
of a bad design choice. Having e.g. a completely separate
file would not require any changes to the existing code to
maintain the current functionality.

> From my point of view, it is not even a matter of taste. We require the
> 'instance-uuid' option as a solution to our *internal* problem with the
> shared
> data clashes (and, possibly, with outdated cache contents after the
> repository
> replacement). Putting a thing like this into the 'db/format' file has at
> least
> a couple advantages:
>

> - You do not have to open yet-another-file within fs_open() calls. These
> are
> frequent — think about how often Apache HTTP Server does it. So, why
> don't
> we entirely avoid this overhead, considering the fact that it does not
> make
> any real difference?
>

Putting it into the existing uuid file would be just as efficient
and touch fewer code bits.

> - This actually makes sense.

No. It is does not tell us anything about the FS format.

> We have the filesystem UUID stored in a separate
> 'db/uuid' file and it can be changed by the end user (svnadmin setuuid).
> We
> also have the instance UUID, which is invisible for the user and solves
> *our*
> problems. Whenever we use it appropriately in our caches / shared data,
> the
> end user should not *ever* want to change the instance UUID, and, to my
> mind,
> we should not introduce another user-visible thing and a corresponding
> set of
> commands for the administrator to remember.
>

I'm not 100% sure whether we should expose the
instance ID to the user or not. The use case that
I am unsure about is backup (from hotcopy or not).

Technically, we all working copies become invalid
and 'svnadmin setuuid' makes that explicit. But there
are certainly users who would like to keep their
working copies intact. I honestly have no idea what
the correct approach here is.

> Finally, I do not really understand the 'svnadmin load' point. You do not
> have
> to somehow interact with an instance UUID whenever you load something into
> a
> repository, do you?
>

See above. As soon as you restore from backup,
any internal cache is (potentially) invalid now. An
ID bump (uuid or instance ID) is needed to prevent
disaster unless you restart your server processes.

> > * The instance ID must become part of the cache key.
> > This is for the hot-restore-from-backup use-case where
> > a repository gets replaced with an older version of itself
> > while the server process is kept alive (caches are still hot).
>
> This is irrelevant to this patch.

I disagree. Although you did not intend to address
the caching issue, it is basically the flip side of the
shared data aliasing problem that you try to solve.

> As a matter of fact, this patch also does not
> solve the move tracking problem.

How would this even be related to move tracking?
Do you refer to renaming repositories on disk?

> Using this new UUID in our caches seems like a
> logical next step and I actually had that in mind for implementing.

O.k. And that is what review is for: discussing the
consequences and potentially related issues. I.e.
sharing insight and ideas; it is *not* intended for blaming people
for anything.

> However,
> we can solve all these problems one-by-one, considering the fact that this
> patch
> is an atomic change solving *one* exact problem.
>
> > * svnadmin should have a means to bump the instance ID.
> > Again, the restore-from-backup use-case.
>
> See above. Again, I cannot tell whether this is an issue or an objection,
> but
> "should" looks unjustified. As I understand, you are talking about a
> triple
> edge case:
>
> 1. An administrator backups the repository by copying them (making disk
> images,
> etc.) instead of using the 'hotcopy' command we provide.
>
> 2. Something goes wrong, and he/she restores the backup by hot swapping the
> repository *without* restarting the server.
>
> 3. Things start to go wrong with the cached data left from the previous
> repository before the swap happened.
>
> There are two ways of solving this — you can restart the server (which is a
> better choice) or give the repository a brand new UUID with 'svnadmin
> setuuid'
> prior to the swap. Hence, I do not see a necessity in another command that
> would allow us to change the instance UUIDs. Administrators would have to
> learn about the difference between filesystem and instance UUIDs, which is
> pretty much unnecessary, because they *do not* have to know or use these
> knowledge in order to solve the problems with UUID clashes; they already
> have a way of fixing them.
>
>
> Regards,
> Evgeny Kotkov
>
Received on 2014-08-12 23:30:33 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.