Re: RFC: simple proposal for Internet-scoped IDs

From: Ben Reser <ben_at_reser.org>
Date: Tue, 4 Dec 2012 08:52:25 -0500

On Tue, Dec 4, 2012 at 3:39 AM, Eric S. Raymond <esr_at_thyrsus.com> wrote:
> Peter Samuelson <peter_at_p12n.org>:
>>
>> [Eric S. Raymond]
>> > 1. Add support to the client tools for shipping a FULLNAME field
>> > mined from somewhere under ~/.subversion. Maybe the existing
>> > username entry will do, maybe it won't - I see arguments both ways.
>> > I don't care, we can fill in that detail later.
>>
>> This part (upon which your whole proposal hinges) makes me scratch my
>> head a bit. Why should the client be involved at all?
>
> Because other ways of setting this bit of metadata have serious though
> un-obvious issues, and offer only partial coverage for particular special
> cases. This one mechanism would address *all* the deployment cases, and
> would do so in a way that properly empowers users, minimizes work for
> administrators, and avoids scaling problems.
>
> I will address each of these points. But first I want to clearly
> lay out the desirable qualities and consequences of fullname/email
> attribution cookies; they bear on the use cases I'm going to describe.
>
> 1. They have enough entropy that collisions aren't a practical problem.
> A human name alone does not. I'm excluding deliberate spoofing from
> the analysis because we now have enough experience with un-cryptosigned
> commits in DVCSes to say that this effectively never happens.
>
> 2. Because of (1), they allow repositories to be mobile as a
> disaster-recovery hedge. I've already explained this in detail and
> won't rehash it except to remind everyone that mobile != distributed -
> I'm not trying to morph Subversion into a DVCS.
>
> 3. They imply an email pointer back to a responsible person for each commit.
>
> 4. They function as Internet-wide primary keys for reputation systems.
> The type case I have in mind is Ohloh, which aggregates statistics
> from multiple repositories. It uses your fullname/email pair as a
> primary key to automatically identify you as a committer in multiple
> projects, which in turn feeds their "kudos" reputation system. High
> kudos identifies you as a good person to collaborate with. We'll
> undoubtedly see more of this sort of thing.
>
> Now, with these use cases in mind, let's consider different protocols
> that might allow users to set their attribution cookies.
>
> A. Users *can't* set them. Instead, we generate them - say, by consing up a
> generated email address on the host and the contents of the user's
> GECOS field.
>
> B. Site administrators can set them by editing something on which
> users don't normally have modification rights; an LDAP directory
> will stand as an example. This is one variant of your
> "account database" case.
>
> C. Users can change them through site-specific interfaces such as
> forge glue code, not needing to go through an administrator. This is
> another variant of your "account database" case.
>
> D. Users can set them through a preference in the Subversion client.
>
> OK, now let's matrix the modification scenarios with the use
> cases and see how they fare.
>
> Case 1 is equally good under A, B, C, and D. So is case 2.
>
> It's case 3 where we start to run into trouble. Suddenly A doesn't
> look so good. If the host referenced in a constructed cookie goes
> away, so almost certainly does any email pointer back to the person
> with that hostname wired in. This is the same disaster scenario for
> which we want painless repository mobility.
>
> If we want to preserve the case 3 property that attribution cookies are
> reliable pointers to people, we need a method that lets users set the
> email address component to something that can be expected to be valid
> on a longer timescale, like a personal domain (mine, admittedly an
> extreme case, has been valid since 1985).
>
> In theory, protocols B or C or D will do. But look at the difference:
> B or C imply a whole lot more work and a whole lot more places for the
> process to fail. Let's suppose that a user wants to point commits back
> to himself at a stable mail provider across multiple repositories.
> Now he has to (a) remember his backtrail, (b) navigate M different
> site-local interfaces, and/or (c) petition N sets of system
> administrators, none of whom will thank him for increasing their
> workload.
>
> Contrast protocol D: the user sets *one* preference in *one* place.
> He's done, nobody else had to do any work, and the change is
> guaranteed to be reflected in all his future commits. No scaling
> problem here.
>
> Now consider case 4. Sites like Ohloh increase the value of a
> *single* email address in all your attributions that is not only
> Internet-scoped but belongs to *you* and is lifetime-stable.

First of all, the Ohloh problem has already been solved by Ohloh: you
can claim your commits.

However, even as given, your proposal doesn't solve the problem:

1) Some people may prefer not to use the same identity on different
projects, even open source projects. There are actually open source
projects on which I don't go by Ben Reser or breser, or use an email
address derived from my name. In fact some of those projects use
git.

2) If you allow this identity to be set automatically from something
like the GECOS fields, the same individual may end up with different
values, especially if they use more than one machine or use machines
on which they don't control the GECOS fields. As soon as someone
changes machines you're back to the same problem: either they
reconfigure the setting or you end up with multiple identities.
Additionally, some users will just never bother to set up the
configuration. I've already seen this on git projects, where people
start pushing commits made with some default value that nobody can
trace back to a person.

3) You keep assuming that email addresses are immutably owned by
someone. That is fundamentally not true for the vast majority of
people, and frankly is never absolutely true. In the case of people
using email addresses at large email providers, if they stop using
their account it is almost guaranteed that a decent userid will get
reused later. In the case of people who own their own domain names,
it's entirely possible that they could lose the domain name (forget
to pay for it, etc.). As it stands I don't think Ohloh assumes that
breser in this project is breser in that project. You have to claim
those commits.

Interestingly enough, that action alone allows breser to be locally
scoped without running into the problems above, yet the identities
can still be collapsed by simply claiming your userid on the various
projects you work on.
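
As a rough illustration of the claiming model (this is not Ohloh's
actual implementation; the data, the "person-1" key, and the function
name are all made up for the example), locally scoped userids can be
collapsed with an explicit claims table and nothing else:

```python
from collections import defaultdict

# Hypothetical commit log: (project, locally scoped userid).
commits = [
    ("subversion", "breser"),
    ("subversion", "breser"),
    ("otherproject", "breser"),  # same string, not necessarily the same person
    ("gitproject", "alias"),
]

# Explicit claims made by a person. Nothing is inferred from matching
# userids or email addresses across projects.
claims = {
    ("subversion", "breser"): "person-1",
    ("gitproject", "alias"): "person-1",
}

def commits_per_person(commits, claims):
    """Count commits per claimed person; unclaimed ids stay per-project."""
    totals = defaultdict(int)
    for project, userid in commits:
        owner = claims.get((project, userid))
        if owner is None:
            # Unclaimed ids are kept separate instead of being guessed at.
            owner = f"unclaimed:{project}/{userid}"
        totals[owner] += 1
    return dict(totals)
```

Here "breser" on otherproject stays separate until someone claims it,
while the differently named id on gitproject is still attributed to
the same person.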

As it stands, your proposed solution depends entirely on people
always remembering to configure their unique id. I don't see that as
particularly easier than the current situation, where you claim your
commits. I consider any argument that people might claim others'
commits as unlikely as people setting their id to that of other
people. Frankly, I don't think the vast majority of the user base
cares about this problem, which gives them little incentive to bother
with this configuration setting.

You have other reasons to desire this, but I think all of those are
really resolved with a per-project authentication database.

I agree with Brane though that I really don't see a problem with
auto-revprops or a defined rev property name for this use. But I also
think that most people just won't use this.
Received on 2012-12-04 14:53:16 CET

This is an archived mail posted to the Subversion Dev mailing list.