Re: Percent of authorship

From: Hans-Emil Skogh <Hans-Emil.Skogh_at_tritech.se>
Date: Mon, 19 Apr 2010 17:55:22 +0200

> I am the mentor of Dmitry Kravtsov and Oleg "Corwin" Pinchuk at the
> Far Eastern National University, Russia. They are doing their coursework
> on TSVN.
> Unfortunately, they may sometimes have problems with English,
> so I'll try to help them answer some questions here.
 
Great!

>> How is the new "Percent of authorship"-metric calculated?
> In theory, it should indeed be based on line changes, but aggregated over
> the entire history of the file, with diminishing weight.
 
Ok. That is what I would expect. But right now it seems to be evaluating the number of commits, with weight diminishing over time. Or am I missing something?
Are you planning to actually use line data? (If not: the "Commits by date" graph already gives a nice overview of the authors' activity over time.)
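To make the line-based variant concrete, here is a minimal sketch. It assumes each revision of a file has already been reduced to a hypothetical (author, lines_changed, age) tuple, where age counts revisions back from the newest one, and it uses exponential decay with a made-up half_life parameter as one possible reading of "diminishing weight". The actual patch may weight things differently.

```python
from collections import defaultdict

def authorship_percent(changes, half_life=10.0):
    """Return {author: percent} from line changes with exponentially decaying weight.

    changes: iterable of (author, lines_changed, age) tuples (hypothetical format);
             age = number of revisions ago the change happened, 0 = newest.
    half_life: hypothetical tuning parameter; a change half_life revisions old
               counts half as much as a change in the newest revision.
    """
    scores = defaultdict(float)
    for author, lines_changed, age in changes:
        weight = 0.5 ** (age / half_life)  # older changes contribute less
        scores[author] += lines_changed * weight
    total = sum(scores.values())
    if total == 0:
        return {}
    return {author: 100.0 * score / total for author, score in scores.items()}

print(authorship_percent([("alice", 100, 0), ("bob", 100, 10)], half_life=10.0))
```

With half_life=10, Bob's 100-line change made 10 revisions ago counts as 50 lines against Alice's 100 current lines, so Alice gets about 66.7% and Bob about 33.3%.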

> Additionally, some kind of heuristic may be applied to reduce the weight
> of whitespace-only changes such as indentation fixes.
 
That would be good.
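One simple heuristic, sketched here as an assumption rather than anything taken from the patch: compare each old/new line pair with all whitespace stripped, and if the two are identical, count the change with a small hypothetical whitespace_factor instead of 1.

```python
def change_weight(old_line, new_line, whitespace_factor=0.1):
    """Weight of one modified line; whitespace-only edits count for little.

    whitespace_factor is a hypothetical tuning knob, not an existing TSVN setting.
    """
    # Remove every whitespace character and compare what is left.
    if "".join(old_line.split()) == "".join(new_line.split()):
        return whitespace_factor
    return 1.0

def weighted_line_changes(pairs, whitespace_factor=0.1):
    """Sum the weights of a list of (old_line, new_line) modifications."""
    return sum(change_weight(old, new, whitespace_factor) for old, new in pairs)

pairs = [("if (x) {", "    if (x) {"),  # indentation fix only
         ("a = 1;", "a = 2;")]           # real change
print(weighted_line_changes(pairs))
```

Stripping all whitespace is deliberately crude: it would also treat `int x` versus `intx` as whitespace-only, and it ignores pure additions and deletions, so a real implementation would probably be more careful.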

> Roughly speaking, this metric should answer the question
> "which person should I talk to if I want to understand/fix/improve
> this part of code".
 
Sounds like a good plan. But I don't think you will be able to answer that question in a meaningful way by looking only at the number of commits. For example, right now I'm listed with the highest "authorship" of a couple of huge files, only because I have recently contributed numerous small fixes. The other authors of these files, who provided fewer but much larger changes, are therefore at a disadvantage when one looks only at the commit count.
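The effect is easy to see with made-up numbers: one hypothetical author makes eight 2-line fixes, another makes two 300-line changes. Counting commits and counting changed lines (time decay left out for clarity) give almost opposite answers.

```python
from collections import Counter

def share_by_commits(commits):
    """Percent of commits per author; commits is a list of (author, lines_changed)."""
    counts = Counter(author for author, _ in commits)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {author: 100.0 * n / total for author, n in counts.items()}

def share_by_lines(commits):
    """Percent of changed lines per author."""
    lines = Counter()
    for author, lines_changed in commits:
        lines[author] += lines_changed
    total = sum(lines.values())
    if total == 0:
        return {}
    return {author: 100.0 * n / total for author, n in lines.items()}

# Hypothetical history: many small fixes vs. a few large changes.
commits = [("small_fixer", 2)] * 8 + [("big_changer", 300)] * 2

print(share_by_commits(commits))
print(share_by_lines(commits))
```

Here share_by_commits gives the small-fix author 80%, while share_by_lines gives the other author about 97.4% (600 of 616 changed lines).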

> I did not review the actual patch for algorithmic complexity, but it is
> quite possible that the low speed is just a result of some oversight and
> may be improved.

Oh, the current algorithm is fast enough. It's working with line data that would be slow, but as far as I can see you are not doing that (yet).
 
> The statistics enhancements planned for Oleg's work are based on a
> similar project for Git, although, of course, not all of them are
> applicable to SVN:
> http://repo.or.cz/w/git-stats.git/blob/HEAD:/doc/use-cases.txt
 
An interesting read. I think the main difficulty in implementing something like this in TSVN (or SVN in general) is performance. Just look at how much time it takes for TortoiseBlame to start!
 
A suggestion: Perhaps it would be better to add this percent-of-authorship graph to TortoiseBlame? There you would have all the line data you need to provide an interesting statistical analysis. This would of course be limited to a single file, but that is much better than being stuck with only the number of commits to work with.
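Once blame data is available, the per-file share is just a count of lines per author. TortoiseBlame's internal data structures are not shown here; this sketch instead assumes plain `svn blame` text output, where each line starts with the revision and the author separated by whitespace, and skips locally modified lines marked with "-".

```python
from collections import Counter

def blame_share(blame_lines):
    """Percent of the file's current lines attributed to each author.

    Assumes `svn blame`-style lines: revision, author, then the line content,
    e.g. "     3      sally This is a README file.".
    """
    per_author = Counter()
    for line in blame_lines:
        fields = line.split(None, 2)  # revision, author, rest of line
        if len(fields) < 2 or fields[0] == "-":
            continue  # malformed or locally modified line
        per_author[fields[1]] += 1
    total = sum(per_author.values())
    if total == 0:
        return {}
    return {author: 100.0 * n / total for author, n in per_author.items()}

lines = [
    "     3      sally This is a README file.",
    "     5      harry You should read this.",
    "     5      harry ",
    "     -          - local edit",
]
print(blame_share(lines))
```

Note that this measures who wrote the lines that survive in the current file, not who changed it most over its history; weighting each line by the age of its revision would be one way to add time decay.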
 
Hans-Emil

------------------------------------------------------
http://tortoisesvn.tigris.org/ds/viewMessage.do?dsForumId=757&dsMessageId=2589377

Received on 2010-04-19 17:55:29 CEST

This is an archived mail posted to the TortoiseSVN Dev mailing list.
