"Alexander Klenin" <klenin@gmail.com> wrote on 19.10.2007 04:44:16:
> I see statistics graph receive a lot of attention lately.
> If Andreas plans on further improvements, perhaps it makes sense to
> create a separate TortoiseStats application in line with TortoiseMerge
> and others?
The current activities were focussed on fixing edge cases as well as
scalability and usability issues. I agree with the other posts that
in its current shape, the statistics part does not warrant a
separate application.
> I think that with carefully designed API we could allow to use
> Excel/OpenOffice Calc/GNU Plot instead of it, by analogy with external
> merge tools.
Interoperability, where practicable, is almost always a plus.
While Simon's feature creep argument is certainly valid w.r.t.
"simple" extensions of the current statistics features, I do strongly
disagree with it in a broader sense.
Here are some random thoughts I had about that topic this weekend.
Just to write them down, in case we should run out of work to do ;)
* The log contains valuable information that is hard to harvest
  without specialized tools (incomplete list):
  - stability of your code
    * code churn
    * bug fixes
    * find "rushed" code (2nd commit soon after)
  - conflict frequency / density (merge after change to same file)
    * same user (change and merge)
    * different user
  - hot spots
    * in space
    * in time
  - software and process degradation (i.e. slow change over time)
    * comment size and variability
    * commit size
    * orphaned code
    * branching and tagging behavior
* With that data you can do (incomplete list):
  - statistics (make predictions based on historic data and
    current measurement)
    * compare against averages
    * correlation of (user defined) indicators
    * calculate confidence intervals
    * test goodness of fit
      (do my indicators behave as I assume)
  - identify possible modules and interfaces
    * by author(s) (many vs. few authors)
    * change dependency
  - allows even for bug prediction (review!)
    * stable code vs. lots of code churn
    * find all code in the "vicinity"
      (time, space, change, user) of known problems
    * suspicious code that has not been tested
      (significantly lower code churn than in the
      "neighbourhood")
  - indefinite number of soft indicators that
    "something" needs to be taken care of
  - creates extensive knowledge about your own
    software development process
* Intended users:
  - fewer users than "merge meisters"
  - SW development process Gurus
    * are also the opinion leaders
    * their problems / questions are hard to predict
      -> flexible framework
  - pre-defined queries / reports for "occasional users"
* "Solution":
  - TortoiseAnalyze
    * TSVN 1.7 (?)
  - SQL-like internal (very specialized) query engine
    * based on log cache
    * SQL itself is weak in tree handling
    * CSV (or similar) export useful
  - interactive (specialized) query builder
    * "simple" dialog box with drop-down style controls
  - combine queries at will
    * memory restrictions
    * performance: may be almost interactive
    * naive implementation may work quite well
    * "joins" / correlations may need sophisticated code
      (memory and performance)
    * combine data from different repositories
  - visualization through 3rd party code
    * colored graphs
    * weighted / hyperbolic trees
    * color tables
    * n-dimensional plots
    * EXCEL interface (?)
    * SPSS interface (?)
  - results may be published (?)
* Problems
  - legal restrictions (e.g. Germany)
  - limited to file granularity
  - interpretation of results requires experience
    (just another form of / dimension to static code analysis)
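
To make the "harvesting" idea a bit more concrete, here is a minimal
Python sketch that derives two of the indicators listed above (code
churn per path and "rushed" follow-up commits) from the XML printed by
`svn log --xml -v`, and exports the churn table as CSV. It is only an
illustration, not the proposed query engine, and it does not touch the
TSVN log cache. The function names, the one-hour window and the sample
log are assumptions made up for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical sample in the shape produced by `svn log --xml -v`.
SAMPLE_LOG = """<log>
<logentry revision="1">
<author>alice</author>
<date>2007-10-20T10:00:00.000000Z</date>
<paths>
<path action="A">/trunk/src/a.cpp</path>
<path action="A">/trunk/src/b.cpp</path>
</paths>
<msg>initial import</msg>
</logentry>
<logentry revision="2">
<author>bob</author>
<date>2007-10-20T10:30:00.000000Z</date>
<paths><path action="M">/trunk/src/a.cpp</path></paths>
<msg>fix typo</msg>
</logentry>
<logentry revision="3">
<author>alice</author>
<date>2007-10-22T09:00:00.000000Z</date>
<paths><path action="M">/trunk/src/a.cpp</path></paths>
<msg>refactor</msg>
</logentry>
</log>"""


def parse_log(xml_text):
    """Return (revision, author, date, paths) tuples, oldest first."""
    entries = []
    for entry in ET.fromstring(xml_text).iter("logentry"):
        date = datetime.strptime(entry.findtext("date"),
                                 "%Y-%m-%dT%H:%M:%S.%fZ")
        paths = [p.text for p in entry.iter("path")]
        entries.append((int(entry.get("revision")),
                        entry.findtext("author"), date, paths))
    return sorted(entries, key=lambda e: e[0])


def churn(entries):
    """Count how many commits touched each path (file granularity)."""
    counts = Counter()
    for _rev, _author, _date, paths in entries:
        counts.update(paths)
    return counts


def rushed(entries, window=timedelta(hours=1)):
    """List (path, first_rev, second_rev) where a path was changed
    again within `window` of its previous change."""
    last_seen = {}
    hits = []
    for rev, _author, date, paths in entries:
        for path in paths:
            prev = last_seen.get(path)
            if prev is not None and date - prev[1] <= window:
                hits.append((path, prev[0], rev))
            last_seen[path] = (rev, date)
    return hits


def to_csv(counts):
    """Render a churn table as CSV text, highest churn first."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["path", "commits"])
    for path, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([path, n])
    return out.getvalue()
```

Calling `to_csv(churn(parse_log(SAMPLE_LOG)))` gives a table that can
be loaded into a spreadsheet, and `rushed(parse_log(SAMPLE_LOG))` flags
the follow-up change to a.cpp in r2. Even this naive version shows the
file-granularity limit mentioned under "Problems".
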
I would love to do it ;)
-- Stefan^2.
Received on Mon Oct 22 04:21:28 2007