
Re: Statistics graph as a separate application

From: <Stefan.Fuhrmann_at_etas.de>
Date: 2007-10-22 04:21:20 CEST

"Alexander Klenin" <klenin@gmail.com> wrote on 19.10.2007 04:44:16:

> I see statistics graph receive a lot of attention lately.
> If Andreas plans on further improvements, perhaps it makes sense to
> create a separate TortoiseStats application in line with TortoiseMerge
> and others?

The current activities were focussed on fixing edge cases as well
as scalability and usability issues. I agree with the other posts that
in its current shape, the statistics part does not warrant a
separate application.

> I think that with carefully designed API we could allow to use
> Excel/OpenOffice Calc/GNU Plot instead of it, by analogy with external
> merge tools.

Interoperability, where practicable, is almost always a plus.

While Simon's feature creep argument is certainly valid w.r.t.
"simple" extensions of the current statistics features, I strongly
disagree with it in a broader sense.

Here are some random thoughts I had about that topic this weekend.
Just writing them down in case we run out of work to do ;)

* The log contains valuable information that is hard to harvest
  without specialized tools (incomplete list):

        - stability of your code
                * code churn
                * bug fixes
                * find "rushed" code (2nd commit soon after)
        - conflict frequency / density (merge after change to same file)
                * same user (change and merge)
                * different user
        - hot spots
                * in space
                * in time
        - software and process degradation (i.e. slow change over time)
                * comment size and variability
                * commit size
                * orphaned code
                * branching and tagging behavior
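
As a rough illustration of the "code churn" and "rushed code" items above, here is a minimal sketch in Python. It assumes the log is available in the XML shape printed by `svn log --xml -v`; the sample data, the function names and the one-hour window are all made up for the example, and it does not touch the TSVN log cache at all.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import datetime, timedelta

# Hand-written sample in the shape of `svn log --xml -v` output (invented data).
SAMPLE_LOG = """<log>
<logentry revision="3">
<author>bob</author><date>2007-10-19T10:20:00.000000Z</date>
<paths><path action="M">/trunk/src/a.cpp</path></paths>
<msg>fix typo from r2</msg>
</logentry>
<logentry revision="2">
<author>bob</author><date>2007-10-19T10:00:00.000000Z</date>
<paths><path action="M">/trunk/src/a.cpp</path></paths>
<msg>rework parser</msg>
</logentry>
<logentry revision="1">
<author>alice</author><date>2007-10-18T09:00:00.000000Z</date>
<paths>
<path action="A">/trunk/src/a.cpp</path>
<path action="A">/trunk/src/b.cpp</path>
</paths>
<msg>initial import</msg>
</logentry>
</log>"""


def parse_log(xml_text):
    """Return (revision, author, date, paths) tuples, oldest revision first."""
    entries = []
    for entry in ET.fromstring(xml_text).iter("logentry"):
        date = datetime.strptime(entry.findtext("date"), "%Y-%m-%dT%H:%M:%S.%fZ")
        paths = [p.text for p in entry.iter("path")]
        entries.append((int(entry.get("revision")), entry.findtext("author"), date, paths))
    return sorted(entries)


def churn(entries):
    """Count how many revisions touched each path (a crude churn measure)."""
    return Counter(path for _, _, _, paths in entries for path in paths)


def rushed(entries, window=timedelta(hours=1)):
    """List (path, earlier_rev, later_rev) where a path was changed again within `window`."""
    last_change = {}
    hits = []
    for rev, _author, date, paths in entries:
        for path in paths:
            if path in last_change and date - last_change[path][1] <= window:
                hits.append((path, last_change[path][0], rev))
            last_change[path] = (rev, date)
    return hits


entries = parse_log(SAMPLE_LOG)
print(churn(entries).most_common())
print(rushed(entries))
```

Counting revisions per path is only one possible definition of churn; changed lines per revision would need diff data that the log alone does not carry.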

* With that data you can do (incomplete list):

        - statistics (make predictions based on historic data and
          current measurement)
                * compare against averages
                * correlation of (user defined) indicators
                * calculate confidence intervals
                * test goodness of fit
                  (do my indicators behave as I assume)

        - identify possible modules and interfaces
                * by author(s) (many vs. few authors)
                * change dependency

        - even allows for bug prediction (review!)
                * stable code vs. lots of code churn
                * find all code in the "vicinity"
                  (time, space, change, user) of known problems
                * suspicious code that has not been tested
                  (significantly lower code churn than in the
                  "neighbourhood")

        - indefinite number of soft indicators that
          "something" needs to be taken care of

        - creates extensive knowledge about your own
          software development process
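
The "significantly lower code churn than in the neighbourhood" indicator could be sketched like this. Everything here is an assumption made for illustration: "neighbourhood" is taken to mean "files in the same directory", the 25% threshold is arbitrary, and the churn numbers are invented.

```python
import posixpath
from collections import defaultdict


def quiet_files(churn, ratio=0.25, min_siblings=2):
    """Flag files whose churn is below `ratio` times the average churn of
    the other files in the same directory.

    churn: dict mapping repository path -> number of changes.
    Returns sorted (path, churn, sibling_average) tuples.
    """
    by_dir = defaultdict(dict)
    for path, count in churn.items():
        by_dir[posixpath.dirname(path)][path] = count

    flagged = []
    for files in by_dir.values():
        for path, count in files.items():
            siblings = [c for other, c in files.items() if other != path]
            if len(siblings) < min_siblings:
                continue  # too few neighbours to say anything meaningful
            average = sum(siblings) / len(siblings)
            if count < ratio * average:
                flagged.append((path, count, average))
    return sorted(flagged)


# Invented churn figures.
example = {
    "/trunk/src/a.cpp": 40,
    "/trunk/src/b.cpp": 36,
    "/trunk/src/c.cpp": 2,
    "/trunk/doc/readme.txt": 1,
}
print(quiet_files(example))
```

A flagged file is only a hint that someone should look at it; it may just as well be stable, finished code.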

* Intended users:

        - fewer users than "merge meisters"
        - SW development process gurus
                * are also the opinion leaders
                * their problems / questions are hard to predict
                  -> flexible framework
        - pre-defined queries / reports for "occasional users"

* "Solution":

        - TortoiseAnalyze
                * TSVN 1.7 (?)
        - SQL-like internal (very specialized) query engine
                * based on log cache
                * SQL itself is weak in tree handling
                * CSV (or similar) export useful
        - interactive (specialized) query builder
                * "simple" dialog box with drop-down style controls
        - combine queries at will
                * memory restrictions
                * performance: may be almost interactive
                * naive implementation may work quite well
                * "joins" / correlations may need sophisticated code
                  (memory and performance)
                * combine data from different repositories
        - visualization through 3rd party code
                * colored graphs
                * weighted / hyperbolic trees
                * color tables
                * n-dimensional plots
                * Excel interface (?)
                * SPSS interface (?)
        - results may be published (?)
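
To make the "query plus CSV export" idea a bit more concrete, here is a minimal sketch of one pre-defined query whose result is written as CSV so that Excel, OpenOffice Calc or gnuplot could take over from there. The record layout, the function names and the data are hypothetical; a real query engine on top of the log cache would look quite different.

```python
import csv
import io
from collections import Counter


def changes_per_author(records, path_prefix):
    """Toy query: count changed paths per author below `path_prefix`.

    records: iterable of (author, path) pairs, one per changed path.
    """
    counts = Counter(author for author, path in records if path.startswith(path_prefix))
    return sorted(counts.items())


def to_csv(header, rows):
    """Serialize a query result as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Invented records.
records = [
    ("alice", "/trunk/src/a.cpp"),
    ("bob", "/trunk/src/a.cpp"),
    ("bob", "/trunk/src/b.cpp"),
    ("carol", "/branches/x/src/a.cpp"),
]
print(to_csv(["author", "changes"], changes_per_author(records, "/trunk/")))
```

Keeping the export as plain rows and columns is what would let external tools be plugged in, by analogy with external merge tools.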

* Problems

        - legal restrictions (e.g. Germany)
        - limited to file granularity
        - interpretation of results requires experience
          (just another form of / dimension to static code analysis)

I would love to do it ;)

-- Stefan^2.

Received on Mon Oct 22 04:21:28 2007

This is an archived mail posted to the TortoiseSVN Dev mailing list.
