Re: Performance branch changes merged to /trunk

From: Stefan Fuhrmann <stefanfuhrmann_at_alice-dsl.de>
Date: Mon, 3 Aug 2009 13:05:06 +0200

Hans-Emil Skogh <Hans-Emil.Skogh_at_tritech.se> wrote:

> > Most of my efforts went into my change flow (merges and more)
> > reconstruction project (TortoiseAnalyze) and I'm about
> > half-way there.
>
> Is there someplace that you can read more about TortoiseAnalyze, or
> could you elaborate a little?
> What is its purpose, how will it work?

Since this is the T*SVN* list, I'm keeping this short ;)
And no, there is no TAnalyze list, because there wouldn't
be much to talk about, yet.

General vision:
        * Provide enough information for "intentional"
          merge tracking. A typical question is "I fixed this
          code sequence - where should I apply the same patch?"
        * No need to be 100% accurate. This is just a utility.

Basic problem:
        * Which changes got duplicated where?
        * What is the general relationship between 2 nodes
          (copied & renamed, split, one-way merge only, ...),
          i.e. to which nodes should I merge my changes?

Idea:
        * Index all node changes in a repository
        * Find (partial) matches between these changes
          (i.e. a global "copy-n-paste" finder)
        * Encode / store this information in some space-
          efficient way.
        * Identify "original" changes and their "duplicates"
        * Infer node relationships.
        * Reconstruct merge info like "this is a tainted merge
          of r10-15 from /xyz to /trunk"
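
The first steps of this idea (index changes, find partial matches, pick
"originals") can be illustrated with a small sketch. This is not
TortoiseAnalyze code; the function names, the (revision, path, added_lines)
tuple layout and the choice of hashing windows of k consecutive lines are
all assumptions made for illustration only.

```python
from collections import defaultdict
from hashlib import blake2b


def shingles(lines, k=3):
    """Yield (offset, digest) for every window of k consecutive lines.

    Lines are stripped of surrounding whitespace first, so re-indented
    copies still match. (Hypothetical normalization, chosen for brevity.)
    """
    norm = [ln.strip() for ln in lines]
    for i in range(len(norm) - k + 1):
        window = "\n".join(norm[i:i + k])
        yield i, blake2b(window.encode("utf-8"), digest_size=8).digest()


def build_index(changes, k=3):
    """Map each window digest to the places it occurs, oldest first.

    `changes` is an iterable of (revision, path, added_lines) tuples.
    """
    index = defaultdict(list)
    for rev, path, added_lines in sorted(changes, key=lambda c: c[0]):
        for offset, digest in shingles(added_lines, k):
            index[digest].append((rev, path, offset))
    return index


def find_duplicates(index):
    """Treat the earliest occurrence of a window as the "original".

    Returns {(orig_rev, orig_path): {(dup_rev, dup_path), ...}}.
    """
    result = defaultdict(set)
    for hits in index.values():
        if len(hits) < 2:
            continue
        orig_rev, orig_path, _ = hits[0]
        for rev, path, _ in hits[1:]:
            if (rev, path) != (orig_rev, orig_path):
                result[(orig_rev, orig_path)].add((rev, path))
    return result
```

Feeding it a change made in r10 on /branches/xyz and a later change on
/trunk that contains the same three lines would report the /trunk change
as a duplicate of the r10 one. Storing 8-byte digests instead of the text
is only a crude stand-in for the space-efficient encoding mentioned above.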

Challenge:
        * Time-efficient algorithms for duplicate search (mostly done)
        * Keep memory consumption as low as possible without
          hurting performance (KDE diff'ed change data is still 2G lines)
        * Extract precise information despite heuristics (99+% correct)
        * Present it in some useful way (avoid the static code
          analysis problem of reporting about 1 issue on every line)

Status:
        * Learned a lot about the properties of typical code repos
        * So far, algorithms are very efficient and scale very well
          (AsyncFramework is a by-product of that w/o the issues of
           OpenMP)
        * Data import is done, efficient duplicate representation
          is still ~50% open.
        * Constructing results is still to be done

As there are a couple of things I would like to see in
TSVN 1.7, I am switching projects for now. TAnalyze is just
a toy project of mine so far, and I would like to see
it doing something useful by the end of this year.
 
-- Stefan^2.

------------------------------------------------------
http://tortoisesvn.tigris.org/ds/viewMessage.do?dsForumId=757&dsMessageId=2379488

Received on 2009-08-03 13:05:21 CEST

This is an archived mail posted to the TortoiseSVN Dev mailing list.
