
Log filter improvements for 1.7

From: Stefan Fuhrmann <stefanfuhrmann_at_alice-dsl.de>
Date: Mon, 17 Aug 2009 01:04:25 +0200

Hi Stefan,

so these are my ideas for how to speed up
the log dialog.

First of all, I want to say that the dialog
has already become quite responsive up to at
least 10k revisions. The actual number seems
to vary across different machines. Most of
the time is spent expanding the data that is
efficiently encoded in the log cache into
ordinary strings.

Therefore, fundamental changes to the log
dialog data model are required. They can
be carried out on trunk without disturbing
the dialog's functionality, though.

I'm not sure how well this will work or what
changes to The Plan might be necessary. But
those issues will show up and can be discussed
as we go.

-- Stefan^2.

Step 1: Switch from plain strings to log
        cache data model

(a) If we get data directly from SVN
    (merge info or log cache is off),
    feed it into a temporary cache object.

    Keep the current data model as it is,
    duplicating the data for the time being.
    This has already been implemented for
    the revision graph. Please note that
    the log cache is only used for storage,
    not for mimicking 'svn log'. All problems
    with the caching since its introduction
    were with the latter.

(b) Introduce the future main data index:
    A plain list of <revision, level> pairs.

    Initially, the data has to come from
    the current data model. But once the
    latter has been removed, the pair list
    is the only thing that needs to be recorded
    outside the log cache.

    The current list view content uses a
    filtered copy of said list.

(c) Incrementally (column by column) replace
    access to the old DM with access to the
    log cache DM.

(d) Drop the remnants of the old DM.
    Introduce ILogReceiver2 that only reports
    revision numbers.

    At that stage, startup should be so fast
    that the progress bar becomes useless
    while receiving cached data (~10Mrevs/s).
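
To illustrate step 1 (b), here is a minimal sketch of the
<revision, level> pair list and of the filtered copy that the
list view would show. All names (LogEntryRef, LogIndex,
FilterIndex) are made up for illustration and are not taken
from the TortoiseSVN sources. Reading "level" as the merge
nesting depth is an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical names; nothing here is existing TortoiseSVN code.
using revision_t = std::uint32_t;

struct LogEntryRef {
    revision_t revision;  // key used to look the entry up in the log cache
    std::uint32_t level;  // assumed: merge nesting depth, 0 = top level
};

// The main data index: a plain list of <revision, level> pairs.
using LogIndex = std::vector<LogEntryRef>;

// Build the list view content as a filtered copy of the main index.
LogIndex FilterIndex(const LogIndex& all,
                     const std::function<bool(const LogEntryRef&)>& keep) {
    LogIndex visible;
    visible.reserve(all.size());
    for (const LogEntryRef& entry : all)
        if (keep(entry))
            visible.push_back(entry);
    return visible;
}
```

Everything else (author, message, paths) would be fetched from
the log cache by revision number when a row is drawn, so the
pair list stays small even for large logs.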

Step 2: Index-based filtering

(a) Provide a class that maps an index_t to
    {match, no_match, untested}.

    Every item in the log cache is identified
    by an index_t value. Different authors,
    paths etc. are stored only once and need
    to be evaluated only once.

(b) Use instances of that class when filtering
    for the author column, actions and the paths.
    The latter speeds things up considerably.

    We should get a factor of 2 out of this.

Step 3: Further filter speedup

(a) Use different filter classes
    - plain sub-string
    - wildcard
    - regex
    implementing a common IFilter interface.

    A factory class decides what filter class
    would be most efficient for the filter
    string (plain sub-string will be sufficient
    in most cases).

(b) Create a filter instance per column, start
    them in parallel and combine the results.
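
The filter classes and the factory from step 3 (a) could look
roughly like this. Only the IFilter name comes from the plan;
the method name Matches, the class names, the isRegex flag and
the idea of translating wildcards into a regex are assumptions
made for this sketch. Running one instance per column in
parallel (3.b) is left out.

```cpp
#include <memory>
#include <regex>
#include <string>
#include <utility>

class IFilter {
public:
    virtual ~IFilter() = default;
    virtual bool Matches(const std::string& text) const = 0;
};

class SubStringFilter : public IFilter {
public:
    explicit SubStringFilter(std::string needle) : needle_(std::move(needle)) {}
    bool Matches(const std::string& text) const override {
        return text.find(needle_) != std::string::npos;
    }
private:
    std::string needle_;
};

class RegexFilter : public IFilter {
public:
    explicit RegexFilter(const std::string& pattern) : regex_(pattern) {}
    bool Matches(const std::string& text) const override {
        return std::regex_search(text, regex_);
    }
private:
    std::regex regex_;
};

// Translate '*' and '?' into regex syntax and escape everything else.
std::string WildcardToRegex(const std::string& pattern) {
    static const std::string special = "\\^$.|+()[]{}";
    std::string out;
    for (char c : pattern) {
        if (c == '*') out += ".*";
        else if (c == '?') out += '.';
        else {
            if (special.find(c) != std::string::npos) out += '\\';
            out += c;
        }
    }
    return out;
}

// Simplification for this sketch: wildcards reuse the regex engine.
class WildcardFilter : public RegexFilter {
public:
    explicit WildcardFilter(const std::string& pattern)
        : RegexFilter(WildcardToRegex(pattern)) {}
};

class FilterFactory {
public:
    // Falls back to the cheap sub-string filter whenever the pattern
    // contains no character that is special in the chosen syntax.
    static std::unique_ptr<IFilter> Create(const std::string& pattern,
                                           bool isRegex) {
        const char* meta = isRegex ? "\\^$.|?*+()[]{}" : "*?";
        if (pattern.find_first_of(meta) == std::string::npos)
            return std::make_unique<SubStringFilter>(pattern);
        if (isRegex)
            return std::make_unique<RegexFilter>(pattern);
        return std::make_unique<WildcardFilter>(pattern);
    }
};
```

A dedicated wildcard matcher would probably be faster than
going through std::regex; the translation is only used here to
keep the example short.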

Step 4: Multi-filter

(a) The combined filter result for one filter
    applied to all columns (3.b) shall be a
    mapping: <rev in log> -> {true, false}

(b) Introduce a class that can combine multiple
    such mappings left to right. The method
    signature might be something like

    Add (MapVector rhs, op, bool negate)

    Supported ops are union (+), intersection
    (default), difference / removal (-) and
    symmetric difference (^). Before the
    results are combined, each mapping can
    optionally be negated.

(c) Parse the filter spec and create the
    individual filter instances in parallel,
    and combine the results.

    Filters are separated by spaces (honoring
    regex parentheses etc.). The operation
    used to combine the results is the first
    char of each filter; as the examples
    below show, a '!' after it negates the
    filter. Escaping via '\' is supported.


    .ppt +.xls TSVN -2009

    -> (((matches(.ppt) or matches(.xls))
            and matches(TSVN))
                excluding matches(2009))
       "all ppt or xls files about TSVN
        but not of this year"

    !me +!you \!

    -> ((not matches(me) or not matches(you))
           and matches(!))
       "everything containing an exclamation
        mark and not being about 'me' and 'you'
        at the same time"

    This should allow for sufficiently complex
    queries without complicating the implementation.
    The main point is, however, we can run all
    filters at once.
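
Steps 4 (b) and (c) can be sketched as follows. The Add
signature follows the one suggested above; everything else
(Op, Term, CombinedResult, ParseSpec, Evaluate) is hypothetical.
The sketch assumes that combining starts from "all revisions
match", uses plain sub-string matching on one string per
revision, splits on spaces only (no regex parentheses handling)
and runs the filters one after the other instead of in parallel.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using MapVector = std::vector<bool>;  // <rev in log> -> {true, false}

enum class Op { intersect, unite, subtract, symmetric };

struct Term {
    Op op = Op::intersect;  // default when no operator char is given
    bool negate = false;    // set by a leading '!'
    std::string text;
};

class CombinedResult {
public:
    // Assumption: start with "everything matches".
    explicit CombinedResult(std::size_t count) : result_(count, true) {}

    void Add(const MapVector& rhs, Op op, bool negate) {
        for (std::size_t i = 0; i < result_.size(); ++i) {
            const bool r = rhs.at(i) != negate;  // negate before combining
            const bool l = result_[i];
            switch (op) {
                case Op::intersect: result_[i] = l && r;  break;
                case Op::unite:     result_[i] = l || r;  break;
                case Op::subtract:  result_[i] = l && !r; break;
                case Op::symmetric: result_[i] = l != r;  break;
            }
        }
    }
    const MapVector& Get() const { return result_; }

private:
    MapVector result_;
};

// Split the spec at spaces; read operator, negation and escapes.
std::vector<Term> ParseSpec(const std::string& spec) {
    std::vector<Term> terms;
    std::size_t i = 0;
    while (i < spec.size()) {
        if (spec[i] == ' ') { ++i; continue; }
        Term t;
        if (spec[i] == '+')      { t.op = Op::unite;     ++i; }
        else if (spec[i] == '-') { t.op = Op::subtract;  ++i; }
        else if (spec[i] == '^') { t.op = Op::symmetric; ++i; }
        if (i < spec.size() && spec[i] == '!') { t.negate = true; ++i; }
        for (; i < spec.size() && spec[i] != ' '; ++i) {
            if (spec[i] == '\\' && i + 1 < spec.size())
                ++i;  // take the next char literally
            t.text += spec[i];
        }
        if (!t.text.empty()) terms.push_back(t);
    }
    return terms;
}

// Apply all terms left to right to one string per revision.
MapVector Evaluate(const std::string& spec,
                   const std::vector<std::string>& entries) {
    CombinedResult combined(entries.size());
    for (const Term& t : ParseSpec(spec)) {
        MapVector hits(entries.size());
        for (std::size_t i = 0; i < entries.size(); ++i)
            hits[i] = entries[i].find(t.text) != std::string::npos;
        combined.Add(hits, t.op, t.negate);
    }
    return combined.Get();
}
```

Since each term only produces its own MapVector, the inner
loops are independent of each other and could be started in
parallel, with Add applied in spec order once they finish.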


Received on 2009-08-17 01:04:41 CEST

This is an archived mail posted to the TortoiseSVN Dev mailing list.