Re: log --limit not as good as it should be?

From: Daniel Berlin <dberlin_at_dberlin.org>
Date: 2007-05-09 18:12:01 CEST

On 5/9/07, C. Michael Pilato <cmpilato@collab.net> wrote:
> Daniel Berlin wrote:
> > (This is running against a 1.5.0 dev build server, so the server
> > definitely supports the limit stuff. The same behavior is also found
> > running against http)
> >
> > svn log -r 1:400 --limit 400 svn://gcc.gnu.org/svn/gcc/trunk
> >
> > This will respond immediately and produce 400 revisions of log output
> >
> > svn log -r 1:HEAD --limit 400 svn://gcc.gnu.org/svn/gcc/trunk
> >
> > This will take about 3-4 minutes before starting to respond, and
> > produce the same 400 revisions of log output.
> >
> > It looks like something is touching all the revisions that *might* be
> > logged before we start producing log output at all.
>
> Yes. We can only trace history backwards, so anytime you run a log request
> with oldest-to-youngest direction in your range, the revisions have to be
> determined up front, then reported in reverse.

Well, they don't, actually.

To get the history from 1 to 124000, you can do the history from 10000
to 1, reverse and send, then 20000 to 10000, reverse and send, etc.

This makes no sense when you don't have a limit, but when you have a
limit that is relatively small compared to the greatest revnum you are
asking about, this will win.
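
A minimal sketch of that batching idea, in Python rather than the
server's C. Everything here is hypothetical: `walk_back` stands in for
the backwards history trace, `make_walker` builds a toy history from a
sorted list of changed revisions, and the model ignores copies and
renames, which a real trace would have to follow.

```python
import bisect
from typing import Callable, Iterator, List

Walker = Callable[[int, int], Iterator[int]]


def make_walker(changed: List[int]) -> Walker:
    """Toy history: `changed` is a sorted list of revisions touching the path."""
    def walk_back(young: int, old: int) -> Iterator[int]:
        # Yield changed revisions from `young` down to `old`, youngest first,
        # mimicking a trace that can only run backwards.
        hi = bisect.bisect_right(changed, young)
        lo = bisect.bisect_left(changed, old)
        for i in range(hi - 1, lo - 1, -1):
            yield changed[i]
    return walk_back


def log_forward_batched(walk_back: Walker, start: int, end: int,
                        limit: int, batch_size: int = 10000) -> List[int]:
    """Return up to `limit` changed revisions in [start, end], oldest first."""
    out: List[int] = []
    low = start
    while low <= end and len(out) < limit:
        high = min(low + batch_size - 1, end)
        # Trace one window backwards, then reverse just that window.
        window = list(walk_back(high, low))
        window.reverse()
        out.extend(window[: limit - len(out)])
        low = high + 1
    return out
```

With a limit of 400 and a busy path, the loop stops after the first
window instead of walking back from revision 124000.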
>
> Your first command gathers all the changed revs between 400 and the revision
> in which the path came into being, then reports the first 400 of them in
> reverse.
>
> Your second command gathers all the changed revs between HEAD and the
> revision in which the path came into being, then reports the first 400 of
> them in reverse.
>
>
Yes, we discovered this while tracing through the code on IRC.
Doing O(revisions that changed between firstrev and lastrev) work up
front, instead of incrementally reversing in batches of, say, 10000, is
the wrong tradeoff to make.

The up-front approach will end up doing less work if you log some
random obscure directory with a limit, but for the common case, where a
*lot* of revisions have changed on the path, the limit is what matters.

If the user gives a low limit relative to the total number of
revisions (say < log_end_rev / 10), we should be doing the reversing
in batches.
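
As a rough illustration of that threshold, here is a small Python
sketch. The function name `should_batch` and the `ratio` parameter are
made up for this example, and treating "no limit" as a non-positive
value is an assumption; only the "limit < log_end_rev / 10" test comes
from the text above.

```python
def should_batch(limit: int, log_end_rev: int, ratio: int = 10) -> bool:
    """Decide whether to reverse in batches instead of gathering up front."""
    if limit <= 0:
        # Assumption: a non-positive limit means "no limit", where
        # batching buys nothing because every revision is needed anyway.
        return False
    # Same test as limit < log_end_rev / ratio, kept in integers.
    return limit * ratio < log_end_rev
```

For the two commands from the start of the thread, this picks batching
for `-r 1:HEAD --limit 400` (HEAD around 124000) and the up-front path
for `-r 1:400 --limit 400`.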

This will be O(greatest_rev / batch size) work worst case, but it
won't take 5 minutes to respond, *and* it's very likely we will hit the
limit before we hit the end anyway.

5 minutes is not a guess here. That's really how long it takes to
discover the path changed 90000 times, then start to log 400 of them.
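
To put rough numbers on the difference, here is a toy count in Python
of how many changed revisions each strategy has to visit before it can
answer. The figures (124000 revisions, 90000 changes, limit 400, batch
10000) are borrowed from this thread; the uniform random spread of
changes is purely an assumption, and a visit count is only a proxy for
wall-clock time.

```python
import bisect
import random

HEAD = 124000     # youngest revision in the example above
CHANGES = 90000   # number of times the path changed
LIMIT = 400
BATCH = 10000

# Assumption: changes are spread uniformly at random over the history.
rng = random.Random(0)
changed = sorted(rng.sample(range(1, HEAD + 1), CHANGES))


def upfront(limit):
    """Gather every changed revision first, then report the oldest `limit`."""
    visited = len(changed)  # the whole history is walked before any output
    return changed[:limit], visited


def batched(limit, batch):
    """Visit one window at a time, stopping once `limit` revisions are found."""
    out, visited, low = [], 0, 1
    while low <= HEAD and len(out) < limit:
        high = min(low + batch - 1, HEAD)
        lo = bisect.bisect_left(changed, low)
        hi = bisect.bisect_right(changed, high)
        window = changed[lo:hi]  # the per-window reversal is omitted here
        visited += len(window)
        out.extend(window[: limit - len(out)])
        low = high + 1
    return out, visited


revs_upfront, cost_upfront = upfront(LIMIT)
revs_batched, cost_batched = batched(LIMIT, BATCH)
print(cost_upfront, cost_batched)
```

Both strategies return the same 400 revisions; under these assumptions
the batched one only visits the changes in the first window.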

Yeah, it may slow down if you log obscure directories with almost no
changes. But I don't believe this is the common case.

When the path has changed a lot, you end up doing less work by doing
it incrementally when there is a limit.

--Dan

Received on Wed May 9 18:12:09 2007
