Re: server-side log cache

From: Stefan Sperling <stsp_at_elego.de>
Date: Fri, 30 Sep 2011 18:19:53 +0200

On Thu, Sep 22, 2011 at 08:43:14PM +0200, Stefan Fuhrmann wrote:
> >>>This looks very interesting.
> >>>
> >>>What about FSFS-specific requirements?
> >>See assumptions above, this may require a different
> >>data structure. But I think that noderev dependencies
> >>will turn out to be redundant if you have a log cache
> >>and access to the skip-delta forwards dependencies.
> >>>It sounds like you avoid those by storing data in semantics of the repos
> >>>layer (path@revision) instead of the FS layer (node-revision-id)?
> >>Yes.
> >>>In this case separate implementations for FSFS and BDB aren't needed.
> >>>This could be an advantage (e.g. third party FS implementations
> >>>wouldn't need to change to support this).
> >>It also eliminates one of the performance weaknesses
> >>of SVN today: A log on some old / seldom changed
> >>path can take a very long time.
> >>>I'll think about this some more, thanks.
> >>>
> >>Welcome ;)

Reviving this thread.

Your concerns about a node-rev-based approach seem to revolve largely
around performance, not correctness. I.e. you agree that a
node-rev-based solution as currently being worked on within the
fs-successor-ids branch will work correctly, but won't perform
as well as your proposal for certain queries, right?

Now, I don't feel comfortable trying to implement your design.
The reason for this is that you could do a much better job at this yourself.
However, I do feel very comfortable continuing the work we've started on
the fs-successor-ids branch.

I also think that the two approaches can complement each other.
We are not in an either/or situation. We will get correct answers either
way and the only real difference is performance.

Note also that with our plan for putting successor-IDs into the filesystem
layer, we will also solve the problem of creating the successor data
during an upgrade from SVN 1.7 to 1.8.
Both approaches need to solve this somehow, and we'd have that part
sorted out for you.

So, what about this: We implement successor-IDs in the filesystem
as planned on the fs-successor-ids branch.
Once we have that, and when you have time, you adapt your log cache
proposal to create a runtime cache that sits on top of the new
successor-ID filesystem data, and caches results for certain log queries
in memory for quick access. It should even be possible to pre-populate
this cache when the server starts up.
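To make the proposed layering concrete, here is a minimal sketch of such a runtime cache, assuming the filesystem exposes successor data as a plain mapping from a node-revision ID to the IDs of its direct successors. The class name `SuccessorLogCache`, its methods, and the ID strings are hypothetical illustrations, not the fs-successor-ids branch API. The cache memoizes "all transitive successors of X" queries in memory and can be warmed at startup.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, List


class SuccessorLogCache:
    """Hypothetical in-memory cache layered over successor-ID data.

    `successors` stands in for whatever the filesystem layer would store:
    a mapping from a node-revision ID to the IDs of its direct successors.
    """

    def __init__(self, successors: Dict[str, List[str]]):
        self._successors = successors
        self._cache: Dict[str, FrozenSet[str]] = {}
        self.misses = 0  # queries that had to walk the underlying FS data

    def all_successors(self, node_rev_id: str) -> FrozenSet[str]:
        """Return every transitive successor of node_rev_id."""
        cached = self._cache.get(node_rev_id)
        if cached is not None:
            return cached
        self.misses += 1
        # Breadth-first walk over the successor graph.
        seen = set()
        queue = deque(self._successors.get(node_rev_id, []))
        while queue:
            nid = queue.popleft()
            if nid in seen:
                continue
            seen.add(nid)
            queue.extend(self._successors.get(nid, []))
        result = frozenset(seen)
        # Caveat: new commits add successors, so a real cache would have to
        # invalidate or extend this entry; this sketch never does.
        self._cache[node_rev_id] = result
        return result

    def prepopulate(self, node_rev_ids: Iterable[str]) -> None:
        """Warm the cache, e.g. when the server starts up."""
        for nid in node_rev_ids:
            self.all_successors(nid)


fs_data = {"a.r1": ["a.r5", "b.r7"], "a.r5": ["a.r9"]}
cache = SuccessorLogCache(fs_data)
cache.prepopulate(["a.r1"])
print(sorted(cache.all_successors("a.r1")))  # ['a.r5', 'a.r9', 'b.r7']
print(cache.misses)  # 1
```

The point of the design is that correctness comes entirely from the filesystem data; the cache only changes how fast repeated queries are answered, so it can be added or tuned later without touching the on-disk format.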

This way, we have some amount of redundancy in the system, but Daniel
and I can continue trying to deliver a working solution for 1.8 based
on what we've started. And we can worry about performance issues later,
because you already have a plan for that.

Frankly, I think people are wasting a huge amount of time today resolving
trivial tree conflicts by hand. No matter how bad the
performance of an automated solution to this problem will be, it will
be faster than a human being. Our users will get a huge benefit either
way because we will be reducing their load of manual labour. Performance
of the solution doesn't need to be perfect by the time we release 1.8,
and nothing stands in the way of improving performance later.

Do you agree?

If not, I hope that you'll find time to help us implement your pure
caching solution for 1.8. I would really like to see some solution
to this problem in the 1.8 release.
Received on 2011-09-30 18:20:45 CEST
