Re: server-side log cache

From: Stefan Fuhrmann <eqfox_at_web.de>
Date: Sat, 08 Oct 2011 13:06:54 +0200

On 30.09.2011 18:19, Stefan Sperling wrote:
> On Thu, Sep 22, 2011 at 08:43:14PM +0200, Stefan Fuhrmann wrote:
>>>>> This looks very interesting.
>>>>>
>>>>> What about FSFS-specific requirements?
>>>> See assumptions above, this may require a different
>>>> data structure. But I think that noderev dependencies
>>>> will turn out to be redundant if you have a log cache
>>>> and access to the skip-delta forwards dependencies.
>>>>> It sounds like you avoid those by storing data in semantics of the repos
>>>>> layer (path_at_revision) instead of the FS layer (node-revision-id)?
>>>> Yes.
>>>>> In this case separate implementations for FSFS and BDB aren't needed.
>>>>> This could be an advantage (e.g. third party FS implementations
>>>>> wouldn't need to change to support this).
>>>> It also eliminates one of the performance weaknesses
>>>> of SVN today: A log on some old / seldom changed
>>>> path can take a very long time.
>>>>> I'll think about this some more, thanks.
>>>>>
>>>> Welcome ;)
>
> Reviving this thread.
>
> Your concerns about a node-rev based approach seem to revolve largely
> around performance, not about correctness.
Effectiveness, to be precise.
> I.e. you agree that a
> node-rev-based solution as currently being worked on within the
> fs-successor-ids branch will work correctly, but won't perform
> as well as your proposal for certain queries, right?
Since it does not add any information, the node-rev-based
approach will not *cause* incorrect behavior. In that sense,
I agree.

But I still fail to see how it will be effective except for a
very, very specific use-case. I probably just haven't understood
your use-case. Could you give a short description of the problem
that you are trying to solve and how the node-rev cache will help?
>
> Now, I don't feel comfortable trying to implement your design.
> The reason for this is that you could do a much better job at this yourself.
That's fine with me. I won't have the time to do that this year,
though.
>
> However, I do feel very comfortable continuing the work we've started on
> the fs-successor-ids branch.
Having that implementation available will certainly do no harm.
>
> I also think that the two approaches can complement each other.
> We are not in an either/or situation. We will get correct answers either
> way and the only real difference is performance.
>
> Note also that with our plan for putting successor-IDs into the filesystem
> layer we will also solve the problem of creating the successor data
> during an upgrade from SVN 1.7 to 1.8.
> Both approaches need to solve this somehow, and we'd have that part
> sorted out for you.
I don't see a problem here. If necessary, we could extend the FS
layer API with version check methods etc.
>
> So, what about this: We implement successor-IDs in the filesystem
> as planned on the fs-successor-ids branch.
> Once we have that, and when you have time, you adapt your log cache
> proposal to create a runtime cache that sits on top of the new
> successor-ID filesystem data, and caches results for certain log queries
> in memory for quick access. It should even be possible to pre-populate
> this cache when the server starts up.
How would I reconstruct copy target path names from node-rev info?
>
> This way, we have some amount of redundancy in the system, but Daniel
> and I can continue trying to deliver a working solution for 1.8 based
> on what we've started. And we can worry about performance issues later,
> because you already have a plan for that.
>
> Frankly, I think the time people spend today resolving trivial
> tree-conflicts is a huge waste. No matter how bad the
> performance of an automated solution to this problem will be, it will
> be faster than a human being. Our users will get a huge benefit either
> way because we will be reducing their load of manual labour. Performance
> of the solution doesn't need to be perfect by the time we release 1.8
> and nothing stands in the way of improving performance later.
>
> Do you agree?
This depends entirely on your use-case (see above). My experience
with navigating these change graphs indicates that better-than
O(n^2) performance requires completely different algorithms, data
structures and API than a merely correct path-by-path approach.
BTW, n > 10,000,000 for certain repositories.
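
To make the scaling argument concrete, here is a minimal sketch in Python.
It is not code from Subversion or from either proposal: the change list,
the function names and the flat (revision, path, copyfrom) model are all
hypothetical, and it ignores directory copies that implicitly copy their
children. It contrasts a merely correct path-by-path scan, which rereads
the whole log for every query, with a forward copy index keyed at the
repos layer (path and revision) that is built once.

```python
from collections import defaultdict

# Hypothetical toy log: (revision, path, copyfrom), where copyfrom is
# None or a (source_path, source_revision) tuple.
CHANGES = [
    (1, "/trunk/a.c", None),
    (2, "/branches/b1/a.c", ("/trunk/a.c", 1)),
    (3, "/trunk/a.c", None),
    (4, "/branches/b2/a.c", ("/trunk/a.c", 3)),
    (5, "/branches/b3/a.c", ("/branches/b1/a.c", 2)),
]


def copy_targets_by_scan(changes, path):
    """Path-by-path approach: one full pass over the log per query."""
    return sorted((p, r) for r, p, src in changes if src and src[0] == path)


def build_forward_index(changes):
    """Single pass: map copy source path -> [(target path, target rev)]."""
    index = defaultdict(list)
    for rev, path, copyfrom in changes:
        if copyfrom is not None:
            index[copyfrom[0]].append((path, rev))
    return index


def all_descendants(index, path):
    """Follow copies transitively using only the prebuilt index."""
    seen, stack = [], [path]
    while stack:
        for target in index.get(stack.pop(), []):
            if target not in seen:
                seen.append(target)
                stack.append(target[0])
    return sorted(seen)


if __name__ == "__main__":
    index = build_forward_index(CHANGES)
    print(copy_targets_by_scan(CHANGES, "/trunk/a.c"))
    print(all_descendants(index, "/trunk/a.c"))
```

Answering the scan-based query for each of n paths costs on the order of
n passes over n entries, while the index is built in one pass and each
lookup afterwards touches only the relevant copies. Whether a real
implementation could keep such an index in memory for very large
repositories is a separate question that this sketch does not address.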
>
> If not, I hope that you'll find time to help us implement your pure
> caching solution for 1.8. I would really like to see some solution
> to this problem in the 1.8 release.
I'm currently working on other SVN-related projects.
From April on, I'm available for hire.

-- Stefan^2.
Received on 2011-10-08 13:07:30 CEST

This is an archived mail posted to the Subversion Dev mailing list.
