[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Making FS and repos layer log API streamy

From: Stefan Fuhrmann <stefan2_at_apache.org>
Date: Sun, 14 Feb 2016 18:19:56 +0100

On 03.02.2016 13:36, Ivan Zhakov wrote:
> On 31 January 2016 at 14:03, Stefan Fuhrmann <stefan2_at_apache.org> wrote:
>> Hi there,
>> When the server needs to transmit the list of changed paths
>> in a revision, its memory usage is O(#changes), i.e. practically
>> unbound. The problems are:
>> * FS and repos API require a full collection of all changes
>> Most consumers simply scan that data once. So, they can
>> just as well work on a one change at a time basis as a
>> callback.
>> * The data types are different, so we end up with two copies.
>> A revised svn_fs_path_change_t with identical
>> svn_repo_path_change3_t can fix this. Minor adjustments
>> to the element data types further improve efficiency.
>> I've played with a revised version of svn_fs_paths_changed
>> and svn_repos_get_logs to use callbacks for the individual
>> path changes. Effectively, the FS layer can now pump the
>> info through the repos / authz filter into the RA layer.
> Here is some feedback.
> Making FS API streamy is a good goal. But as far I remember repos
> layer still has to store all changed path in hashtable for
> authorization purposes.

No, it does not. It filters path by path
(simply dropping the changes that the user
shall not see) and collects two global flags
indicating whether paths had to be hidden
for the current revision.

> Also I think we should not use callbacks to deliver data from the FS
> layer. Currently FS API is passive and I think it should remain the
> same: FS API users may invoke FS function from callback and this will
> require FS implementation to be fully reentrant (to avoid problems
> like fixed in r1718167 [1])

Good point. Support for reentrace is a must.
It can be achieved with iterators and callbacks

> Instead of callbacks we should create svn_fs_changed_paths_t and some
> kind of iterator of it. As a nice benefit FS API user may request only
> interesting information by providing NULL for some arguments (like

I will change the FS layer code to an iterator
because that makes the API easier to use with
our current repos layer code. OTOH, the code
on the FS side gets a little more convoluted.

However, to ensure proper authz filtering, the
repos layer needs to present the changed paths
list through a callback. It enables the repos
layer to guarantee that the changes get processed
before the revprops.

> I also noticed that svn_fs_nodeid_t member is removed from svn_fs_path_change3_t
> in r1727822. Are you sure about this change?

Absolutely. The nodeid is pointless and a bad
API in general.

> Some FSFS bugs are not a
> reason to drop this kind of information.

No FSFS user was ever able to use that information
outside a transaction. Not providing it won't
break peoples application logic.

> We should fix nodeid in
> changed path and maybe eventually replace it with svn_fs_node_t.

We cannot deliver information that we don't have.
All existing FSFS repositories are "corrupted" in
that sense. So, we will need to query that data
on demand.

With svn_fs_node_t in place, optionally providing
the affected node is certainly possible. That
could be a convenience but it will most likely
not aid with application performance, even if the
repository info could be used directly.

-- Stefan^2.
Received on 2016-02-14 18:18:28 CET

This is an archived mail posted to the Subversion Dev mailing list.