Re: RFE: API for an efficient retrieval of server-side mergeinfo data

From: Marc Strapetz <marc.strapetz_at_syntevo.com>
Date: Tue, 18 Feb 2014 11:57:58 +0100

On 17.02.2014 18:36, Julian Foad wrote:
> Marc Strapetz wrote:
>
>>> ... I'll dig into the cache code ...
>>
>> I did that now and the storage is quite simple: we have a main file
>> which contains the diff (added, removed) for every path in every
>> revision and a revision-based index file with constant record length (to
>> quickly locate entries in the main file).
>>
>> This storage allows us to efficiently query the mergeinfo diff for a
>> path in a certain revision. That's sufficient to build the merge arrows.
>> Assembling the complete mergeinfo for a certain revision is hard with
>> this cache, but actually not necessary for our use case.
>>
>> Hence an API like the following should work well for us:
>>
>> interface MergeinfoDiffCallback {
>>     void mergeinfoDiff(int revision,
>>                        Map<String, Mergeinfo> pathToAddedMergeinfo,
>>                        Map<String, Mergeinfo> pathToRemovedMergeinfo);
>> }
>>
>> void getMergeinfoDiff(String rootPath,
>>                       long fromRev, long toRev,
>>                       MergeinfoDiffCallback callback)
>>     throws ClientException;
>>
>> This should give us all mergeinfo which affects any path at or below
>> rootPath.
>>
>> When disregarding our particular use case, a more consistent API could be:
>>
>> void getMergeinfoDiff(Iterable<String> paths,
>>                       long fromRev, long toRev,
>>                       Mergeinfo.Inheritance inherit,
>>                       boolean includeDescendants,
>>                       MergeinfoDiffCallback callback)
>>     throws ClientException;
>
> I want to discourage callers from knowing or caring how the mergeinfo is stored, so I want to leave out the 'inherit' parameter.
>
> I also think it makes sense not to offer the options of ignoring descendants (that is, subtree mergeinfo), or specifying multiple paths. After all, this is not a low level API to be used for implementing the mergeinfo subsystem, it's a high level query.
>
> So let's use the simpler version that's sufficient for your use case.

That will be fine.

>> The mergeinfo diff should be received starting at fromRev and ending at
>> toRev. No callback is expected if there is no mergeinfo diff for a
>> certain revision. Depending on the server-side storage, we may need to
>> require always fromRev >= toRev or always fromRev <= toRev. If it
>> doesn't matter, it is better to always have fromRev <= toRev (for
>> reasons given below).
>
> The same procedure could work either forwards or backwards; it doesn't really matter as long as you know which way it is going. Often it is useful to know about the more recent changes first, and have the option to look back right to revision 0 if necessary.

From the cache's perspective it's easier to build the cache starting at
r0: then cache files will contain information for older revisions at
lower positions. This makes it easy to crop files at a certain revision
and rebuild them from there. That's something we do if a log message is
modified from within the GUI (it might not play a role for mergeinfo,
though). Anyway, I agree that receiving mergeinfo for more recent
revisions first is reasonable as well. Hence, if you say the effort is
the same, could we allow both: fromRev <= toRev, in which case we will
receive mergeinfo in ascending order, and fromRev > toRev, in which case
it will be in descending order?
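
A minimal sketch of that ordering rule, to make the question concrete.
The class and method names are hypothetical and not part of any existing
API; changedRevs stands in for whatever the server knows about which
revisions carry a mergeinfo diff. Revisions without a diff produce no
callback, as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical helper, for illustration only.
final class MergeinfoCallbackOrder {
    /**
     * Returns the revisions for which a callback would be delivered, in
     * delivery order: ascending if fromRev <= toRev, descending otherwise.
     * Both bounds are treated as inclusive (an assumption).
     */
    static List<Long> callbackOrder(long fromRev, long toRev,
                                    NavigableSet<Long> changedRevs) {
        long lo = Math.min(fromRev, toRev);
        long hi = Math.max(fromRev, toRev);
        // Only revisions inside the requested range are reported.
        NavigableSet<Long> inRange = changedRevs.subSet(lo, true, hi, true);
        return new ArrayList<>(fromRev <= toRev ? inRange
                                                : inRange.descendingSet());
    }
}
```

With mergeinfo changes in r2, r5, r9 and r12, a request for (0, 10) would
report r2, r5, r9 in that order, and a request for (10, 0) would report
r9, r5, r2.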

>> Regarding the usage, let's assume always fromRev <= toRev, then we will
>> invoke
>>
>> getMergeinfoDiff(cacheRoot, 0, head, callback)
>>
>> This should start returning mergeinfo diffs immediately, starting at
>> revision 0, so we quickly make at least a bit of progress. Now, if the
>> cache building process is shut down and restarted later, it will resume
>> with the latest known revision:
>>
>> getMergeinfoDiff(cacheRoot, latestKnownRevision, head, callback)
>>
>> This procedure will be performed until we have caught up with head.
>> Note that latestKnownRevision is the last revision for which we have
>> received a callback. Depending on the server-side storage, this may
>> differ from the revision which the server is processing at the time
>> the cache building process is shut down. Hence it will be important
>> that ranges for which no mergeinfo diff is present are processed
>> quickly on the server side; otherwise we could run into some kind of
>> endless loop if the cache building process is shut down and resumed
>> frequently.
>
> Yes -- if the server takes a long time to work its way through a large range of (say a million) revisions where there are no mergeinfo changes, there is no graceful way to stop the procedure part way through, and no way to discover how far it has searched when you kill it. Maybe that is not important. There is a client-side work-around: request ranges of say a thousand revisions at a time, and then you can easily keep track of how many of these requests have been completed.

OK, that will work.
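
The client-side work-around could look roughly like the following
sketch. RangeFetcher is a hypothetical stand-in for the proposed
getMergeinfoDiff call (which does not exist yet), and onChunkDone is a
placeholder for however the cache persists its progress. The point is
that progress is recorded per completed chunk, so it also advances
across ranges that produce no callbacks at all.

```java
import java.util.function.LongConsumer;

// Hypothetical helper, for illustration only.
final class ChunkedMergeinfoFetch {
    /** Stand-in for the proposed getMergeinfoDiff(root, from, to, cb). */
    interface RangeFetcher {
        void fetch(long fromRev, long toRev);
    }

    /**
     * Requests [startRev, headRev] in ascending chunks of at most
     * chunkSize revisions. After each chunk completes, onChunkDone gets
     * the last revision covered, so a restart can resume at that
     * revision + 1. Returns the last completed revision, or
     * startRev - 1 if nothing was requested.
     */
    static long fetchInChunks(long startRev, long headRev, long chunkSize,
                              RangeFetcher fetcher,
                              LongConsumer onChunkDone) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        long lastCompleted = startRev - 1;
        for (long from = startRev; from <= headRev; from += chunkSize) {
            long to = Math.min(from + chunkSize - 1, headRev);
            fetcher.fetch(from, to);    // may deliver zero or more callbacks
            lastCompleted = to;
            onChunkDone.accept(to);     // e.g. persist alongside the cache
        }
        return lastCompleted;
    }
}
```

With a chunk size of 1000 and head at r2500, this would issue requests
for 0-999, 1000-1999 and 2000-2500. If the process is killed during the
second request, the recorded progress is r999 and the next run starts
at r1000.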

-Marc
Received on 2014-02-18 11:58:36 CET
