Marc Strapetz wrote:
>> ... I'll dig into the cache code ...
>
> I did that now and the storage is quite simple: we have a main file
> which contains the diff (added, removed) for every path in every
> revision and a revision-based index file with constant record length (to
> quickly locate entries in the main file).
>
> This storage allows to efficiently query for the mergeinfo diff for a
> path in a certain revision. That's sufficient to build the merge arrows.
> Assembling the complete mergeinfo for a certain revision is hard with
> this cache, but actually not necessary for our use case.
>
> Hence an API like the following should work well for us:
>
> interface MergeinfoDiffCallback {
> void mergeinfoDiff(int revision,
> Map<String, Mergeinfo> pathToAddedMergeinfo,
> Map<String, Mergeinfo> pathToRemovedMergeinfo);
> }
>
> void getMergeinfoDiff(String rootPath,
> long fromRev, long toRev,
> MergeinfoDiffCallback callback)
> throws ClientException;
>
> This should give us all mergeinfo which affects any path at or below
> rootPath.
>
> When disregarding our particular use case, a more consistent API could be:
>
> void getMergeinfoDiff(Iterable<String> paths,
> long fromRev, long toRev,
> Mergeinfo.Inheritance inherit,
> boolean includeDescendants,
> MergeinfoDiffCallback callback)
> throws ClientException;

I want to discourage callers from knowing or caring how the mergeinfo
is stored, so I would leave out the 'inherit' parameter.

I also think it makes sense not to offer the options of ignoring
descendants (that is, subtree mergeinfo) or of specifying multiple
paths. After all, this is not a low-level API to be used for
implementing the mergeinfo subsystem; it's a high-level query.

So let's use the simpler version that's sufficient for your use case.

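
To make "at or below rootPath" concrete, here is a minimal sketch of
the path test such a query would imply. It assumes absolute repository
paths separated by '/'; the class and method names are hypothetical and
not part of any existing API.

```java
public final class PathScope {
    private PathScope() {}

    /**
     * Returns true if 'path' equals 'rootPath' or lies beneath it.
     * Assumption: both are absolute repository paths such as "/trunk".
     */
    public static boolean isAtOrBelow(String rootPath, String path) {
        // Drop a trailing slash so "/trunk/" and "/trunk" behave the same.
        String root = rootPath.length() > 1 && rootPath.endsWith("/")
                ? rootPath.substring(0, rootPath.length() - 1)
                : rootPath;
        if (root.equals("/")) {
            // Every absolute path is at or below the repository root.
            return path.startsWith("/");
        }
        // Require a '/' boundary so "/trunk" does not match "/trunkfoo".
        return path.equals(root) || path.startsWith(root + "/");
    }
}
```

The boundary check is the only subtle part: a plain prefix match would
wrongly treat "/trunkfoo" as being below "/trunk".
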
> The mergeinfo diff should be received starting at fromRev and ending at
> toRev. No callback is expected if there is no mergeinfo diff for a
> certain revision. Depending on the server-side storage, we may require
> to always have fromRev >= toRev or always fromRev <= toRev. If it
> doesn't matter, better have always fromRev <= toRev (for reasons given
> below).

The same procedure could work either forwards or backwards; it doesn't
really matter as long as you know which way it is going. Often it is
useful to know about the more recent changes first, and to have the
option to look back right to revision 0 if necessary.

> Regarding the usage, let's assume always fromRev <= toRev, then we will
> invoke
>
> getMergeinfoDiff(cacheRoot, 0, head, callback)
>
> This should start returning mergeinfo diff immediately, starting at
> revision 0, so we quickly make at least a bit of progress. Now, if the
> cache building process is shutdown and restarted later, it will resume
> with the latest known revision:
>
> getMergeinfoDiff(cacheRoot, latestKnownRevision, head, callback)
>
> This procedure will be performed until we have caught up with head.
> Note, that the latestKnownRevision is the last revision for which we
> have received a callback. Depending on the server-side storage, this may
> be different from the current revision which the server is currently
> processing at the time the cache building process is shutdown. Hence it
> will be important that ranges for which no mergeinfo diff is present
> will be processed quickly on the server-side, otherwise we could run
> into some kind of endless loop, if the cache building process is
> shutdown and resumed frequently.

Yes -- if the server takes a long time to work its way through a large
range of (say, a million) revisions in which there are no mergeinfo
changes, there is no graceful way to stop the procedure partway
through, and no way to discover how far it had searched when you kill
it. Maybe that is not important. There is a client-side work-around:
request ranges of, say, a thousand revisions at a time, and then you
can easily keep track of how many of these requests have completed.

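
The work-around could be sketched roughly as follows. This is only an
illustration under stated assumptions: RangeFetcher stands in for a
call to the proposed getMergeinfoDiff on one sub-range, ProgressStore
stands in for whatever the client uses to persist its position, and
both names are hypothetical. It assumes fromRev <= toRev with both
ends inclusive.

```java
import java.util.ArrayList;
import java.util.List;

public final class ChunkedMergeinfoFetch {
    private ChunkedMergeinfoFetch() {}

    /** Hypothetical stand-in for one getMergeinfoDiff request. */
    public interface RangeFetcher {
        void fetch(long fromRev, long toRev) throws Exception;
    }

    /** Hypothetical hook for persisting how far the client has got. */
    public interface ProgressStore {
        void saveLastCompleted(long revision);
    }

    /** Splits [fromRev, toRev] (inclusive) into ranges of at most chunkSize revisions. */
    public static List<long[]> splitIntoChunks(long fromRev, long toRev, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<long[]> chunks = new ArrayList<>();
        for (long start = fromRev; start <= toRev; start += chunkSize) {
            long end = Math.min(start + chunkSize - 1, toRev);
            chunks.add(new long[] {start, end});
        }
        return chunks;
    }

    /**
     * Requests one chunk at a time and records the end of each completed
     * chunk, even if that chunk produced no callbacks at all.
     */
    public static void fetchInChunks(long fromRev, long toRev, long chunkSize,
                                     RangeFetcher fetcher, ProgressStore progress)
            throws Exception {
        for (long[] chunk : splitIntoChunks(fromRev, toRev, chunkSize)) {
            fetcher.fetch(chunk[0], chunk[1]);
            progress.saveLastCompleted(chunk[1]);
        }
    }
}
```

After a restart the client would resume at the saved revision plus one,
so a long stretch without mergeinfo changes costs at most one repeated
chunk rather than the whole range.
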
OK, that sounds good enough.
- Julian
Received on 2014-02-17 18:36:56 CET