On 20.02.2011 21:02, Johan Corveleyn wrote:
> On Sun, Feb 20, 2011 at 6:35 PM, Mark Mielke <mark_at_mark.mielke.cc> wrote:
>
>> That said, I'm also (in principle) against implementing a cache of open file
>> handles. I prefer architectures that cache intermediate data in a processed
>> form that the application has made a deliberate choice to use, so that the
>> cache is as useful as possible to the application, rather than a
>> transparent caching layer that guesses at what is safe. The OS file system
>> layer is exactly this - any caching it does is transparent to the
>> application and a guess. Guesses are dangerous, which is exactly why the OS
>> file system layer cannot do as much caching unless it has 100% control of
>> the file system (= local file system).
Agreed. For that very reason, I added extensive caching
to the FSFS code and have even more of it in the pipeline
for 1.8.
That being said, there are still typical situations in
which the data cache may not be effective (see the sketch
below):
* access to relatively rarely read data (log, older tags);
  you still want to perform decently in that case
* first access to the latest revision (due to the way
  transactions are implemented, it is difficult to fill
  all the caches upon write)
* amount of active data > available RAM (which throws you
  back to the first issue more often)
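To make that concrete, here is a minimal, purely illustrative
sketch in C of the get-or-reconstruct pattern a data cache
implies. This is not the FSFS code; all names, the tiny slot
array and the crude replacement policy are made up. A hit
returns the processed item cheaply; each of the three cases
above turns into a miss and pays the full cost:

/* Hypothetical sketch only -- not the FSFS implementation.  It just
 * illustrates the point above: a data cache keeps items in processed
 * form, so a hit is cheap, while cold data, the first access to a new
 * revision, or anything evicted under memory pressure falls through
 * to the expensive read-and-reconstruct path. */
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 4            /* deliberately tiny */

typedef struct { char key[64]; char value[256]; int used; } slot_t;
static slot_t slots[CACHE_SLOTS];

static const char *cache_get(const char *key)
{
  for (int i = 0; i < CACHE_SLOTS; i++)
    if (slots[i].used && strcmp(slots[i].key, key) == 0)
      return slots[i].value;
  return NULL;
}

static void cache_set(const char *key, const char *value)
{
  slot_t *s = &slots[(unsigned char)key[0] % CACHE_SLOTS];  /* crude policy */
  snprintf(s->key, sizeof(s->key), "%s", key);
  snprintf(s->value, sizeof(s->value), "%s", value);
  s->used = 1;
}

/* Stand-in for the expensive path: open the rev/pack file, seek,
 * undeltify and parse the item. */
static void read_and_reconstruct(const char *key, char *buf, size_t len)
{
  printf("  MISS %s -> full read + reconstruction\n", key);
  snprintf(buf, len, "reconstructed(%s)", key);
}

static const char *get_item(const char *key)
{
  const char *hit = cache_get(key);
  if (hit)
    {
      printf("  HIT  %s\n", key);
      return hit;
    }

  char buf[256];
  read_and_reconstruct(key, buf, sizeof(buf));
  cache_set(key, buf);
  return cache_get(key);
}

int main(void)
{
  get_item("r1000:/trunk/foo.c");   /* first access: miss */
  get_item("r1000:/trunk/foo.c");   /* hot data: hit */
  get_item("r17:/tags/0.1/foo.c");  /* rarely read old tag: miss */
  return 0;
}

Run as-is, only the second access hits; the other two pay the
full cost, which is all the list above boils down to.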
> I agree that it would be best if the architecture was so that svn
> could organize its work for most use cases in a way that's efficient
> for the lower levels of the system. For instance, for "svn log", svn
> should in theory be able to do its work with exactly 1 open/close per
> rev file (or in a packed repository, maybe even only 1 open/close per
> packed file).
Yes, though it can be very hard to anticipate what data
will be needed further down the road, even if we had a
marvelous "one query gets it all" interface where feasible.
svn log, for instance, is often run with a limit on the
number of results, yet there is no way to tell in advance
how much of a packed file needs to be read to answer that
query - there is only a lower bound.
So, it can be very beneficial to keep a small number of
file handles around to "bridge" the various stages /
iterations within a single request.
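As a rough illustration of what I mean - just a hypothetical
sketch in plain C with stdio, not the APR-based code on the
performance branch, and all names and limits here are
invented - a small per-request handle cache keyed by path
lets later stages of the same request reuse an already open
rev/pack file instead of paying another open/close, and
closing everything at the end of the request keeps the
handles from outliving it:

/* Hypothetical sketch, not the performance-branch code: a tiny
 * per-request cache of open file handles, keyed by path.  Later
 * stages / iterations of the same request that touch the same
 * rev or pack file reuse the handle instead of re-opening it. */
#include <stdio.h>
#include <string.h>

#define MAX_CACHED_HANDLES 8     /* keep only a small number around */

typedef struct { char path[512]; FILE *fp; } handle_entry_t;

typedef struct {
  handle_entry_t entries[MAX_CACHED_HANDLES];
  int count;
  unsigned next_victim;          /* simple round-robin eviction */
} handle_cache_t;

/* Return an open handle for PATH, reusing a cached one if possible. */
static FILE *cached_open(handle_cache_t *cache, const char *path)
{
  for (int i = 0; i < cache->count; i++)
    if (strcmp(cache->entries[i].path, path) == 0)
      return cache->entries[i].fp;            /* reuse, no open() */

  FILE *fp = fopen(path, "rb");
  if (!fp)
    return NULL;

  handle_entry_t *slot;
  if (cache->count < MAX_CACHED_HANDLES)
    slot = &cache->entries[cache->count++];
  else
    {
      /* evict an older handle so the cache stays small */
      slot = &cache->entries[cache->next_victim++ % MAX_CACHED_HANDLES];
      fclose(slot->fp);
    }
  snprintf(slot->path, sizeof(slot->path), "%s", path);
  slot->fp = fp;
  return fp;
}

/* Close everything at the end of the request so cached handles never
 * outlive it (and cannot interfere with operations like pack). */
static void handle_cache_cleanup(handle_cache_t *cache)
{
  for (int i = 0; i < cache->count; i++)
    fclose(cache->entries[i].fp);
  cache->count = 0;
}

int main(void)
{
  /* Create a throwaway file so the demo has something real to open. */
  const char *path = "handle-cache-demo.tmp";
  FILE *tmp = fopen(path, "wb");
  if (!tmp)
    return 1;
  fclose(tmp);

  handle_cache_t cache = { 0 };
  FILE *a = cached_open(&cache, path);   /* first stage: actual open */
  FILE *b = cached_open(&cache, path);   /* later stage: reused */
  printf("handle reused: %s\n", (a != NULL && a == b) ? "yes" : "no");

  handle_cache_cleanup(&cache);
  remove(path);
  return 0;
}

The design point is the cleanup at the end of the request:
as long as the handles are strictly request-scoped, the
exposure to problems with long-lived open files stays small.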
> But right now, this isn't the case, and I think it would be a huge
> amount of work, change in architecture, layering, ... Until that
> happens, I think such a generic file-handle caching layer could prove
> very helpful :-). Note though that, if I understood correctly, the
> file-handle caching of the performance branch will not be reintegrated
> into 1.7, but maybe 1.8 ...
>
> But maybe stefan2 can comment more on that :-).
Because keeping files open for a potentially much longer
period of time may have an impact on other, rarely run
operations like pack, I don't think we should risk
merging this into 1.7.
-- Stefan^2.