Saw the fixes -- looks good, thanks.
Daniel Shahaf wrote on Sun, Feb 12, 2012 at 04:42:08 +0200:
> Stefan Fuhrmann wrote on Sun, Feb 12, 2012 at 03:06:31 +0100:
> > On 09.02.2012 16:05, Daniel Shahaf wrote:
> > >stefan2_at_apache.org wrote on Wed, Feb 08, 2012 at 00:44:26 -0000:
> > >>Author: stefan2
> > >>Date: Wed Feb 8 00:44:26 2012
> > >>New Revision: 1241718
> > >>
> > >>URL: http://svn.apache.org/viewvc?rev=1241718&view=rev
> > >>Log:
> > >>Major improvement in delta window handling: Cache intermediate
> > >>combined delta windows such that changes "close by" won't need
> > >>to discover and read the whole chain again.
> > >>
> > >>For algorithms that traverse history linearly, this optimization
> > >>gives delta combination an amortized constant runtime.
> > >>
> > >>For now, we only cache representations< 100kB. Support for larger
> > >>reps can be added later.
> > >>
> > >>* subversion/libsvn_fs_fs/fs.h
> > >> (fs_fs_data_t): add cache for combined windows
> > >>* subversion/libsvn_fs_fs/caching.c
> > >> (svn_fs_fs__initialize_caches): initialize that cache
> > >>
> > >>* subversion/libsvn_fs_fs/fs_fs.c
> > >> (rep_state): add reference to new cache
> > >> (create_rep_state_body): init that reference
> > >> (rep_read_baton): add reference to cached base window
> > >> (get_cached_combined_window, set_cached_combined_window):
> > >> new utility functions
> > >> (build_rep_list): terminate delta chain early if cached
> > >> base window is available
> > >> (rep_read_get_baton): adapt caller
> > >> (get_combined_window): re-implement
> > >> (get_contents): handle new special case; adapt to
> > >> get_combined_window() signature changes
> > >>
> > >>Modified:
> > >> subversion/trunk/subversion/libsvn_fs_fs/caching.c
> > >> subversion/trunk/subversion/libsvn_fs_fs/fs.h
> > >> subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
> > >>
> > >I haven't reviewed this, but a question:
> > >
> > >>+/* Read the WINDOW_P for the rep state RS from the current FSFS session's
> > >>+ * cache. This will be a no-op and IS_CACHED will be set to FALSE if no
> > >>+ * cache has been given. If a cache is available IS_CACHED will inform
> > >>+ * the caller about the success of the lookup. Allocations (of the window
> > >>+ * in particualar) will be made from POOL.
> > >>+ */
> > >>+static svn_error_t *
> > >>+get_cached_combined_window(svn_stringbuf_t **window_p,
> > >>+ struct rep_state *rs,
> > >>+ svn_boolean_t *is_cached,
> > >>+ apr_pool_t *pool)
> > >>+{
> > >>+ if (! rs->combined_cache)
> > >>+ {
> > >>+ /* txdelta window has not been enabled */
> > >>+ *is_cached = FALSE;
> > >>+ }
> > >>+ else
> > >>+ {
> > >>+ /* ask the cache for the desired txdelta window */
> > >>+ return svn_cache__get((void **)window_p,
> > >>+ is_cached,
> > >>+ rs->combined_cache,
> > >>+ get_window_key(rs, rs->start, pool),
> > >How does the cache key identify the particular combined window being
> > >cached?
> >
> > Undeltified windows use the same key as their deltified
> > representation; basically revision file + offset. The distinction
> > between deltified and un-deltified is made by the cache
> > instance prefix.
>
> What revision file and what offset, and how do they related to the
> window object contained in the cache?
>
> (I'm going to guess that the key is the rev-file/offset of a rep that
> generates the same fulltext as the cached window; but I shouldn't have
> to guess.)
>
> > >get_window_key() may return "". If it returns "" when called as an
> > >argument to svn_cache__set() and then also here, won't the cache return
> > >a wrong result?
> > >
> > There is a comment in get_window_key() for this case.
> > "" will only be returned if we can't get the name of the
> > open APR file. This is virtually impossible. If it happens
> > anyways, it will hit the deltified window cache first, we will
> > report a repository corruption.
> >
>
> In plain English: there is an unlikely, but not impossible, scenario
> where the only thing between your new code and silent corruption
> (specifically: incorrect retrieval of a fulltext) is the order of
> lookups in two different caches.
>
> That sounds awfully brittle to me, and the sensitivity of the lookup
> order is not documented anywhere.
>
> > But maybe we should change the cache API definition
> > to support and reject NULL keys. get_window_key() could
> > then return simply NULL and could do with fewer assumptions.
> >
>
> What does "support and reject" mean? That trying to get(key=NULL)
> always returns "not found" and trying to set(key=NULL) doesn't change
> the cache's state?
>
> > -- Stefan^2.
> >
Received on 2012-02-12 19:07:55 CET