Re: svn commit: r30173 - in branches/in-memory-cache/subversion: include libsvn_subr

From: David Glasser <glasser_at_davidglasser.net>
Date: Fri, 4 Apr 2008 14:19:43 -0700

On Fri, Apr 4, 2008 at 12:32 PM, Blair Zajac <blair_at_orcaware.com> wrote:
>
> David Glasser wrote:
>
> > On Wed, Apr 2, 2008 at 2:20 PM, Blair Zajac <blair_at_orcaware.com> wrote:
> >
> > > glasser_at_tigris.org wrote:
> > >
> > >
> > > > Author: glasser
> > > > Date: Tue Apr 1 16:24:34 2008
> > > > New Revision: 30173
> > > >
> > > > Log:
> > > > On the in-memory-cache branch:
> > > >
> > > > Add memcached-based caches. (Not currently used anywhere.) Hardcode
> > > > in use of a single server (localhost:11211).
> > > >
> > > > This is using the apr_memcache code that currently lives on apr-util
> > > > trunk and will be eventually released in apr-util 1.3. It was
> > > > imported from a separate apr_memcache package; we should be able to
> > > > use either version, but I haven't done the configuration for that yet
> > > > (Dan Christian sent me a patch to do that, though).
> > > >
> > > >
> > > For memcached I suggest exposing the memcached flags value.
> > >
> > > > Uses for the flags:
> > >
> > > > 1) It's a very useful tool for not having to encode additional
> > > > information in the key, which makes key manipulation faster.
> > >
> > > > 2) Hash some unique value into N bits of the flags. Each
> > > > serializer/deserializer pair will pick its own unique value. When you
> > > > version up a structure that you are caching, say add an additional
> > > > field, you bump the value and even if the memcached server is not
> > > > bounced, you can ignore its results since the returned flags don't
> > > > match.
> > >
> > > > 3) Use some bits of the flags to indicate compression. You can leave
> > > > some short keys uncompressed and compress longer ones.
> > >
> > > So I suggest modeling this API to be similar to the *gasp* new Java API
> > > which is a great piece of work:
> > >
> > > http://bleu.west.spy.net/~dustin/projects/memcached/apidocs/
> > >
> > > Have a new type, say cached_data that contains a
> > >
> > > > typedef struct svn_cached_data
> > > > {
> > > >   apr_size_t flags;
> > > >   const char *data;
> > > >   apr_size_t data_len;
> > > > } svn_cached_data;
> > >
> > > So something like
> > >
> > >
> > > > typedef svn_error_t *(svn_cache_deserialize_func_t)(void **out,
> > > >                                                     svn_cached_data *data,
> > > >                                                     apr_pool_t *pool);
> > > >
> > > > typedef svn_error_t *(svn_cache_serialize_func_t)(svn_cached_data **data,
> > > >                                                   void *in,
> > > >                                                   apr_pool_t *pool);
> > >
> > > > I think you'll find not having the flags available will be a drawback
> > > > sometime in the future, so I strongly suggest putting it in now.
> > > >
> > > > I realize that an in-process cache won't need a flags value, but better
> > > > to have the API expose the flags for memcached and ignore it for the
> > > > in-process cache than not have it.
> > >
> > > > BTW, I'm doing the same thing in my Subversion server, but at a higher
> > > > level. We're caching in memcached a (repos-uuid, rev, path) with a
> > > > (node-id) value and then doing lookups (repos-uuid, node-id) for the
> > > > real data.
> > >
> >
> > All current uses of svn_cache_t are already serializing/deserializing
> > complex data structures into strings, so that already gives you room
> > to add flags. I'd rather keep the flags for the use of the
> > svn_cache-memcached implementation itself (eg, to specify that the
> > value has been compressed).
> >
>
> Why is the code always serializing/deserializing? For an in-memory cache
> this 1) consumes CPU that you don't need; 2) commonly the serialization will
> consume more memory than the unserialized data; 3) you'll end up with
> serialized and unserialized copies of the same data in memory at the same
> time.

Because these are nested data structures like dag_node_t, which
contains a node_revision_t, each of which contains a bunch of strings
and svn_fs_id_t's.

The goal isn't to save CPU or even memory; it's to save IO/network
use. Maybe it won't work. We'll see.
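
To make the point about nested structures concrete, here is a minimal sketch of flattening one into a single byte string, which is the only form a remote cache can store. It assumes plain malloc/free rather than APR pools, and the toy_* types and functions are hypothetical stand-ins, not the real dag_node_t, node_revision_t, or any svn_cache_t serializer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; the real structures have many more fields. */
typedef struct toy_noderev_t {
  const char *id;
  const char *created_path;
} toy_noderev_t;

typedef struct toy_dag_node_t {
  const char *fs_path;
  toy_noderev_t *noderev;   /* nested pointer: meaningless to a remote cache */
} toy_dag_node_t;

#define TOY_FIELD_COUNT 3

/* Flatten the nested structure into one malloc'ed buffer of
   NUL-terminated strings. Returns NULL on allocation failure. */
char *toy_serialize(const toy_dag_node_t *node, size_t *len_out)
{
  const char *fields[TOY_FIELD_COUNT];
  size_t total = 0, offset = 0;
  char *buf;
  int i;

  assert(node != NULL && node->noderev != NULL && len_out != NULL);
  fields[0] = node->fs_path;
  fields[1] = node->noderev->id;
  fields[2] = node->noderev->created_path;

  for (i = 0; i < TOY_FIELD_COUNT; i++)
    total += strlen(fields[i]) + 1;

  buf = malloc(total);
  if (buf == NULL)
    return NULL;

  for (i = 0; i < TOY_FIELD_COUNT; i++)
    {
      size_t n = strlen(fields[i]) + 1;
      memcpy(buf + offset, fields[i], n);
      offset += n;
    }
  *len_out = total;
  return buf;
}

/* Rebuild the nested structure from a flat buffer. The resulting
   pointers refer into BUF, so BUF must outlive NODE and NODEREV.
   Returns 0 on success, -1 if the buffer is malformed. */
int toy_deserialize(toy_dag_node_t *node, toy_noderev_t *noderev,
                    const char *buf, size_t len)
{
  const char *fields[TOY_FIELD_COUNT];
  size_t offset = 0;
  int i;

  for (i = 0; i < TOY_FIELD_COUNT; i++)
    {
      const char *end;
      if (offset >= len)
        return -1;
      end = memchr(buf + offset, '\0', len - offset);
      if (end == NULL)
        return -1;
      fields[i] = buf + offset;
      offset = (size_t)(end - buf) + 1;
    }
  if (offset != len)
    return -1;

  node->fs_path = fields[0];
  noderev->id = fields[1];
  noderev->created_path = fields[2];
  node->noderev = noderev;
  return 0;
}
```

The flat buffer is what would go over the wire; the extra CPU spent here is the price paid in the hope of avoiding disk or network reads.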

> Not that C lets you do this cleanly, since you end up casting everything to
> void * and you can make mistakes casting void * back to the expected data
> type, but this is nicer than having to go through a deserializer.
>
> It doesn't make sense to treat in-process and remote memory the same; I
> think the API should reflect the differences between the two and the
> different design decisions they impose on the program.
>
> By treating them the same you have to serialize for the in-process cache
> and lose the ability to expose flags for the remote cache.

I don't serialize for the in-process cache: I dup.

In the APR pool memory model I can't think of any useful way to have a
cache (with evictions!) without dup'ing in and out. (OK, you can dup
only in and be very very careful to manually dup on the way out when
you need the value for more than a few statements... which is exactly
what the FSFS directory contents cache used to do, which was a source
of real repository corruption.)
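
A minimal sketch of the dup-in/dup-out pattern, assuming a single-slot cache and plain malloc/free in place of APR pools. The toy_cache_* names are hypothetical and are not the svn_cache_t API; the point is only that a value handed back to the caller stays valid after the cache evicts its own copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-slot cache; both members are owned by the cache. */
typedef struct toy_cache_t {
  char *key;
  char *value;
} toy_cache_t;

/* Portable string duplication; returns NULL on allocation failure. */
char *toy_strdup(const char *s)
{
  size_t len = strlen(s) + 1;
  char *copy = malloc(len);
  if (copy != NULL)
    memcpy(copy, s, len);
  return copy;
}

/* Dup on the way in: the cache keeps private copies, so the caller may
   free or reuse KEY and VALUE immediately. Storing a new entry evicts
   the previous one. Returns 0 on success, -1 on allocation failure. */
int toy_cache_set(toy_cache_t *cache, const char *key, const char *value)
{
  char *k, *v;

  assert(cache != NULL && key != NULL && value != NULL);
  k = toy_strdup(key);
  v = toy_strdup(value);
  if (k == NULL || v == NULL)
    {
      free(k);
      free(v);
      return -1;
    }
  free(cache->key);     /* eviction: the old entry's memory goes away */
  free(cache->value);
  cache->key = k;
  cache->value = v;
  return 0;
}

/* Dup on the way out: the caller receives its own copy (to be freed by
   the caller), which survives any later eviction. Returns NULL on a
   miss or on allocation failure. */
char *toy_cache_get(const toy_cache_t *cache, const char *key)
{
  assert(cache != NULL && key != NULL);
  if (cache->key == NULL || strcmp(cache->key, key) != 0)
    return NULL;
  return toy_strdup(cache->value);
}
```

If toy_cache_get returned cache->value directly, the pointer would dangle as soon as another toy_cache_set evicted the entry, which is the class of bug described above.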

> I guess you could have an identity serializer/deserializer for the in-memory
> cache that just casts the pointer to 4 or 8 bytes and then casts it back to
> a pointer later. Just seems messy this way.

I don't understand how this would work with pool-allocated data.

--dave

-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/

Received on 2008-04-04 23:20:02 CEST
