[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: is_packed(noderev); revision root ID caching

From: Stefan Fuhrmann <stefanfuhrmann_at_alice-dsl.de>
Date: Mon, 22 Nov 2010 02:23:49 +0100

On 31.10.2010 16:09, Daniel Shahaf wrote:
> I confirm that this fixes the test failures I reported yesterday.
>
> Review below.
>
> stefan2_at_apache.org wrote on Sun, Oct 31, 2010 at 13:40:12 -0000:
>> Author: stefan2
>> Date: Sun Oct 31 13:40:12 2010
>> New Revision: 1029340
>>
>> URL: http://svn.apache.org/viewvc?rev=1029340&view=rev
>> Log:
>> Fix handling of cached revision root IDs. Issue discussed here:
>> http://svn.haxx.se/dev/archive-2010-10/0497.shtml
>>
>> The /trunk code caches them for a single session (i.e. server
>> request) only and assumes that the respective revision files
>> won't get packed while the request is being processed.
>>
> In other words, if there is a long-running request, and packing is run while the
> request is in progress, there is a window where that request will
> mis-compute the offsets?
>
Basically, during long-running requests, no files in the respective
repository must get packed. For instance, svn_fs_fs__path_rev_absolute
may return the non-packed path to open_pack_or_rev_file but shortly
thereafter, that might no longer be valid and open_pack_or_rev_file
would fail (on trunk).

According to my understanding, there will be a file lock that prevents
a pack as long as the request is running.
>> But now, the cache lives as long as the server process and
>> therefore, we must detect whether the respective revision
>> got packed (or, theoretically, unpacked) after we cached its
>> root ID. If that state changed, the file offset needs to be
>> updated. Simply re-read the whole ID in that case.
>>
> I've fixed more than one of these problems. Usually the solution was to
> keep using the revfile-relative offsets, and only at the point where
> they are to be used, to translate the (rev,offset) pair into
> a (packfile, new offset) pair. (This is done at the very point the rev
> file is being opened, to account for the possibility that it becomes
> packed /just as/ we attempt to open it.) It seems that you went with
> "if I cached (rev,offset) and now it's packed, give up". Why this
> difference?
My general strategy is to keep the cache infrastructure that is already
present on trunk and keep the data as long as possible, i.e. over
multiple requests and possibly the whole lifetime of the server. So,
no additional calculations etc. should be necessary.

Before r1037548, the revision root IDs might get outdated by packing
that revision. So, I simply tagged the ID structure with a "has it been
already packed when I put the ID into the cache?" flag. Now, I'm
simply using two different caches for IDs of packed and non-packed
revisions.

> Tangentially, another place where the supposedly-missing "offset +=
> get_rev_offset_in_pack_file()" might have a noticeable effect is if the
> repository is already packed when the server is started; but I presume
> you've tested your code on packed repositories.
Yes, I usually run it on packed repositories and sometimes on non-
packed ones as well.
>> * subversion/libsvn_fs_fs/id.h
>> (svn_fs_fs__is_packed, svn_fs_fs__set_packed): new functions
>> to store the "rev is packed" state
>>
>> * subversion/libsvn_fs_fs/id.c
>> (id_private_t): add new element
>> (svn_fs_fs__is_packed, svn_fs_fs__set_packed): implement
>> (svn_fs_fs__id_txn_create, svn_fs_fs__id_rev_create, svn_fs_fs__id_copy,
>> svn_fs_fs__id_parse): handle the new element as well
>>
>> * subversion/libsvn_fs_fs/fs_fs.c
>> (svn_fs_fs__rev_get_root): require "is packed" flag to match
>> before returning a cached root ID; store that flag in cached ID
>>
>> Modified:
>> subversion/branches/performance/subversion/libsvn_fs_fs/fs_fs.c
>> subversion/branches/performance/subversion/libsvn_fs_fs/id.c
>> subversion/branches/performance/subversion/libsvn_fs_fs/id.h
>>
>> Modified: subversion/branches/performance/subversion/libsvn_fs_fs/fs_fs.c
>> URL: http://svn.apache.org/viewvc/subversion/branches/performance/subversion/libsvn_fs_fs/fs_fs.c?rev=1029340&r1=1029339&r2=1029340&view=diff
>> ==============================================================================
>> --- subversion/branches/performance/subversion/libsvn_fs_fs/fs_fs.c (original)
>> +++ subversion/branches/performance/subversion/libsvn_fs_fs/fs_fs.c Sun Oct 31 13:40:12 2010
>> @@ -2953,7 +2953,8 @@ svn_fs_fs__rev_get_root(svn_fs_id_t **ro
>>
>> SVN_ERR(svn_cache__get((void **) root_id_p,&is_cached,
>> ffd->rev_root_id_cache,&rev, pool));
>> - if (is_cached)
>> + if (is_cached&&
>> + is_packed_rev(fs, rev) == svn_fs_fs__is_packed(*root_id_p))
> Bug: you are comparing a boolean to a tristate.
>
> Even if both of them were tristates, I still think you would have had to
> check that they are both true or both false (but not both unknown).
>
>> return SVN_NO_ERROR;
>>
>> /* we don't care about the file pointer position */
>> @@ -2967,6 +2968,9 @@ svn_fs_fs__rev_get_root(svn_fs_id_t **ro
>>
>> SVN_ERR(get_fs_id_at_offset(&root_id, apr_rev_file, fs, rev, root_offset,
>> pool));
>> + svn_fs_fs__set_packed(root_id, is_packed_rev(fs, rev)
>> + ? svn_tristate_true
>> + : svn_tristate_false);
>>
>> SVN_ERR(svn_file_handle_cache__close(revision_file));
>>
>>
>> Modified: subversion/branches/performance/subversion/libsvn_fs_fs/id.c
>> URL: http://svn.apache.org/viewvc/subversion/branches/performance/subversion/libsvn_fs_fs/id.c?rev=1029340&r1=1029339&r2=1029340&view=diff
>> ==============================================================================
>> --- subversion/branches/performance/subversion/libsvn_fs_fs/id.c (original)
>> +++ subversion/branches/performance/subversion/libsvn_fs_fs/id.c Sun Oct 31 13:40:12 2010
>> @@ -34,6 +34,7 @@ typedef struct {
>> const char *txn_id;
>> svn_revnum_t rev;
>> apr_off_t offset;
>> + svn_tristate_t is_packed;
>> } id_private_t;
>>
> (I'll refer to this point later)
>
>>
>> @@ -84,6 +85,24 @@ svn_fs_fs__id_offset(const svn_fs_id_t *
>> }
>>
>>
>> +svn_tristate_t
>> +svn_fs_fs__is_packed(const svn_fs_id_t *id)
>> +{
>> + id_private_t *pvt = id->fsap_data;
>> +
>> + return pvt->is_packed;
>> +}
>> +
> I think these should be in the svn_fs_fs__id_ namespace:
>
> svn_fs_fs__id_is_packed
> svn_fs_fs__id_set_packed
>
> for consistency with the rest of the file (and svn), and because
> "__is_packed" seems to me to name function that should talk about
> revisions, not about id's.
>
>> +void
>> +svn_fs_fs__set_packed(const svn_fs_id_t *id,
>> + svn_tristate_t is_packed)
>> +{
>> + id_private_t *pvt = id->fsap_data;
>> +
>> + pvt->is_packed = is_packed;
>> +}
>> +
> None of the other id_private_t members has a setter function, why
> should this one be an exception?
>
>> +
>> svn_string_t *
>> svn_fs_fs__id_unparse(const svn_fs_id_t *id,
>> apr_pool_t *pool)
>>
>> Modified: subversion/branches/performance/subversion/libsvn_fs_fs/id.h
>> URL: http://svn.apache.org/viewvc/subversion/branches/performance/subversion/libsvn_fs_fs/id.h?rev=1029340&r1=1029339&r2=1029340&view=diff
>> ==============================================================================
>> --- subversion/branches/performance/subversion/libsvn_fs_fs/id.h (original)
>> +++ subversion/branches/performance/subversion/libsvn_fs_fs/id.h Sun Oct 31 13:40:12 2010
>> @@ -49,6 +49,12 @@ svn_revnum_t svn_fs_fs__id_rev(const svn
>> ID. */
>> apr_off_t svn_fs_fs__id_offset(const svn_fs_id_t *id);
>>
>> +/* Access the "is_packed" flag of the ID. */
>> +svn_tristate_t svn_fs_fs__is_packed(const svn_fs_id_t *id);
>> +
>> +/* Set the "is_packed" flag of the ID. */
>> +void svn_fs_fs__set_packed(const svn_fs_id_t *id, svn_tristate_t is_packed);
>> +
> Two issues here:
>
> * I think these need more docs. The other functions (which read, e.g.,
> "access the 'node id' portion of the ID") include by reference the
> docs in the 'structure' file. For the 'packed?' member, however,
> there is no such concept as a "packed node-revision" -- only a "packed
> revision file" --- so you have to document it explicitly.
>
> * Design issue: is "is this ID packed?" a valid question to ask?
>
> What exactly do you mean by this? Are you referring to whether the
> (revision,offset) at which the noderev header appears is a packed
> revision? Or to whether the data and props representations live in
> packed revisions?
>
> The second question is not a good one because the data can live in
> a packed rev while the props in a non-packed rev. The first question,
> I'm not sure that's an inherent property of the noderev->id itself: if
> the cache needs to cache the value of is_packed_rev(id->revision) for
> validation upon get(), it can do that without storing its bookkeeping
> in the id_private_t itself.
>
> (But see my first response; I think it's just fine to return the
> offset-within-revision regardless of whether that revision is or was
> packed, and leave it to whoever uses the offset to open the pack file
> instead of the revision file, should that be needed.)
>
All of that extra ID tagging code has been made obsolete
and got removed in r1037548.
>> /* Convert ID into string form, allocated in POOL. */
>> svn_string_t *svn_fs_fs__id_unparse(const svn_fs_id_t *id,
>> apr_pool_t *pool);
>>
>>
> Thanks for the quick fix,
>
> Daniel
Thanks for the feedback. It often takes two attempts to
approach an issue from the right angle.

-- Stefan^2.
Received on 2010-11-22 02:24:16 CET

This is an archived mail posted to the Subversion Dev mailing list.