David Glasser wrote:
> On Tue, Dec 2, 2008 at 12:11 PM, Hyrum K. Wright <hyrum_at_hyrumwright.org> wrote:
>> Author: hwright
>> Date: Tue Dec 2 12:11:22 2008
>> New Revision: 34537
>>
>> Log:
>> If we're going to spend the time to open a manifest file once, chances are we're
>> going to need additional information from that manifest file, so just read the
>> entire manifest file in, and cache it.
>>
>> (I know there are more effecient ways to cache it now, we'll leave that for
>> Future Work.)
>>
>> * subversion/libsvn_fs_fs/fs_fs.c
>> (get_packed_offset): Read the entire manifest file into the cache, not just
>> the one rev we're interested in.
>>
>> Modified:
>> trunk/subversion/libsvn_fs_fs/fs_fs.c
>>
>> Modified: trunk/subversion/libsvn_fs_fs/fs_fs.c
>> URL: http://svn.collab.net/viewvc/svn/trunk/subversion/libsvn_fs_fs/fs_fs.c?pathrev=34537&r1=34536&r2=34537
>> ==============================================================================
>> --- trunk/subversion/libsvn_fs_fs/fs_fs.c Tue Dec 2 12:06:40 2008 (r34536)
>> +++ trunk/subversion/libsvn_fs_fs/fs_fs.c Tue Dec 2 12:11:22 2008 (r34537)
>> @@ -1607,12 +1607,11 @@ get_packed_offset(apr_off_t *rev_offset,
>> apr_pool_t *pool)
>> {
>> fs_fs_data_t *ffd = fs->fsap_data;
>> - apr_file_t *manifest_file;
>> - char buf[PACK_MANIFEST_ENTRY_LEN + 1];
>> - apr_off_t rev_index_offset;
>> + svn_stream_t *manifest_stream;
>> svn_boolean_t is_cached;
>> apr_off_t *cached_rev_offset;
>> - apr_size_t len;
>> + svn_revnum_t tmp_rev, start_rev, end_rev;
>> + apr_pool_t *iterpool;
>>
>> SVN_ERR(svn_cache__get((void **) &cached_rev_offset, &is_cached,
>> ffd->packed_offset_cache, &rev, pool));
>> @@ -1624,24 +1623,37 @@ get_packed_offset(apr_off_t *rev_offset,
>> }
>>
>> /* Open the manifest file. */
>> - SVN_ERR(svn_io_file_open(&manifest_file, path_rev_packed(fs, rev, "manifest",
>> - pool),
>> - APR_READ | APR_BUFFERED, APR_OS_DEFAULT, pool));
>> + SVN_ERR(svn_stream_open_readonly(&manifest_stream,
>> + path_rev_packed(fs, rev, "manifest", pool),
>> + pool, pool));
>>
>> - /* Seek to the correct offset. */
>> - rev_index_offset = (rev % ffd->max_files_per_dir) *
>> - (PACK_MANIFEST_ENTRY_LEN + 1);
>> - SVN_ERR(svn_io_file_seek(manifest_file, APR_SET, &rev_index_offset, pool));
>> -
>> - /* Read the revision offset and close the file. */
>> - len = PACK_MANIFEST_ENTRY_LEN;
>> - SVN_ERR(svn_io_file_read(manifest_file, buf, &len, pool));
>> - buf[PACK_MANIFEST_ENTRY_LEN] = 0;
>> - SVN_ERR(svn_io_file_close(manifest_file, pool));
>> + /* While we're here, let's just read the entire manifest into our cache. */
>> + start_rev = rev - (rev % ffd->max_files_per_dir);
>> + end_rev = start_rev + ffd->max_files_per_dir - 1;
>> + iterpool = svn_pool_create(pool);
>> + for (tmp_rev = start_rev; tmp_rev <= end_rev; tmp_rev++)
>> + {
>> + svn_stringbuf_t *sb;
>> + apr_off_t offset;
>> + svn_boolean_t eof;
>>
>> - *rev_offset = apr_atoi64(buf);
>> + svn_pool_clear(iterpool);
>> + SVN_ERR(svn_stream_readline(manifest_stream, &sb, "\n", &eof, iterpool));
>
> Error-checking: should make sure that eof is true iff you expect to be
> at eof based on tmp_rev.
Noted (see below about more caching rework).
>> + offset = apr_atoi64(svn_string_create_from_buf(sb, iterpool)->data);
>
> Error-checking: should check that you got something sane here. I
> think you need to check errno (!).
Gah. You're right. I'd prefer to dispense with this check, though, and I
wonder how many other places we don't check errno after parsing a value like this...
>> + SVN_ERR(svn_cache__set(ffd->packed_offset_cache, &tmp_rev, &offset,
>> + iterpool));
>> + }
>> + svn_pool_destroy(iterpool);
>> +
>> + /* Close everything up, and get the value we're interested in from the
>> + cache. */
>> + SVN_ERR(svn_stream_close(manifest_stream));
>>
>> - return svn_cache__set(ffd->packed_offset_cache, &rev, rev_offset, pool);
>> + SVN_ERR(svn_cache__get((void **) &cached_rev_offset, &is_cached,
>> + ffd->packed_offset_cache, &rev, pool));
>
> So teeeeechnically this is not guaranteed to work. While hopefully
> setting sizes properly will avoid this, it's possible that
> max_files_per_dir is so big that by the time you're done loading
> everything from the file, the right answer has been lost from the
> cache. Might as well just notice the right rev when you find it in
> the loop and set *rev_offset then. (Hint: svn_cache__get without
> checking is_cached is a warning sign.)
Yeah, I felt the same way here, but I was hesitant to put a conditional in the
middle of the inner loop like that, especially since it would only be true one
time in 1000.
In any case, I'm working on a patch now which caches the entire contents of the
manifest file as one unit, not each rev->offset pair. It'll chop the space used
by the manifest in half, as well as clean up the cache population logic.
-Hyrum
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=978678
Received on 2008-12-03 06:14:04 CET