Per the discussion on streamy FS writes, I went and wrote up the code to use
Berkeley's duplicate keys functionality. Basically, that means that we have
multiple (ordered) data records for a single key. In aggregate, those
records represent the data for the specified key.
The intent here is to write new records with additional blocks of data,
rather than appending to an existing record (thus modifying that record,
thus causing a lot of log activity).
This patch passes 'make check' with no problems.
Now, the good info: when I committed an 'add' of a 25 meg file, it produced
only 25 meg of logs (as expected: log the data before committing to the db).
It also did it in pretty short order, and without sucking up a ton of VM.
There was still some observable growth over time during the commit, but I'm
guessing it is some kind of structure, rather than directly related to the
content (the commit for the file seemed to reach about 9 meg of memory,
growing slowly rather than jumping).
The time was also pretty good. The import took about a minute. Given the
amount of I/O, that might be about right. Future perf testing can tell us.
However, I don't have any pre-Greg-patch numbers (using Mike's latest 4 meg
buffering). Nor do I have numbers pre-buffer or pre-streaming. All of that
data would be really good to have, to see where we started and where we're
going. I'd also like to see if this dup key stuff has improved performance,
or just reduced our log file spamming.
The patch isn't quite ready for committing: I need to update the doc for
svn_fs__string_read(). It was already out of date, and with this change,
I've also introduced the "may return less than you asked for" semantic of
most of our other reading functions.
The hard-coding of 500k in tree.c should also go (I was lazy and didn't want
to recompile everything by changing the constant in svn_fs.h :-). Note that
cmpilato and I think that constant should move into tree.c anyways.
For now, I'm just posting the patch so others can run some of the
comparative tests. I need sleep :-)
Here is a log message to aid in understanding the patch:
* libsvn_fs/strings-table.c (svn_fs__open_strings_table): set the flags on
the db to enable duplicate keys.
(locate_key): new function to allocate a cursor, locate the first record
of data for a key, and return its length.
(get_next_length): use the cursor to get the length of the next record of
  data for the key.
(svn_fs__string_read): use locate_key and get_next_length to locate the
data record for the requested offset. return whatever data is available
in that data record, or the requested length (whichever is less). note
that this changes the semantics to "return some amount" rather than
"return all requested"
(get_key_and_bump): new function containing code factored out of
  svn_fs__string_append; it gets the current 'next-key' value and bumps
  the value in the database. it has also been updated to deal with the new
  'put' semantics of databases with dup keys.
(svn_fs__string_append): just shove another record into the database
(svn_fs__string_clear): we have to delete prior contents (all the data
records associated with the key) since we can't just 'put' a zero-length
value over the top of the old.
(svn_fs__string_size): revamped to total all the data records for the key.
(svn_fs__string_copy): revamp. rather than reading and appending to a new
record, we just copy all the records to the new key.
* libsvn_fs/tree.c (window_consume): HACK. quick change to the buffer limit
* tests/libsvn_fs/strings-reps-test.c (verify_expected_record): print more
  information when an expected size is not met. adjust call to
  svn_fs__string_read() to compensate for not necessarily getting all the
  requested data in one call.
Greg Stein, http://www.lyra.org/
Received on Sat Oct 21 14:37:10 2006