
Re: Bdb strings anomaly

From: Vadim Chekan <kot.begemot_at_gmail.com>
Date: Tue, 12 Jan 2010 23:33:48 -0800

Hi Michael,

I found out why this is happening, but I can't see a simple way to change
the code. It seems to work this way by design, not just by accident of the code.

reps-strings.c:svn_fs_base__get_mutable_rep
=============================================
  SVN_ERR(svn_fs_bdb__string_append(fs, &new_str, 0, NULL, trail, pool));
  rep = make_fulltext_rep(new_str, txn_id,
                          svn_checksum_empty_checksum(svn_checksum_md5,
                                                      pool),
                          svn_checksum_empty_checksum(svn_checksum_sha1,
                                                      pool),
                          pool);
  return svn_fs_bdb__write_new_rep(new_rep_key, fs, rep, trail, pool);
=============================================

So the first line explicitly writes zero bytes of data to the strings
table. This is done only to obtain a new string key (the new_str variable).
new_str is then used in the second call, which builds an empty fulltext
representation. Finally, the third call saves the empty representation
and returns the new representation key through new_rep_key.
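
To make the sequence above concrete, here is a minimal Ruby sketch that
models the strings table as a list of (key, value) rows with duplicate
keys allowed, the way the dump shows it. StringsTable, append, read and
row_count are hypothetical names, not the Subversion or ruby-bdb API. The
sketch only assumes, as described above, that each append adds one row
under the key and that a read concatenates all rows stored under it.

```ruby
# Hypothetical in-memory model of the BDB "strings" table: a btree that
# permits duplicate keys, where every append adds one more row.
class StringsTable
  attr_reader :rows

  def initialize
    @rows = []      # [key, value] pairs in insertion order
    @next_key = 0   # stands in for the "next-key" row seen in the dump
  end

  # Append data under key. With key == nil a fresh key is allocated first
  # (assumed to roughly mirror passing a NULL key to
  # svn_fs_bdb__string_append).
  def append(key, data)
    if key.nil?
      key = @next_key.to_s
      @next_key += 1
    end
    @rows << [key, data]
    key
  end

  # Reading concatenates every row stored under the key.
  def read(key)
    @rows.select { |k, _| k == key }.map { |_, v| v }.join
  end

  def row_count(key)
    @rows.count { |k, _| k == key }
  end
end

table = StringsTable.new
# Step 1 of get_mutable_rep: write 0 bytes just to obtain a new string key.
key = table.append(nil, "")
# Later, the real content is appended under the same key.
table.append(key, "aaa")

puts table.read(key)       # => aaa  (the empty row does no harm on read)
puts table.row_count(key)  # => 2    (but it costs an extra btree row)
```

This matches what the dump shows: reads are unaffected, but each string
key carries one extra, empty row.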

The problem I see is that, instead of just building a new representation,
svn_fs_base__get_mutable_rep also saves it and returns only its key. This
leads to a duplicate entry and to duplicated seek/save operations: one
save happens for the empty string, and another when the content is
actually saved.
The representation is simply overwritten when it is saved again, but the
strings table allows duplicate keys, so the empty row remains, as we can
see in the dump.

I'm not sure how best to solve it. One way is to modify
svn_fs_base__get_mutable_rep to allocate new keys without writing to the
strings/representations tables.
Another way is to return a new, unsaved string/representation, on the
understanding that subsequent calls will modify the representation and save it.
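
The first option can be sketched with the same kind of hypothetical
in-memory model (LazyStringsTable and reserve_key are made-up names, not
Subversion functions): a fresh key is handed out by bumping a counter,
without writing any data row, so the first real append becomes the only
row under that key. In the real BDB backend, allocating a key presumably
still has to update the "next-key" row, so this illustrates the row
count, not the full I/O cost.

```ruby
# Hypothetical model of option one: allocate a string key without
# writing an empty row, so the first real append is the only row.
class LazyStringsTable
  attr_reader :rows

  def initialize
    @rows = []      # [key, value] pairs; duplicate keys allowed
    @next_key = 0   # stands in for the "next-key" row seen in the dump
  end

  # Hand out a fresh key by bumping the counter only; no data row is written.
  def reserve_key
    key = @next_key.to_s
    @next_key += 1
    key
  end

  # Each append adds one row under the key.
  def append(key, data)
    @rows << [key, data]
    key
  end

  # A key with no rows reads back as the empty string.
  def read(key)
    @rows.select { |k, _| k == key }.map { |_, v| v }.join
  end
end

table = LazyStringsTable.new
key = table.reserve_key          # replaces the zero-length append
table.append(key, "((test.txt 5 1.0.1))")

p table.rows   # => [["0", "((test.txt 5 1.0.1))"]]
```

One design consequence: a representation whose key was reserved but never
written has no row at all, so the read path would presumably have to treat
a missing string key as empty content rather than as an error.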

Other optimizations could be implemented, but I can't figure out where
the DB transaction scope is set. My impression is that there is none,
which would mean that each BDB call is its own transaction. If so, write
operations are very slow, because every transaction commit requires
buffers to be flushed.

Vadim.

On Tue, Dec 22, 2009 at 9:26 AM, Vadim Chekan <kot.begemot_at_gmail.com> wrote:
> Yes, I'll try to figure it out. I just wanted to make sure that it
> wasn't intentional for some reason.
> I've already looked through the strings BDB implementation and couldn't see
> anything suspicious. Now it's time for the debugger :)
>
> Vadim
>
> On Tue, Dec 22, 2009 at 8:37 AM, C. Michael Pilato <cmpilato_at_collab.net> wrote:
>> IIRC, it's not really 2x nodes -- it's one extra row for each string key.
>> So if a directory listing would normally consume a single string row, yes,
>> there's 1 + 1 = 2 (or, 2x) rows used.  But if a file's contents would consume
>> 10 strings rows, then it's still just the 1 additional empty row.  THAT
>> SAID, it does certainly seem inefficient.
>>
>> Wanna dive into the code and work up a patch?
>>
>>
>> Vadim Chekan wrote:
>>> Hi all,
>>>
>>> Out of curiosity I wrote a script that dumps Subversion's BDB tables
>>> and found an interesting anomaly in the "strings" table.
>>> Every string there has a duplicate with an empty value.
>>> My understanding is that "strings" allows duplicate keys so that very
>>> large content can be stored in chunks under the same key. That's fine.
>>> But why does every small string (like a file name) have a duplicate
>>> key? Looks like a bug to me.
>>> This bug does not prevent normal operation, because the chunks are
>>> concatenated when read and the empty value does no harm, but from a
>>> performance point of view, having 2x the nodes in the btree is not good.
>>>
>>> Here is what I'm talking about:
>>> =========== nodes  ================
>>> k:'0.0.0' v:'((dir 1 / 0  1 0) 0  0 )'
>>> k:'0.0.1' v:'((dir 1 / 5 0.0.0 1 1 1 0 1 0) 0  1 0)'
>>> k:'1.0.1' v:'((file 9 /test.txt 0  1 0 1 0 1 0) 0  1 1)'
>>> k:'next-key' v:'2'
>>> =========== strings  ================
>>> k:'0' v:''
>>> k:'0' v:'((test.txt 5 1.0.1))'
>>> k:'1' v:''
>>> k:'1' v:'aaa'
>>> k:'next-key' v:'2'
>>> =========== revisions  ================
>>> k:'1' v:'(revision 1 0)'
>>> k:'2' v:'(revision 1 1)'
>>>
>>> Pay attention to the "strings" keys: an empty value is repeated for every string.
>>>
>>> My environment:
>>> svn, version 1.6.5 (r38866)
>>> Linux ubuntu 2.6.31-17-generic #54-Ubuntu SMP Thu Dec 10 16:20:31 UTC 2009 i686 GNU/Linux
>>>
>>> Here is the script:
>>> ===========================================================
>>> #!/usr/bin/ruby
>>> require 'bdb'
>>>
>>> $env = BDB::Env.open('repo3/db', BDB::INIT_MPOOL, 0)
>>>
>>> def list_content(file, db_type)
>>>     puts "=========== #{file}  ================"
>>>     db = $env.open_db(db_type, file)
>>>     db.each do |k,v|
>>>         puts "k:'#{k}' v:'#{v}'"
>>>     end
>>>
>>>     db.close
>>> end
>>>
>>> # checksum-reps
>>> %w(changes copies nodes node-origins miscellaneous representations
>>> strings transactions).
>>>     each{|f| list_content(f, BDB::BTREE) }
>>>
>>> %w(revisions uuids).
>>>    each{|f| list_content(f, BDB::RECNO) }
>>> ===========================================================
>>>
>>>
>>
>>
>> --
>> C. Michael Pilato <cmpilato_at_collab.net>
>> CollabNet   <>   www.collab.net   <>   Distributed Development On Demand
>>
>>
>
>
>
> --
> From RFC 2631: In ASN.1, EXPLICIT tagging is implicit unless IMPLICIT
> is explicitly specified
>

Received on 2010-01-13 08:34:27 CET

This is an archived mail posted to the Subversion Dev mailing list.