
Re: Bdb strings anomaly

From: C. Michael Pilato <cmpilato_at_collab.net>
Date: Tue, 22 Dec 2009 11:37:32 -0500

IIRC, it's not really 2x nodes -- it's one extra row for each string key.
So if a directory listing would normally consume a single string row, yes,
there's 1 + 1 = 2 (or 2x) rows used. But if a file's contents would consume
10 string rows, then it's still just the 1 additional empty row. THAT
SAID, it does certainly seem inefficient.
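
To put rough numbers on that, here is a minimal sketch of the row arithmetic. It assumes exactly one extra empty row per string key, as described above; the helper name `row_overhead` and the row counts are illustrative only and are not taken from the Subversion source.

```ruby
# Hypothetical helper: ratio of rows actually stored to rows needed,
# assuming each string key carries `extra_rows` empty rows on top of
# its data rows (1 by default, per the description above).
def row_overhead(data_rows, extra_rows = 1)
  (data_rows + extra_rows).to_f / data_rows
end

# A directory listing that fits in a single row: 1 + 1 = 2 rows.
puts row_overhead(1)   # prints 2.0
# File contents spread over 10 rows: 10 + 1 = 11 rows.
puts row_overhead(10)  # prints 1.1
```

So the 2x figure is the worst case (single-row strings); the relative overhead shrinks as a string spans more rows.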

Wanna dive into the code and work up a patch?

Vadim Chekan wrote:
> Hi all,
>
> Out of curiosity I wrote a script which dumps subversion bdb tables
> and found an interesting anomaly in the "strings" table.
> Every string there has a duplicate with an empty value.
> It is my understanding that "strings" allows duplicates to store very
> large content in chunks under the same key. That's fine. But why does
> every small string (like a file name) have a duplicate key? Looks like
> a bug to me.
> This bug does not prevent normal functioning, because strings are
> concatenated when read and an empty value does no harm, but from a
> performance point of view, having 2x nodes in the btree is not good.
>
> Here is what I'm talking about:
> =========== nodes ================
> k:'0.0.0' v:'((dir 1 / 0 1 0) 0 0 )'
> k:'0.0.1' v:'((dir 1 / 5 0.0.0 1 1 1 0 1 0) 0 1 0)'
> k:'1.0.1' v:'((file 9 /test.txt 0 1 0 1 0 1 0) 0 1 1)'
> k:'next-key' v:'2'
> =========== strings ================
> k:'0' v:''
> k:'0' v:'((test.txt 5 1.0.1))'
> k:'1' v:''
> k:'1' v:'aaa'
> k:'next-key' v:'2'
> =========== revisions ================
> k:'1' v:'(revision 1 0)'
> k:'2' v:'(revision 1 1)'
>
> Pay attention to the "strings" keys. An empty value is repeated for every string.
>
> My environment:
> svn, version 1.6.5 (r38866)
> Linux ubuntu 2.6.31-17-generic #54-Ubuntu SMP Thu Dec 10 16:20:31 UTC
> 2009 i686 GNU/Linux
>
> Here is the script:
> ===========================================================
> #!/usr/bin/ruby
> require 'bdb'
>
> $env = BDB::Env.open('repo3/db', flags=BDB::INIT_MPOOL, mode=0)
>
> def list_content(file, db_type)
>   puts "=========== #{file} ================"
>   db = $env.open_db(db_type, name=file)
>   db.each do |k,v|
>     puts "k:'#{k}' v:'#{v}'"
>   end
>
>   db.close
> end
>
> # checksum-reps
> %w(changes copies nodes node-origins miscellaneous representations
>    strings transactions).
>   each{|f| list_content(f, BDB::BTREE) }
>
> %w(revisions uuids).
>   each{|f| list_content(f, BDB::RECNO) }
> ===========================================================
>
>
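
For anyone wanting to quantify the anomaly without a live repository, here is a minimal sketch that works on the dump text itself rather than on BDB. It assumes the `k:'...' v:'...'` line format printed by the script above; the sample rows are copied from the "strings" section of the dump, and the helper name `group_rows` is hypothetical. It groups rows by key, counts the empty ones, and joins the values in order, mirroring the "concatenated when read" behavior Vadim describes.

```ruby
# Sample rows copied from the "strings" section of the dump above.
dump = <<~EOS
  k:'0' v:''
  k:'0' v:'((test.txt 5 1.0.1))'
  k:'1' v:''
  k:'1' v:'aaa'
  k:'next-key' v:'2'
EOS

# Hypothetical helper: group values by key, keeping row order.
# Lines that do not match the k:'...' v:'...' format are skipped.
def group_rows(text)
  rows = Hash.new { |hash, key| hash[key] = [] }
  text.each_line do |line|
    m = line.match(/\Ak:'(.*?)' v:'(.*)'\s*\z/)
    rows[m[1]] << m[2] if m
  end
  rows
end

rows = group_rows(dump)
rows.each do |key, values|
  empties = values.count(&:empty?)
  # Joining the chunks shows that the empty row does not change the value.
  puts "key=#{key} rows=#{values.size} empty=#{empties} joined='#{values.join}'"
end
```

On the sample above, keys `0` and `1` each show two rows with one empty, while `next-key` has a single non-empty row.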

-- 
C. Michael Pilato <cmpilato_at_collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Received on 2009-12-22 17:38:13 CET

This is an archived mail posted to the Subversion Dev mailing list.
