Re: pack revprops shards

From: Hyrum K. Wright <hyrum_at_hyrumwright.org>
Date: Mon, 27 Apr 2009 09:29:06 -0500

On Apr 27, 2009, at 3:39 AM, Bolstridge, Andrew wrote:

>> -----Original Message-----
>> From: Osvaldo Pinali Doederlein [mailto:osvaldo_at_visionnaire.com.br]
>> Sent: Friday, April 24, 2009 8:35 PM
>> To: users_at_subversion.tigris.org
>> Subject: RFE: pack revprops shards
>>
>> I started this RFE in the Subversion blog: "Packing of the /db/revprops
>> shards. These are still accumulating hundreds of thousands of TINY files
>> (avg 150 bytes) in my poor Windows server (NTFS really doesn't like
>> small files)... with packing, each of these 1000 prop files would be
>> replaced by a single ~150Kb blob."
>>
>> Answer from Hyrum Wright: "Revprops are mutable, and as such their size
>> may change. Modifying a packed revprop would cause the entire shard to
>> be rewritten, not just the modified value. Aside from the performance
>> issues, this also causes race conditions when multiple revprops are
>> being edited at the same time. All of these concerns mean that packing
>> of revprops probably won't happen any time soon. What might happen is a
>> migration of revprops to a better storage mechanism, such as sqlite,
>> though there are no current plans for that."
>>
>
> The answer to that is to load the shard as a memory-mapped file. Then
> you update only the appropriate revprop. If you're concerned about
> inserting data into the middle of a shard, when packing revprops leave a
> chunk of space after each one. Then you can write into the gap. If the
> amount of data would overflow the gap, then you'd have to fall back to a
> full rewrite of the entire shard.

Not a bad solution. We'd have to determine a "best guess" value for
the extra space, so we don't end up negating the space savings with
the padding. Also, we'd have to store both an offset and length for
each revprop, instead of just the offset (which for revision data
implies the length). We'd also need to keep a fallback mechanism for
revprops which overflow their assigned buffers.
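
A minimal sketch of that layout, in Python and purely illustrative. The
class name `PaddedShard`, the `PADDING` value, and the in-memory bytearray
are all assumptions made for the example; none of this is Subversion's
actual on-disk format or API. It shows the index holding both an offset
and a length per revprop, the padding acting as slack, and the fallback
to a full rewrite on overflow:

```python
# Hypothetical model of a padded revprop shard (not Subversion's real format).
PADDING = 64  # assumed "best guess" slack per revprop, in bytes


class PaddedShard:
    def __init__(self, revprops, padding=PADDING):
        self.padding = padding
        self._pack(revprops)

    def _pack(self, revprops):
        """Lay out each revprop followed by `padding` spare bytes."""
        self.data = bytearray()
        self.index = []  # one (offset, length) pair per revprop
        for blob in revprops:
            self.index.append((len(self.data), len(blob)))
            self.data += blob + b"\0" * self.padding

    def _capacity(self, i):
        """Bytes available to revprop i: its slot runs up to the next offset."""
        if i + 1 < len(self.index):
            end = self.index[i + 1][0]
        else:
            end = len(self.data)
        return end - self.index[i][0]

    def read(self, i):
        offset, length = self.index[i]
        return bytes(self.data[offset:offset + length])

    def write(self, i, blob):
        """Return True if updated in place, False if the shard was rewritten."""
        offset, _ = self.index[i]
        if len(blob) <= self._capacity(i):
            # Fits in the slot: overwrite in place and record the new length.
            self.data[offset:offset + len(blob)] = blob
            self.index[i] = (offset, len(blob))
            return True
        # Overflow: fall back to rewriting the entire shard.
        blobs = [self.read(j) for j in range(len(self.index))]
        blobs[i] = blob
        self._pack(blobs)
        return False
```

The length has to be stored explicitly because the gap after each
revprop means the next offset no longer tells you where the value ends.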

And this approach *still* has us manually managing on-disk formats and
such. I'd like to delegate that to competent libraries, such as sqlite.
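
For comparison, a rough sketch of what delegating to sqlite could look
like, using Python's standard `sqlite3` module. The table name, column
names, and helper functions are hypothetical, chosen only for the
example; no such schema exists in Subversion as far as this message is
concerned. Values of any size can be replaced without tracking offsets,
lengths, or padding by hand:

```python
import sqlite3


def open_revprops(path=":memory:"):
    """Open (or create) a hypothetical revprops database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revprops ("
        " revision INTEGER NOT NULL,"
        " name TEXT NOT NULL,"
        " value BLOB NOT NULL,"
        " PRIMARY KEY (revision, name))"
    )
    return conn


def set_revprop(conn, revision, name, value):
    """Insert or overwrite one revprop inside a transaction."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT OR REPLACE INTO revprops (revision, name, value)"
            " VALUES (?, ?, ?)",
            (revision, name, value),
        )


def get_revprops(conn, revision):
    """Return all revprops of one revision as a name -> value dict."""
    rows = conn.execute(
        "SELECT name, value FROM revprops WHERE revision = ?", (revision,)
    )
    return dict(rows.fetchall())
```

Each change runs in a transaction, so the library rather than
hand-written code handles concurrent edits and growing values.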

> The only question now is - would packing revprops increase performance
> much? I guess they do get read a lot for operations like log, list etc.
> I'd say they get read a lot more than past revisions do, so the
> performance increase might (*might*) be noticeable.

You'd be surprised how much the revision data (not props) are used,
particularly when reconstructing full-texts using deltas.

-Hyrum

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1946728

Received on 2009-04-27 16:30:27 CEST

This is an archived mail posted to the Subversion Users mailing list.
