I started this RFE on the Subversion blog: "Packing of the /db/revprops
shards. These are still accumulating hundreds of thousands of TINY files
(avg 150 bytes) in my poor Windows server (NTFS really doesn't like
small files)... with packing, each of these 1000 prop files would be
replaced by a single ~150Kb blob."
Answer from Hyrum Wright: "Revprops are mutable, and as such their size
may change. Modifying a packed revprop would cause the entire shard to
be rewritten, not just the modified value. Aside from the performance
issues, this also causes race conditions when multiple revprops are
being edited at the same time. All of these concerns mean that packing
of revprops probably won't happen any time soon. What might happen is a
migration of revprops to a better storage mechanism, such as sqlite,
though there are no current plans for that."
Even though revprops packing (with ideal behavior) is not easy to
implement, it's still a highly desirable feature, so I propose opening a
bug to track it. I have some suggestions that may even make this viable
for a 1.6.x update:
- Yes, revprops are mutable, but in practice they are mostly read-only
data, particularly for old revisions. It's not uncommon for revprop
changes to be blocked (e.g. SourceForge.net did that until recently). In
many companies this is also mandated. In such cases, revprops ARE
read-only, so revprops could be packed trivially (with the same simple
file layout used by packed revs).
- So, in a first attempt we could have a simple implementation of packed
revprops, with the following constraint: Once a specific revprops shard
is packed, any further attempt to change any of those revprops will be
refused. The admin could use hooks to make the whole thing smoother,
e.g. making sure that complete shards are only packed if older than a
month, or disallowing revprops updates even in non-packed revisions so
the rule is simpler and there are no update failures.
- Additionally, updating packed revprops could be supported (with
the same simple storage format) even though it COULD be an expensive
operation (lock the entire shard or even the entire repo, then rewrite
the shard completely). It's a reasonable option if, for somebody's repo,
revprop updates are permitted but not very common. For users (like my
company) who don't make heavy use of revprops, those packed shards
weigh in at the low hundreds of KB, so the brute-force update would still
run in a split second with NO usability disadvantage at all.
- This initial design wouldn't require a new repository format. We could
just have an option like "svnadmin pack --revprops", so by default (with
format=5) only revs are packed, but one could optionally pack the
revprops too. The fsfs layer would have to detect if a revprops shard is
packed, but this is necessary anyway (just like with packed revs)
because the most recent shard is typically not packed. If a future
version of SVN introduces a smarter storage for packed revprops that can
handle frequent updates with low overhead and no coarse-grained locking,
that would be accommodated and distinguished by a different repository
format.
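To make the three behaviors above concrete (detecting a packed shard on
read, refusing updates to a packed shard, and the optional brute-force
rewrite), here is a rough Python sketch. Everything in it is my own
assumption for illustration, not FSFS code: the "<shard>.pack" file
name, the header-plus-blobs layout, the lock file, and the names
pack_shard, get_revprops, set_revprops and rewrite_packed are all
hypothetical.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager

SHARD_SIZE = 1000  # revisions per shard, matching the numbers above


def _paths(root, rev):
    """Return (loose shard directory, pack file) for a revision (hypothetical layout)."""
    shard = os.path.join(root, "revprops", str(rev // SHARD_SIZE))
    return shard, shard + ".pack"


def _load_pack(pack):
    """Parse a pack: 'rev size' header lines, a blank line, then the blobs."""
    with open(pack, "rb") as f:
        header, _, body = f.read().partition(b"\n\n")
    props, offset = {}, 0
    for line in header.decode().splitlines():
        rev, size = map(int, line.split())
        props[rev] = body[offset:offset + size]
        offset += size
    return props


def _store_pack(pack, props):
    """Write the whole pack to a temp file, then atomically swap it in."""
    revs = sorted(props)
    header = "\n".join(f"{r} {len(props[r])}" for r in revs).encode()
    tmp = pack + ".tmp"
    with open(tmp, "wb") as f:
        f.write(header + b"\n\n" + b"".join(props[r] for r in revs))
    os.replace(tmp, pack)


@contextmanager
def _shard_lock(pack):
    """Crude shard-wide lock: exclusive creation of a lock file."""
    lock = pack + ".lock"
    fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)  # raises if held
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock)


def pack_shard(root, shard_no):
    """Replace the loose files of one shard with a single pack file."""
    shard, pack = _paths(root, shard_no * SHARD_SIZE)
    props = {}
    for name in os.listdir(shard):
        with open(os.path.join(shard, name), "rb") as f:
            props[int(name)] = f.read()
    _store_pack(pack, props)
    shutil.rmtree(shard)  # the loose files are now redundant


def get_revprops(root, rev):
    shard, pack = _paths(root, rev)
    if os.path.exists(pack):  # packed shard: look inside the blob
        return _load_pack(pack)[rev]
    with open(os.path.join(shard, str(rev)), "rb") as f:  # loose file
        return f.read()


def set_revprops(root, rev, data, rewrite_packed=False):
    shard, pack = _paths(root, rev)
    if not os.path.exists(pack):  # unpacked: overwrite the small file
        os.makedirs(shard, exist_ok=True)
        with open(os.path.join(shard, str(rev)), "wb") as f:
            f.write(data)
    elif not rewrite_packed:  # first variant: packed means frozen
        raise PermissionError(f"revprops of r{rev} are packed and read-only")
    else:  # second variant: lock the shard and rewrite it completely
        with _shard_lock(pack):
            props = _load_pack(pack)
            props[rev] = data
            _store_pack(pack, props)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    for r in range(3):
        set_revprops(root, r, b"svn:log %d" % r)
    pack_shard(root, 0)
    set_revprops(root, 1, b"edited log", rewrite_packed=True)
    print(get_revprops(root, 1))
```

With rewrite_packed left at False this behaves like the first, simpler
proposal (updates to packed shards are refused); setting it to True gives
the brute-force variant. The lock file here fails immediately if another
writer holds it instead of waiting, which is the simplest choice for a
sketch and not necessarily what a real implementation would want.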
A+
Osvaldo
--
-----------------------------------------------------------------------
Osvaldo Pinali Doederlein Visionnaire Virtus S/A
osvaldo@visionnaire.com.br http://www.visionnaire.com.br
Technology Architect +55 (41) 3337-1000 #226
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1897002
Received on 2009-04-24 21:44:52 CEST