Daniel Shahaf <danielsh_at_elego.de> writes:
> The process to edit a revprop that had been packed would be:
> * grab write lock
> * prepare a new pack-file-with-inline-manifest
> * move-into-place, atomically replacing the previous pack file
> * ungrab write lock
> That is what guarantees cp(1) consistency that Hyrum mentions.
Atomic replace is going to involve a retry loop on Windows, so it could
take a long time, for example if virus scanners run on the repository disks.
Holding the write lock will block readers, so for better concurrency we
want to minimise the time that a write lock is held, or have more locks.
Is there one lock per pack file or one lock per repository? Having
multiple locks means that a write only blocks a proportion of the reads,
but it also means that an operation like log has to acquire multiple locks
(in series, not in parallel).
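To illustrate the one-lock-per-pack-file option, here is a rough Python
sketch (not Subversion code). The ShardLocks table, SHARD_SIZE and the
read_shard callback are hypothetical. A log-like reader takes each shard's
lock in turn, so a writer holding one shard's lock only delays readers of
that shard.

```python
import threading

SHARD_SIZE = 1000  # assumed shard size, for illustration only


class ShardLocks:
    """Hypothetical lock table with one lock per revprop pack file (shard).

    Uses plain mutexes for simplicity; a real implementation would
    probably let readers share a lock and only make writers exclusive.
    """

    def __init__(self):
        self._guard = threading.Lock()  # protects the dict itself
        self._locks = {}

    def get(self, shard):
        with self._guard:
            return self._locks.setdefault(shard, threading.Lock())


def read_revprops(locks, read_shard, start_rev, end_rev):
    """Collect revprops for start_rev..end_rev, as a log-like reader would.

    Shard locks are taken one at a time, in series, and never held
    together, so a writer holding one shard's lock only delays readers
    of that shard.
    """
    revprops = {}
    for shard in range(start_rev // SHARD_SIZE, end_rev // SHARD_SIZE + 1):
        with locks.get(shard):
            shard_props = read_shard(shard)
        for rev, props in shard_props.items():
            if start_rev <= rev <= end_rev:
                revprops[rev] = props
    return revprops
```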
We can move the prepare/write outside the write lock by using a sequence
number in the pack file. Then the algorithm could be:
* prepare a new pack file with incremented sequence number
* grab write lock
* check old sequence number
* if changed: drop lock and start again
* otherwise: atomic replace pack file
* drop write lock
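A minimal Python sketch of the sequence-number scheme, assuming a JSON pack
file with a "seq" field and using an in-process threading.Lock as a
stand-in for the repository write lock. The file format, read_pack and
set_revprop are hypothetical, not the FSFS API. os.replace() does the
atomic replace; on Windows it can still fail while readers have the file
open, which is the retry problem mentioned above.

```python
import json
import os
import tempfile
import threading

# Stand-in for the repository write lock (hypothetical; a real
# implementation would need a cross-process file lock).
write_lock = threading.Lock()


def read_pack(path):
    """Return (sequence_number, revprops) from a hypothetical JSON pack file."""
    with open(path) as f:
        data = json.load(f)
    return data["seq"], data["revprops"]


def set_revprop(pack_path, rev, name, value):
    """Set one revprop, preparing the new pack file outside the write lock."""
    pack_dir = os.path.dirname(os.path.abspath(pack_path))
    while True:
        old_seq, revprops = read_pack(pack_path)
        revprops.setdefault(str(rev), {})[name] = value

        # Prepare the new pack file with an incremented sequence number,
        # in the same directory, without holding the lock.
        fd, tmp_path = tempfile.mkstemp(dir=pack_dir)
        with os.fdopen(fd, "w") as f:
            json.dump({"seq": old_seq + 1, "revprops": revprops}, f)

        with write_lock:
            current_seq, _ = read_pack(pack_path)
            if current_seq == old_seq:
                # Nobody else wrote in the meantime: atomic replace.
                os.replace(tmp_path, pack_path)
                return old_seq + 1

        # Sequence number changed: discard our copy and start again.
        os.remove(tmp_path)
```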
That doesn't help with the retry problem on Windows, but it may reduce the
time the write lock is held, particularly if the pack file is large. It
should work well if writes are much less common than reads. If writes
are common, concurrent writers will keep discarding their prepared files
and starting again, so the simple write lock serialisation may be better.
The two algorithms should interoperate, so even if we use a simple write
lock today, implementing the sequence number would give us options in
the future.
> This also implies that propediting a revprop-that-had-been-packed will
> have to rewrite a packfile containing a thousand revisions' properties.
> We expect that to be a reasonable cost given that revprop files are
> small and historical revprops are rarely edited.
r0 revprops are a concern because they can have different access
patterns. For example, a master/slave setup running svnsync once per
revision (a common setup) will write the r0 pack file several times per
revision. We don't want the pack file to become the dominant IO.
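A rough way to put numbers on this, treating every r0 revprop change as a
rewrite of the whole pack file that contains r0 (as the quoted text
describes). With k r0 revprop writes per synced revision, n revisions in
that pack file and an average of b bytes of revprops per revision, each
synced revision costs about this many bytes of pack-file rewriting (k, n
and b are illustrative symbols, not measured values):

```latex
W_{\mathrm{r0}} \approx k \cdot n \cdot b
```

With packs of a thousand revisions n = 1000, whereas a pack holding only
r0 would have n = 1.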
I wonder if we could offset the shard boundaries, so that r0 is the last
(and only) revision in the first shard and r1 to rN, where N is the shard
size, form the second shard. Then r0 would be in a shard of its own and
the r0 pack file would be much smaller. We would have to repack the
repository on upgrade, but the code changes for this could be small, just
+/-1 in a few places.
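A sketch of the +/-1 change in Python; the function names and SHARD_SIZE =
1000 are assumptions for illustration, not FSFS code. Shifting the
boundaries by one puts r0 alone in shard 0 and r1 to r1000 in shard 1.

```python
SHARD_SIZE = 1000  # assumed shard size, for illustration only


def shard_current(rev, shard_size=SHARD_SIZE):
    """Current layout: shard k holds revisions k*S .. k*S + S - 1."""
    return rev // shard_size


def shard_offset(rev, shard_size=SHARD_SIZE):
    """Proposed layout: r0 alone in shard 0, shard k >= 1 holds
    revisions (k-1)*S + 1 .. k*S.

    For rev >= 1 this equals (rev - 1) // S + 1; written this way it
    stays non-negative for r0 as well.
    """
    return (rev + shard_size - 1) // shard_size


def first_rev_offset(shard, shard_size=SHARD_SIZE):
    """First revision stored in a shard under the proposed layout."""
    return 0 if shard == 0 else (shard - 1) * shard_size + 1
```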
Received on 2011-07-07 09:37:00 CEST