Re: export, checkout, commit performance

From: Philip Martin <philip_at_codematters.co.uk>
Date: 2006-03-09 21:30:03 CET

Sebastian Tusk <sebastian.tusk@gmx.net> writes:

> The file to commit is named test. The function performed is guessed.
>
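> activity                                       function?           time
> ------------------------------------------------------------------------
> read 512b blocks from "test"
> write 512b blocks to "test.svn_base.tmp"       COPY?               3m10s
> ------------------------------------------------------------------------
> read 102400b blocks from "test.svn-base"
> write 4096b blocks to "temp.tmp"
> occasionally other reads from "test.svn-base"  COPY? DIFF?         2m23s
> ------------------------------------------------------------------------
> read 512b blocks from "test.svn-base"          HASHING?            1m47s
> ------------------------------------------------------------------------
> read 4096b blocks from "tempfile"              TRANSFER TO SERVER  7m8s
> ------------------------------------------------------------------------
> read 512b blocks from "test.svn-base"          HASHING?            1m16s
> ------------------------------------------------------------------------
> read 512b blocks from "test"
> read 512b blocks from "test.svn-base"          COMPARE?            7m46s
> ------------------------------------------------------------------------
> read 512b blocks from "test.svn-base"
> write 512b blocks to "test.tmp.tmp"
> occasionally other reads from "test.svn-base"  COPY?               3m12s
> ------------------------------------------------------------------------
> read 512b blocks from "test"
> read 512b blocks from "test.tmp"               COMPARE?            7m42s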
>
> Those are all the activities that consume significant time. All reads
> and writes are over the full 680MB file. Only during TRANSFER TO SERVER
> is there any noticeable server activity. Altogether the commit command
> uses more than 13 million read or write operations.
>
> Are all these activities necessary?

As I understand it the commit will:

- copy (i.e. read/write) the file to a temporary text-base so that it
  won't subsequently change during the rest of the commit.

- read the temporary text-base to get a checksum. I suppose we could
  combine this with the copy above but that's quite complex and would
  involve modifying apr_file_copy (or not using it).

- read the temporary text-base to calculate the delta.
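
To illustrate what combining the copy and the checksum into one pass
might look like, here is a minimal sketch in Python. It is not
Subversion's code (which is C on top of APR); the function name, the
64 KiB chunk size, and the use of MD5 as the digest are assumptions
made only for the example.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # assumed block size, not a Subversion constant


def copy_with_checksum(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Copy src_path to dst_path, returning the MD5 hex digest of the data.

    Hypothetical helper: the source is read exactly once, and each chunk
    is both hashed and written, so no separate checksum pass is needed.
    """
    digest = hashlib.md5()  # MD5 chosen for illustration only
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            digest.update(chunk)
            dst.write(chunk)
    return digest.hexdigest()
```

The trade-off is the one noted above: a stock copy routine can no
longer be used, because the hashing has to sit inside the copy loop.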

Once the commit has succeeded the post commit processing will:

- read the temporary text-base to calculate the checksum. There is a
  comment about whether we could reuse the previous checksum. Since
  the file is in the .svn area this should be possible.

- read the working and temporary text-base to determine whether the
  working file has changed. This is necessary to determine whether
  the text-time in the entries file should be set to match the working
  file.
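
The comparison in the last step reads two files in parallel, which is
where small blocks hurt most. A minimal sketch of a byte-for-byte
comparison with a larger block size follows, in Python rather than
the C that Subversion uses. The function name and the 64 KiB default
are assumptions for illustration, and the sketch compares raw bytes
only, ignoring anything else a real client might need to account for.

```python
import os


def files_differ(path_a, path_b, chunk_size=64 * 1024):
    """Return True if the two regular files differ in size or content.

    Hypothetical helper: sizes are checked first so that files of
    different length are rejected without reading any data.
    """
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return True
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(chunk_size)
            block_b = b.read(chunk_size)
            if block_a != block_b:
                return True   # first mismatching block ends the scan
            if not block_a:
                return False  # both files exhausted with no mismatch
```

When the files are identical the whole of both must still be read, so
the block size decides how many read calls that costs.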

I make that 5 reads, but you seem to be getting more; I don't know why.

> Especially those after the transfer
> to the server is finished. Is it necessary to access two files in
> parallel in such small blocks? Using larger read/write blocks might
> improve the performance significantly without too much hassle.

Using BUFSIZ is probably wrong, particularly if it is 512.
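
To put rough numbers on the block size, the following Python sketch
counts the read calls needed for one full pass over the file. It
assumes "680MB" means 680 * 1024 * 1024 bytes and that every call
returns a full block; the function name and the 64 KiB candidate size
are illustrative, not values taken from Subversion.

```python
def read_calls(file_size, block_size):
    """Number of read calls needed to consume file_size bytes,
    not counting the final call that reports end of file."""
    return -(-file_size // block_size)  # ceiling division


FILE_SIZE = 680 * 1024 * 1024  # assumption: 680MB taken as 680 MiB

# 512, 4096 and 102400 appear in the trace; 65536 is a hypothetical choice.
for block_size in (512, 4096, 102400, 65536):
    print(block_size, read_calls(FILE_SIZE, block_size))
```

Under those assumptions one pass takes 1,392,640 calls at 512 bytes
but only 10,880 at 64 KiB. The trace lists ten passes using 512-byte
blocks, which comes to about 13.9 million calls and is in line with
the "more than 13 million" reported.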

-- 
Philip Martin

Received on Thu Mar 9 21:30:38 2006
