FSFS access map tool

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Thu, 10 Jan 2013 11:52:34 +0100

Hi there,

Julian recently asked me about my new FSFS access map tool.
I take that as an opportunity to introduce you all to it.

As part of my work on I/O reduction in FSFS repos, I needed
some way to actually measure, and ideally visualize, the I/O
activity such that we have nice before-and-after numbers and
images. The basic idea is to simply run strace with any
operation that we want to investigate and let it write the
trace to some file. This will already give us some idea of
access / operational patterns etc. Example:

$ strace -e trace=open,close,read,lseek -o strace.txt \
  ./subversion/svnadmin/svnadmin verify \
  ~/develop/tsvn_repos/ -r1:18000 -M 4000
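
To give a feel for what such a trace contains, here is a minimal Python sketch that tallies system calls by name. It is not the tool's actual parser. It assumes strace's usual one-call-per-line output, e.g. `read(3, "..."..., 4096) = 4096`; the sample lines and the repository path in them are made up for illustration.

```python
import re
from collections import Counter

# Matches strace lines of the form: name(args) = result
# (assumption: plain "strace -o" output without PID or timestamp prefixes)
SYSCALL_RE = re.compile(r'^(\w+)\((.*)\)\s+=\s+(-?\d+)')

def count_syscalls(lines):
    """Return a Counter mapping syscall name -> number of occurrences."""
    counts = Counter()
    for line in lines:
        m = SYSCALL_RE.match(line)
        if m:  # skip anything that is not a completed syscall line
            counts[m.group(1)] += 1
    return counts

# Hypothetical trace excerpt, not taken from the run above.
sample = [
    'open("/repo/db/revs/0/1", O_RDONLY) = 3',
    'lseek(3, 1024, SEEK_SET) = 1024',
    'read(3, "DELTA\\n"..., 4096) = 4096',
    'close(3) = 0',
]

if __name__ == "__main__":
    for name, n in sorted(count_syscalls(sample).items()):
        print(f"{n:>10,} {name}")
```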

The fsfs-access-map tool will then take the log file as
input, count the operations and print a summary. Moreover,
it maps all I/O that goes to rev files onto an array that
represents those files at cluster granularity. Consecutive
reads will count as a single read, possibly spanning multiple
clusters. Reads after a seek or fopen count as a new access.
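
The counting rule just described can be sketched roughly as follows. This is a simplified Python illustration, not the tool's code: the event tuples, the function name `count_accesses` and keying by file descriptor are assumptions made for the example (the real tool works from the strace log and tracks rev files by name).

```python
CLUSTER_SIZE = 64 * 1024  # 64kB clusters, as assumed in this mail

def count_accesses(events):
    """Count logical reads and per-cluster hits.

    events: iterable of ("open", fd), ("seek", fd, offset)
            or ("read", fd, nbytes)  (hypothetical, pre-parsed format)
    Returns (accesses, hits); hits maps (fd, cluster) -> hit count.
    """
    offset = {}        # fd -> current file offset
    new_access = {}    # fd -> True if the next read starts a new access
    last_cluster = {}  # fd -> last cluster touched by the current access
    accesses = 0
    hits = {}
    for ev in events:
        kind, fd = ev[0], ev[1]
        if kind == "open":
            offset[fd] = 0
            new_access[fd] = True
        elif kind == "seek":
            offset[fd] = ev[2]
            new_access[fd] = True
        elif kind == "read" and ev[2] > 0:
            nbytes = ev[2]
            first = offset[fd] // CLUSTER_SIZE
            last = (offset[fd] + nbytes - 1) // CLUSTER_SIZE
            if new_access[fd]:
                accesses += 1
                new_access[fd] = False
                start = first
            else:
                # Consecutive read: same access, so do not count the
                # cluster we are already in a second time.
                start = max(first, last_cluster[fd] + 1)
            for c in range(start, last + 1):
                hits[(fd, c)] = hits.get((fd, c), 0) + 1
            last_cluster[fd] = last
            offset[fd] += nbytes
    return accesses, hits
```

For example, two back-to-back 4kB reads after an open count as one access to cluster 0, while a read following a seek starts a second access even if it lands in a neighbouring cluster.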

$ ./tools/dev/fsfs-access-map strace.txt
                 123 files
             518,618 files opened
           2,861,069 seeks
              14,442 uncached seeks
           2,348,653 reads
               3,929 unique clusters read
           2,453,767 clusters read
       9,601,887,353 bytes read

An "uncached seek" is one that hits a cluster that has not
been read before; "unique clusters read" is the total number
of clusters in rev files that got read at least once. A
cluster in this context is a 64kB block and we assume that
files are cluster-aligned. Please note that these are
"physical" clusters rather than file system allocation
units, i.e. the smallest data size that your disk sub-system
will reasonably deliver in a single access. 64k and 128k
are typical values for RAID systems.
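
As a rough illustration of these two counters, the Python sketch below tracks which 64kB clusters of a single file have been read and checks every seek target against that set. It is one possible reading of the definitions above, not the tool's implementation; the event format and the name `summarize` are hypothetical.

```python
CLUSTER_SIZE = 64 * 1024  # 64kB, files assumed to be cluster-aligned

def cluster_of(offset):
    """Index of the cluster that contains the given byte offset."""
    return offset // CLUSTER_SIZE

def summarize(events):
    """events: ("seek", offset) or ("read", nbytes) for ONE file.

    Returns (seeks, uncached_seeks, unique_clusters_read).
    """
    seen = set()  # clusters that have been read at least once
    pos = 0
    seeks = uncached = 0
    for kind, value in events:
        if kind == "seek":
            seeks += 1
            pos = value
            # Assumption: a seek is "uncached" if its target cluster
            # has not been read before.
            if cluster_of(pos) not in seen:
                uncached += 1
        elif kind == "read" and value > 0:
            seen.update(range(cluster_of(pos), cluster_of(pos + value - 1) + 1))
            pos += value
    return seeks, uncached, len(seen)
```

With this reading, a seek to offset 70000 lands in cluster 1 (70000 // 65536) and is "uncached" only if nothing in bytes 65536..131071 has been read earlier.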

At the end of the analysis, the tool will create two files
in the current folder (didn't bother to make that configurable
-- sorry). access.bmp shows one line per rev file touched,
each pixel representing a cluster. An ideal implementation
would result in white (not read) and dark green pixels
(read once) only. But there are dark purple parts that get
read 1000 times or more. I attached the output for the above
converted into png.

scale.bmp is a 1-pixel-line picture that maps hit counts
0 to 64k to colors on a logarithmic scale (the hit count
doubles every 8 pixels).
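
To make the scale concrete: with a doubling every 8 pixels, counts up to 64k (2^16) span about 16 * 8 = 128 pixels. The Python sketch below shows one way to compute such a position. The exact formula the tool uses is not given in this mail, so the `count + 1` offset (which keeps a count of 0 at pixel 0) and the clamping are assumptions.

```python
import math

PIXELS_PER_DOUBLING = 8   # from the description of scale.bmp
MAX_COUNT = 64 * 1024     # hit counts are shown up to 64k

def scale_index(count):
    """Map a hit count to a pixel position on a logarithmic scale.

    Assumed formula: 8 * log2(count + 1), clamped to the 0..64k range.
    """
    count = max(0, min(count, MAX_COUNT - 1))
    return int(PIXELS_PER_DOUBLING * math.log2(count + 1))
```

Under this assumed formula a cluster read once sits at pixel 8, one read about 1000 times at pixel 79, and anything at or beyond 64k at pixel 128.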

Furthermore, you can run the tool while the log is still
being written. That way you can produce "snapshots" of the I/O
activity at different stages of the operation.

-- Stefan^2.

Received on 2013-01-10 11:53:09 CET

This is an archived mail posted to the Subversion Dev mailing list.