RE: SVN/Apache - Log full transaction I/O across clients.

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Tue, 6 Oct 2015 15:09:21 +0200

On 2 Oct 2015 17:02, "Terry Dooher" <Terry.Dooher_at_naturalmotion.com> wrote:
>
>
> > > Is there a way I can ensure the log is only written after the full
> > > request has been serviced? The data over time will be really useful in
> > > gauging usage over time.
> > >
>
>
> > I think this is because 1.8+serf uses "skelta style updates" instead
> > of "bulk updates". With skelta mode the client uses a separate HTTP
> > request for each resource that needs to be fetched (instead of pulling
> > the entire update through a massive http request/response). See here:
> >
> > http://subversion.apache.org/docs/release-notes/1.8.html#neon-deleted
> >
> > As explained there, you can configure the server to "prefer" bulk
> > update mode, if you wish. That might give you back the ability to
> > measure the checkouts as one huge HTTP response.
> >
>
> Thanks, Johan; that makes a lot of sense. We make heavy use of caching
> and have large repos (~20 of them totalling 2TB). If I read those notes
> correctly, disabling skelta in favour of bulk updates would make the
> caching much less efficient, right?
>
No, that's not the case. Skelta or bulk only determines how the data is
transferred to the client (skelta = one HTTP GET per file; bulk = one giant
HTTP REPORT response with all file data). It is completely orthogonal to the
caching, which determines how much memory the server devotes to caching the
data and metadata it has read from the filesystem. The "update" mode
(skelta or bulk) should not really impact the efficiency of the caching,
and vice versa.
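
For reference, a minimal sketch of the server-side setting that prefers bulk
updates, assuming Subversion 1.8 or later on the server. The /svn location
and the /srv/svn path are placeholders for illustration, not taken from your
setup:

```apache
# Hypothetical location and repository path; adjust to your configuration.
<Location /svn>
    DAV svn
    SVNParentPath /srv/svn
    # Advertise a preference for bulk updates (one large REPORT response)
    # over skelta mode (one GET per file). 1.8+ clients should follow this
    # unless their own configuration says otherwise.
    SVNAllowBulkUpdates Prefer
</Location>
```

The caching directives are set separately from this and are not touched by
the update mode.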

I don't know mod_logio. It's not clear to me whether it reports the number
of bytes transferred (in that case it's normal that the value is small for
skelta, since the initial report only sends a little data, not the file
contents), or the number of bytes read from disk during the operation (in
that case it's presumably also spread across the many GET requests in
skelta mode).
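
From a quick look at the httpd documentation, though, mod_logio's %I and %O
format strings appear to count bytes received and sent on the network per
request (including headers), so they would not reflect disk reads. A minimal
sketch of a log setup that records both per request, together with the
high-level Subversion action; the log file names and the format nickname are
placeholders:

```apache
# Assumes mod_logio and mod_dav_svn are loaded; file names are placeholders.
# %I = bytes received, %O = bytes sent (both including headers).
LogFormat "%h %u %t \"%r\" %>s %I %O" svn_io
CustomLog logs/svn_io.log svn_io

# High-level Subversion operations, logged only for requests where
# mod_dav_svn sets the SVN-ACTION environment variable.
CustomLog logs/svn_actions.log "%t %u %{SVN-ACTION}e" env=SVN-ACTION
```

In skelta mode a checkout shows up as many small lines in the first log, so
per-checkout totals would have to be summed per client afterwards.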

OTOH: to be precise, there is a special optimization when using skelta mode
in 1.8 or higher (i.e. the default mode when talking to a 1.8 or higher
server): the client will not fetch file contents it already has in its
pristine store (based on the sha1 sums of the files, which are transferred
in the initial response). This is very powerful e.g. when switching between
related branches (lots of files are the same, so the client already has
those pristines), or when using a sparse checkout of the repository root
and checking out various related branches as children of your 'depth=empty'
working copy root (then all your branches share the same working copy root
-> same pristine store).

But I assumed this optimization was not playing a role in your tests, since
you spoke about performing new checkouts (so no shared working copy).

> Perversely, the very thing I was hoping to gauge with the I/O logging was
> the real-world efficiency gains of these caching options:
> > > SVNCacheTextDeltas On
> > > SVNCacheFullTexts On
> > > SVNCacheRevProps On
> > > SVNInMemoryCacheSize 262144
>
> On balance, I'll probably just stick with skelta and trust that it
> improves things. I could manually meter some operations, but it's hard to
> accurately simulate cache use outside a live scenario...
>
Yeah, the defaults should be good. But as I said, cache effectiveness is
mostly orthogonal to skelta vs. bulk update mode.

-- 
Johan

Received on 2015-10-06 15:09:50 CEST
