
Re: svn commit: rev 1380 - trunk/subversion/include trunk/subversion/libsvn_fs

From: Greg Stein <gstein_at_lyra.org>
Date: 2002-02-26 23:46:23 CET

On Tue, Feb 26, 2002 at 03:47:30PM -0600, cmpilato@tigris.org wrote:
>...
> +++ NEW/trunk/subversion/include/svn_fs.h Tue Feb 26 15:47:30 2002
> @@ -34,6 +34,20 @@
> #include "svn_io.h"
>
>
> +/* Data written to the filesystem through the svn_fs_apply_textdelta()
> + interface is cached in memory until the end of the data stream, or
> + until a size trigger is hit. Define that trigger here (in bytes).
> + Setting the value to 0 will result in no filesystem buffering at
> + all. The value only really matters when dealing with file contents
> + bigger than the value itself. Above that point, large values here
> + allow the filesystem to buffer more data in memory before flushing
> + to the database, which increases memory usage but greatly decreases
> + the amount of disk access (and log-file generation) in database.
> + Smaller values will limit your overall memory consumption, but can
> + drastically hurt throughput by necessitating more write operations
> + to the database (which also generates more log-files). */
> +#define SVN_FS_WRITE_BUFFER_SIZE 4096000

Euh... how about selecting 1 meg, rather than 4?

>...
> +static svn_error_t *
> +write_to_string (void *baton, const char *data, apr_size_t *len)
> +{
> + txdelta_baton_t *tb = (txdelta_baton_t *) baton;
> +
> + svn_stringbuf_appendbytes (tb->target_string, data, *len);
> +
> + return SVN_NO_ERROR;

How many times does this get called? In other words, how many times were we
modifying the record before this change? I think it would be good to know
that, and also how large the individual writes are. Will the buffering
*truly* help us, or was this change just a guess?

[ I *really* believe in measuring, rather than assuming; not saying that
  happened here, but I didn't see measurements, so I don't know that it did ]
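
A minimal sketch of one way to collect those numbers: wrap the write
callback in a counting shim that records the number of calls and the
size of each write. The write_fn_t type and every write_stats_* name
below are hypothetical stand-ins for illustration, not the real
svn_stream interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical write-callback shape (baton, data, length in/out).
   This is an assumption for the sketch, not the svn_stream API. */
typedef int (*write_fn_t)(void *baton, const char *data, size_t *len);

typedef struct write_stats_t {
  write_fn_t inner;          /* the write function being measured */
  void *inner_baton;         /* baton to hand to the inner function */
  unsigned long calls;       /* number of write calls seen */
  size_t total_bytes;        /* sum of all write sizes */
  size_t min_len;            /* smallest single write */
  size_t max_len;            /* largest single write */
} write_stats_t;

static void
write_stats_init(write_stats_t *ws, write_fn_t inner, void *inner_baton)
{
  assert(ws != NULL);
  ws->inner = inner;
  ws->inner_baton = inner_baton;
  ws->calls = 0;
  ws->total_bytes = 0;
  ws->min_len = (size_t) -1;  /* largest size_t, so any write is smaller */
  ws->max_len = 0;
}

/* Drop-in replacement for the inner write function: record the size
   of this write, then forward to the inner function (if any). */
static int
counting_write(void *baton, const char *data, size_t *len)
{
  write_stats_t *ws = baton;

  assert(ws != NULL && len != NULL);
  ws->calls++;
  ws->total_bytes += *len;
  if (*len < ws->min_len)
    ws->min_len = *len;
  if (*len > ws->max_len)
    ws->max_len = *len;

  return ws->inner ? ws->inner(ws->inner_baton, data, len) : 0;
}

/* Print a one-line summary of what was observed. */
static void
write_stats_report(const write_stats_t *ws, FILE *out)
{
  assert(ws != NULL && out != NULL);
  fprintf(out, "calls=%lu bytes=%lu min=%lu max=%lu\n",
          ws->calls,
          (unsigned long) ws->total_bytes,
          (unsigned long) (ws->calls ? ws->min_len : 0),
          (unsigned long) ws->max_len);
}
```

Running a plain file upload and a heavily edited file through such a
shim would show whether the writes are few and large or many and small.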

>...
> + if ((! window) || (tb->target_string->len > SVN_FS_WRITE_BUFFER_SIZE))
> + {
> + apr_size_t len = tb->target_string->len;
> + svn_stream_write (tb->target_stream,
> + tb->target_string->data,
> + &len);
> + svn_stringbuf_set (tb->target_string, "");

svn_stringbuf_setempty() would be a bit more efficient here.

All that said, switching the 'strings' table to a BTree with duplicate keys
(meaning one key has multiple [consecutive] records) could be a win. Random
access would be tougher, but that happens *very* rarely; usually we read
through it sequentially. And even if we *did* need random access, it is
possible to fetch just the *length* of a record, then decide whether you
need content from it or can skip to the next record.
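
The length-only skip can be sketched as follows. This is an in-memory
model under stated assumptions: the chunk_t array stands in for the
consecutive records stored under one key, and find_chunk is a
hypothetical helper. How the length would actually be fetched from the
database is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* One record of a string that is split across consecutive
   duplicate-key records (hypothetical model). */
typedef struct chunk_t {
  const char *data;
  size_t len;
} chunk_t;

/* Find the record containing byte OFFSET of the whole string by
   examining only record lengths. Returns the record index and stores
   the offset inside that record in *WITHIN, or returns -1 if OFFSET
   lies past the end of the string. */
static int
find_chunk(const chunk_t *chunks, int n_chunks, size_t offset,
           size_t *within)
{
  int i;

  assert(chunks != NULL && within != NULL);
  for (i = 0; i < n_chunks; i++)
    {
      if (offset < chunks[i].len)
        {
          *within = offset;       /* content is needed from this record */
          return i;
        }
      offset -= chunks[i].len;    /* skip this record without reading it */
    }
  return -1;
}
```

The cost of a random read is then one length lookup per skipped record
plus one content read, which seems acceptable if random access really
is rare.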

Back to the question above: knowing how often the delta code calls the
stream->write function will tell us more about whether the buffering helps.
My guess is that it depends heavily upon the incoming delta. If we're
talking about a pure file upload, then we'll have a series of large windows
and writes. But if we're talking about a serious diff, then we could have a
whole bunch of little writes as the file is reconstructed.

I'd say that a buffer is good, but we could probably reduce the size to 100k
or something, and use duplicate keys to improve the writing perf.
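
A rough sketch of that shape, assuming a small buffer that collects
writes and hands them to a sink whenever a size trigger is exceeded or
the stream ends, so that each flush could become one appended record.
All names here (buffered_writer_t, bw_*, sum_sink) are hypothetical,
and the sink simply counts bytes instead of touching a database.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sink: receives one flushed block, returns 0 on success. */
typedef int (*flush_fn_t)(void *baton, const char *data, size_t len);

typedef struct buffered_writer_t {
  char *buf;
  size_t len;              /* bytes currently buffered */
  size_t cap;              /* bytes allocated */
  size_t threshold;        /* flush once LEN exceeds this (e.g. 100 * 1024) */
  flush_fn_t flush;
  void *flush_baton;
  unsigned long flushes;   /* number of blocks handed to the sink */
} buffered_writer_t;

static int
bw_init(buffered_writer_t *bw, size_t threshold, flush_fn_t flush,
        void *flush_baton)
{
  assert(bw != NULL && flush != NULL);
  memset(bw, 0, sizeof(*bw));
  bw->threshold = threshold;
  bw->flush = flush;
  bw->flush_baton = flush_baton;
  return 0;
}

/* Hand the buffered bytes to the sink as one block and empty the buffer. */
static int
bw_flush(buffered_writer_t *bw)
{
  int err;

  assert(bw != NULL);
  if (bw->len == 0)
    return 0;
  err = bw->flush(bw->flush_baton, bw->buf, bw->len);
  if (err)
    return err;
  bw->len = 0;
  bw->flushes++;
  return 0;
}

/* Append DATA to the buffer, flushing if the trigger is exceeded. */
static int
bw_write(buffered_writer_t *bw, const char *data, size_t len)
{
  assert(bw != NULL && (data != NULL || len == 0));
  if (len > bw->cap - bw->len)
    {
      size_t new_cap = bw->cap ? bw->cap : 64;
      char *p;

      while (new_cap - bw->len < len)
        new_cap *= 2;
      p = realloc(bw->buf, new_cap);
      if (p == NULL)
        return -1;
      bw->buf = p;
      bw->cap = new_cap;
    }
  if (len)
    memcpy(bw->buf + bw->len, data, len);
  bw->len += len;
  return (bw->len > bw->threshold) ? bw_flush(bw) : 0;
}

/* End of stream: flush whatever is left and release the buffer. */
static int
bw_close(buffered_writer_t *bw)
{
  int err = bw_flush(bw);

  free(bw->buf);
  bw->buf = NULL;
  bw->cap = 0;
  return err;
}

/* Example sink: adds each block's size to a size_t total. */
static int
sum_sink(void *baton, const char *data, size_t len)
{
  (void) data;
  *(size_t *) baton += len;
  return 0;
}
```

The threshold is a runtime parameter here so that different sizes can
be measured against each other rather than fixed at compile time.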

Are you up for it, Mike? :-)

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

This is an archived mail posted to the Subversion Dev mailing list.