
Re: [PATCH] duplicate keys for the 'strings' table

From: Greg Stein <gstein_at_lyra.org>
Date: 2002-02-27 22:30:30 CET

On Wed, Feb 27, 2002 at 02:17:08PM -0600, cmpilato@collab.net wrote:
> Branko =?UTF-8?B?xIxpYmVq?= <brane@xbc.nu> writes:
> > cmpilato@collab.net wrote:
> > >I would suggest rolling back tree.c to 1279 and removing the #define
> > >from svn_fs.h (effectively removing the buffering code altogether),
> > >and therefore the filesystem is never trying to guess at best-case
> > >buffering behavior (which it can't possibly know). If the clients are
> > >writing directly into the strings table, they can (theoretically)
> > >choose to send differently sized chunks of data into the window
> > >consumer returned by svn_fs_apply_textdelta(), and therefore have full
> > >control over their own performance in this area!
> >
> > Wouldn't it make sense to at least buffer the writes to a multiple of
> > the db page size?

I think so.

> Well, this would add more complexity to the code. As is, the buffer
> limit is really just a trigger: if data_size > buffer_limit, then
> dump everything we've read to the database. That's different than
> dumping buffer_size to data, memmoving the leftover, etc.

Not much complexity, really.
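For what it's worth, the memmove variant really is only a few lines. A
minimal sketch (the function names, the callback, and the hard-coded page
size are mine for illustration, not the real code):

```c
#include <string.h>

#define DB_PAGE_SIZE 4096  /* illustrative; the real value would come from the db */

/* Flush only whole page-size multiples of the pending buffer, and
   memmove the leftover bytes to the front.  Returns bytes flushed. */
static size_t
flush_page_multiples(char *buf, size_t *buf_len,
                     void (*dump)(const char *data, size_t len))
{
  size_t flushable = (*buf_len / DB_PAGE_SIZE) * DB_PAGE_SIZE;
  if (flushable > 0)
    {
      dump(buf, flushable);  /* stand-in for the strings-table write */
      memmove(buf, buf + flushable, *buf_len - flushable);
      *buf_len -= flushable;
    }
  return flushable;
}
```

The only extra work over the plain trigger is the memmove of the
sub-page remainder.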

Although, thinking about it some more while driving, we should put the
"flush" right into the stream write callback, rather than outside the window
processor.

> > Or make the buffer the same size as the currently hard-coded delta
> > windows?
>
> No sense in that. Right now, we can pretty much guarantee that the
> incoming data is coming in chunks of roughly-delta-window-size or
> less, where less means we're at the end of a given file's data. Might
> as well turn off the buffering altogether, I think.

I don't think that we can guarantee that.

When the delta processor writes to the stream, doesn't it sometimes write
small pieces? e.g. write as it goes?

[ just chatted with mike on irc, and looked at some code ]

Okay, it looks like the delta processor buffers up a chunk of output before
delivering to the output stream. So, theoretically, we'll probably only see
larger (window-sized) chunks in the FS stream handler.

Above, I mentioned putting the flush right into the stream callback. I think
this is important because we can optimize our buffering logic much better if
that is the case. Consider the write callback doing something like:

  write_to_string (baton, data, len)
  {
      if (buf->len == 0 && len > 100k)
          string_append (data, len);
      else {
          buf_append (buf, data, len);
          if (buf->len > 100k) {
              string_append (buf->data, buf->len);
              set_empty (buf);
          }
      }
  }

i.e. locating the buffering logic in one spot makes it easier to encode
smarter behavior, like the direct-write fast path above.
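Spelled out as real (if simplified) C, with hypothetical stand-ins for the
baton and for the strings-table append, the callback above might look like:

```c
#include <stdlib.h>
#include <string.h>

#define FLUSH_LIMIT (100 * 1024)  /* the "100k" trigger from the sketch */

/* Hypothetical baton; the real one would carry the trail, string key, etc. */
struct write_baton {
  char   *buf;       /* pending, not-yet-written bytes */
  size_t  buf_len;
  size_t  appended;  /* total bytes handed to the strings table */
  int     appends;   /* number of strings-table writes issued */
};

/* Stand-in for appending to the strings table. */
static void string_append(struct write_baton *wb, const char *data, size_t len)
{
  (void)data;
  wb->appended += len;
  wb->appends++;
}

static void write_to_string(struct write_baton *wb, const char *data, size_t len)
{
  if (wb->buf_len == 0 && len > FLUSH_LIMIT)
    {
      /* Large chunk and nothing pending: skip the copy entirely. */
      string_append(wb, data, len);
      return;
    }
  wb->buf = realloc(wb->buf, wb->buf_len + len);
  memcpy(wb->buf + wb->buf_len, data, len);
  wb->buf_len += len;
  if (wb->buf_len > FLUSH_LIMIT)
    {
      string_append(wb, wb->buf, wb->buf_len);
      wb->buf_len = 0;  /* "set_empty": keep the allocation, drop the contents */
    }
}
```

Note the fast path: a big chunk arriving on an empty buffer goes straight
through without ever being copied.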

Also, having a buffered stream that writes to a string record allows us to write
an FS interface function for writing a fulltext. Right now, when mod_dav_svn
gets a fulltext for a new file, it has to create diff windows (which copy
none of the source) and feed those into the handler from apply_textdelta. It
would be much nicer to just feed that write into a stream.
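Sketching the shape such an interface might take (every name here is made
up, not a proposal for the actual API): a writable stream hides the
buffering decision behind a callback, so a fulltext producer never has to
touch delta windows at all:

```c
#include <stddef.h>

/* A minimal writable-stream shape, loosely modeled on svn_stream_t:
   callers see only a write callback and an opaque baton, so whether the
   bytes are buffered, page-aligned, or written straight to the strings
   table is entirely the stream implementation's business. */
typedef struct {
  void *baton;
  void (*write)(void *baton, const char *data, size_t len);
} write_stream_t;

/* A fulltext writer then needs no delta machinery whatsoever. */
static void write_fulltext(write_stream_t *ws, const char *text, size_t len)
{
  ws->write(ws->baton, text, len);
}
```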

Another alternative is to have the stream returned by dag_get_edit_stream()
do the buffering itself. In fact, the precise point is in
reps-strings.c::rep_write_contents(). It can buffer, or it can start a trail
to write contents.

Hmm. I kind of like the idea of putting it down into reps-strings.c. That
isolates all higher levels from the concerns of "how" to write to that
stream. It also means we can remove the buffering from tree.c (yes, it just
moves, but I think we want it down there anyways; as I said -- to support a
direct writable stream API in the FS).

The last issue is that I'd rather see 10 200k records in the database than
2000 1k records. Buffering a bit would keep our need for iteration much
lower. And note that the cost of small records propagates "out", too. When
you do an
svn_fs__string_read(), the most you'll get is a single record. That means if
you have little 1k records in the DB, then you'll only be able to read out
1k at a time.
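The read-side cost is easy to quantify: if each svn_fs__string_read()
returns at most one record, reading N bytes takes ceil(N / record_size)
trips to the database. A toy calculation (the helper is mine, just to make
the arithmetic concrete):

```c
/* Number of read calls needed to fetch total_bytes when each call
   returns at most one record of record_size bytes. */
static unsigned long
reads_needed(unsigned long total_bytes, unsigned long record_size)
{
  return (total_bytes + record_size - 1) / record_size;  /* ceiling division */
}
```

For roughly 2MB of contents, 200k records mean 10 reads; 1k records mean
2000 reads of the same data.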

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Oct 21 14:37:10 2006

This is an archived mail posted to the Subversion Dev mailing list.
