Greg Stein <gstein@lyra.org> writes:
> Euh... how about selecting 1 meg, rather than 4?
Sorry, but I selected that number based on actual trials; it's what
seemed to give me adequate results without a memory usage explosion.
> How many times does this get called? In other words, how many times were we
> modifying the record? I think it would be good to know this. Also, what the
> sizes are. Will the buffering *truly* help us, or was this change just a
> guess?
Not just a guess. Verified by many a trial on my box today.
> [ I *really* believe in measuring, rather than assuming; not saying that
> happened here, but I didn't see measurements, so I don't know that it did ]
I was sharing some numbers on IRC. You wouldn't have seen that
traffic since you weren't connected at the time.
> svn_stringbuf_setempty() is a bit more optimal.
Okey dokey.
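(For anyone following along, the change amounts to resetting the
existing buffer in place rather than allocating a fresh one after each
flush.  Something roughly like this -- the helper name is made up:

   #include "svn_string.h"

   /* Reset BUF between flushes.  Instead of, say, re-creating the
      buffer with svn_stringbuf_create("", pool) every time,
      svn_stringbuf_setempty() keeps whatever memory the buffer has
      already grown to and just drops its length back to zero. */
   static void
   reset_between_flushes(svn_stringbuf_t *buf)
   {
     svn_stringbuf_setempty(buf);
   }
)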
> All that said, looking at switching the 'strings' table to a BTree with dup
> keys (meaning, it has [consecutive] multiple records) could be a win. Random
> access will be tougher, but that happens *very* rarely. Usually, we sequence
> through it. But even if we *did* need random access, it is possible to fetch
> just the *length* of a record, determine whether you need content from it,
> or to skip to the next record.
>
> Understanding the question above: how often does the delta code call the
> stream->write function will tell us more information about the buffering. My
> guess is that it depends heavily upon the incoming delta. If we're talking
> about a pure file upload, then we'll have a series of large windows and
> writes. But if we're talking about a serious diff, then we could have a
> whole bunch of little writes as the file is reconstructed.
>
> I'd say that a buffer is good, but we could probably reduce the size to 100k
> or something, and use duplicate keys to improve the writing perf.
>
> Are you up for it Mike? :-)
I have a feeling that the incoming data tends to be no larger than
something near 100k, the size of the svndiff encoding windows. My
tests were on imports, so all the data coming into the filesystem was
svndiff's equivalent of full-text. I'll betcha that those were 100K
windows with one op: NEW (and 102400 bytes of data). The buffering
earns us nothing if the buffer size drops below the average size of a
chunk of data written to the filesystem.  It needs to float at a value
that is basically The Most Memory I Can Stand For the FS to Use.
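To put the same point in code form, the write handler boils down to
something like the sketch below.  The names (write_baton,
flush_to_strings_table, the buffer constant) are illustrative, not
necessarily what's in the tree:

   #include <apr_pools.h>
   #include "svn_error.h"
   #include "svn_string.h"

   /* Illustrative threshold -- the "Most Memory I Can Stand" value
      under discussion (1 meg here; my patch used 4). */
   #define WRITE_BUFFER_SIZE (1024 * 1024)

   /* Hypothetical baton for the delta stream's write handler. */
   struct write_baton
   {
     svn_stringbuf_t *buf;   /* accumulated, not-yet-written data */
     /* ... plus whatever is needed to reach the `strings' table ... */
   };

   /* Hypothetical helper that writes BUF's contents into the
      `strings' table record. */
   static svn_error_t *flush_to_strings_table(struct write_baton *wb);

   /* The stream write handler: buffer small writes, flush big ones. */
   static svn_error_t *
   write_handler(void *baton, const char *data, apr_size_t *len)
   {
     struct write_baton *wb = baton;

     svn_stringbuf_appendbytes(wb->buf, data, *len);

     /* Only touch the database once the buffer is "big enough".  If
        this threshold falls below the ~100K svndiff window size,
        every window forces its own record rewrite and the buffering
        buys us nothing. */
     if (wb->buf->len >= WRITE_BUFFER_SIZE)
       {
         SVN_ERR(flush_to_strings_table(wb));
         svn_stringbuf_setempty(wb->buf);  /* keep the allocation around */
       }

     return SVN_NO_ERROR;
   }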
As for the BTree thing, I don't see the advantage in this case. Sure,
it might help our reads of data (or it might hurt, if getting a range
of text means we might have to hit the database more than once
because that range is spread out over multiple records), but the
problems I had today are strictly related to the number of times
record data was written to the database. Perhaps you're forking a new
thread, though, and I'm missing it?
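(Just to make sure I'm reading the proposal right: instead of
rewriting one ever-growing record, each flush would append another
duplicate data item under the same key.  In raw Berkeley DB terms,
roughly the following -- made-up names, and no claim this is how the
fs code would actually wire it up:

   #include <string.h>
   #include <db.h>

   /* Append CONTENTS as another duplicate record under STRING_KEY.
      Assumes the `strings' table was opened with
      db->set_flags(db, DB_DUP), so duplicates are kept in insertion
      order and DB_KEYLAST tacks the new chunk onto the end. */
   static int
   append_string_chunk(DB *strings, DB_TXN *txn,
                       const char *string_key,
                       const char *contents, size_t len)
   {
     DBC *cursor;
     DBT key, value;
     int err;

     if ((err = strings->cursor(strings, txn, &cursor, 0)))
       return err;

     memset(&key, 0, sizeof(key));
     memset(&value, 0, sizeof(value));
     key.data = (void *) string_key;
     key.size = strlen(string_key);
     value.data = (void *) contents;
     value.size = len;

     err = cursor->c_put(cursor, &key, &value, DB_KEYLAST);
     cursor->c_close(cursor);
     return err;
   }

Reading a whole string back would then mean walking the duplicates
with DB_SET followed by DB_NEXT_DUP rather than doing one big get,
which is the "random access gets tougher" part.)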