
Re: [PATCH] Compressed streams (take 2)

From: Eric Dorland <eric.dorland_at_mail.mcgill.ca>
Date: 2003-04-01 02:05:40 CEST

Hi Philip,

* Philip Martin (philip@codematters.co.uk) wrote:
> Eric Dorland <eric.dorland@mail.mcgill.ca> writes:
>
> > * Philip Martin (philip@codematters.co.uk) wrote:
> > > I agree, we just want to test the Subversion code. However, rather
> > > than just hardcoding a chunk of data, I would use a simple algorithm
> > > to generate some bytes. It doesn't really matter what you use, a
> > > simple 0, 1, 2, ..., 253, 254, 255, 254, 253, ... 3, 2, 1, 0 repeated
> > > would probably do, or how about 256 bytes incrementing in steps of 1,
> > > 256 bytes incrementing in steps of 3, 256 bytes incrementing in steps
> > > of 5, ...
> >
> > Ok, how is this better than just the hardcoded data?
>
> I don't know if it is significantly better, it's just the way I would
> have done it.

Ok :) It's probably more flexible that way, but I don't think it
matters that much.
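
Just to be concrete, here's roughly the kind of generator you have in
mind, I think (a sketch only; the function name and the stride scheme
are my own invention):

  /* Fill BUF with LEN bytes of mildly irregular data: 256-byte runs
     incrementing in steps of 1, 3, 5, ... so the result doesn't
     compress down to almost nothing. */
  static void
  generate_test_data(unsigned char *buf, int len)
  {
    int i, step = 1;
    unsigned char val = 0;

    for (i = 0; i < len; i++)
      {
        buf[i] = val;
        val = (unsigned char)(val + step);
        if (i % 256 == 255)
          step += 2;   /* next 256-byte run uses the next odd stride */
      }
  }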
 
> > If I generate the data in too regular a fashion, it's going to
> > compress too well and defeat the purpose of the test.
>
> The test can check the length of the compressed data, if it's too
> short it can do something, fail or generate more data.

Yes, that's true, but as I said, care is needed: if the generated data
is too repetitive it will compress too well and defeat the purpose of
the test. I'll come up with something that works, sized relative to
ZBUFFER_SIZE.
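
Something like this could do for the check (an untested sketch; it
just runs plain zlib compress() over the generated data to see how
well it packs, and assumes ZBUFFER_SIZE is in scope):

  #include <stdlib.h>
  #include <zlib.h>

  /* Return 1 iff DATA of LEN bytes compresses to more than
     ZBUFFER_SIZE bytes, i.e. is irregular enough to overflow the
     compressed-read buffer at least once. */
  static int
  compresses_past_zbuffer(const unsigned char *data, unsigned long len)
  {
    uLongf clen = compressBound(len);
    unsigned char *cbuf = malloc(clen);
    int ok = 0;

    if (cbuf && compress(cbuf, &clen, data, len) == Z_OK)
      ok = (clen > ZBUFFER_SIZE);
    free(cbuf);
    return ok;
  }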

> > > However you do it, the algorithm should generate an amount of data
> > > based on the value of ZBUFFER_SIZE. Is ZBUFFER_SIZE something to do
> > > with the uncompressed data, or the compressed data?
> >
> > It relates to the compressed data: it's the amount of compressed
> > data pulled in at a time whenever a read is done on a compressed
> > stream. I tested the hardcoded data, so I know it compresses to
> > something larger than ZBUFFER_SIZE (4096), which is what we wanted
> > to test.
>
> Why did you choose 4096? Is that a page size or something? What
> happens if we decide to use something bigger, your test data may no
> longer be large enough to overflow the buffer. I know nothing about
> the zlib API, is it sensible to hard code this size? How does it
> affect performance?

It was basically arbitrary. I picked something that wasn't too small
and that would likely be the size of a page, or some constant multiple
of it. Different sizes would surely have different performance
characteristics in different situations (e.g. a bigger buffer would be
better for large reads but would penalize small ones). I don't think
it matters too much, though: the IO system under Linux and other
Unices caches things enough to make the choice pretty irrelevant (as
long as it's not too insane). I could even make the buffer dynamically
sized, with some heuristic based on the amount of data the user wants
to read. It's probably not worth it, though. But hey, if it gets my
patch in, I'll do it :)
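
For reference, the read side is shaped more or less like this (a
rough sketch from memory, not the actual patch code; in the patch the
compressed buffer lives with the stream state rather than in a
static, and the caller has already done inflateInit()):

  #include <stdio.h>
  #include <zlib.h>

  #define ZBUFFER_SIZE 4096  /* compressed bytes per underlying read */

  /* Decompress up to LEN bytes into BUF, refilling a ZBUFFER_SIZE
     chunk of compressed input from SRC whenever zlib runs dry.
     Returns the number of bytes produced. */
  static int
  read_decompressed(FILE *src, z_stream *zs, unsigned char *buf,
                    unsigned len)
  {
    static unsigned char zbuf[ZBUFFER_SIZE];

    zs->next_out = buf;
    zs->avail_out = (uInt) len;
    while (zs->avail_out > 0)
      {
        if (zs->avail_in == 0)
          {
            size_t n = fread(zbuf, 1, ZBUFFER_SIZE, src);
            if (n == 0)
              break;           /* no more compressed input */
            zs->next_in = zbuf;
            zs->avail_in = (uInt) n;
          }
        if (inflate(zs, Z_NO_FLUSH) != Z_OK)
          break;               /* Z_STREAM_END or a real error */
      }
    return (int)(len - zs->avail_out);
  }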

-- 
Eric Dorland <eric.dorland@mail.mcgill.ca>
ICQ: #61138586, Jabber: hooty@jabber.com
1024D/16D970C6 097C 4861 9934 27A0 8E1C  2B0A 61E9 8ECF 16D9 70C6
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCS d- s++: a-- C+++ UL+++ P++ L++ E++ W++ N+ o K- w+ 
O? M++ V-- PS+ PE Y+ PGP++ t++ 5++ X+ R tv++ b+++ DI+ D+ 
G e h! r- y+ 
------END GEEK CODE BLOCK------

