[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: SVNDIFF1 is ready to merge

From: Daniel Berlin <dberlin_at_dberlin.org>
Date: 2005-12-15 03:49:38 CET

On Tue, 2005-12-13 at 16:42 +0000, Julian Foad wrote:
> Daniel Berlin wrote:
> >
> > So take a looksee at the svndiff1 branch.
>
> OK. Now, I have to confess that this is a rather shallow review in terms of
> not thinking hard about the implications or the strategy of the change, but I'm
> trusting that others are doing that. :-)
>
> I've put comments in line, and snipped out chunks for which I had no comments.
> I moved some style nits to the end of the message.
>
>
> > Index: subversion/libsvn_fs_base/fs.c
> > ===================================================================
> > --- subversion/libsvn_fs_base/fs.c (.../trunk) (revision 17620)
> > +++ subversion/libsvn_fs_base/fs.c (.../branches/svndiff1) (revision 17755)
> > @@ -669,6 +676,10 @@
> > static svn_error_t *
> > check_format (int format)
> > {
> > + /* We support format 1 and 2 simultaneously. */
> > + if (format == 1 && SVN_FS_BASE__FORMAT_NUMBER == 2)
> > + return SVN_NO_ERROR;
>
> It doesn't matter whether the current format is 2: there's no reason for this
> line to automatically stop supporting format 1 when we bump the current format
> to 3.

This is not true.

We currently assume we only support one format at a time, everywhere.
We have no policy that says we will keep code to support old formats
around, AFAIK.

The only real reason we support format 1 and 2 at the same time in
anything but the dump and loader is because it's a one line of code to
do so. It's conceivable that if the changes people want to make (dir
hashing, etc) go in during 1.4, that we won't actually support tons of
formats at once, because it's no longer one line of code.

Unless we have a policy around i've missed about supporting multiple
formats at of course, in which case, the only check we should be doing
in this function is that the format isn't greater than
SVN_FS_BASE__FORMAT_NUMBER, and not the *current* check we do that
> if (format != SVN_FS_BASE__FORMAT_NUMBER)

> What matters is whether we have the code to support format 1, and we do,
> so this only needs to say "if (format == 1)". Or it might be clearer to change
> the following "if" to
>
> if (format < 1 || format > SVN_FS_BASE__FORMAT_NUMBER)
>
> which implies that we support a contiguous range of formats, which is a
> reasonable way of thinking about it.
>
> > +
> > if (format != SVN_FS_BASE__FORMAT_NUMBER)
> > {
> > return svn_error_createf
> > (SVN_ERR_FS_UNSUPPORTED_FORMAT, NULL,
> > _("Expected FS format '%d'; found format '%d'"),
> > SVN_FS_BASE__FORMAT_NUMBER, format);
>
> In any case, the error message generated here is no longer as accurate as it
> could be, suggesting only one format number is "expected". Not that it
> necessarily has to be improved, but it would be nice.
>
> > Index: subversion/include/svn_error_codes.h
> > ===================================================================
> > @@ -819,6 +819,15 @@
> > SVN_ERRDEF (SVN_ERR_SVNDIFF_UNEXPECTED_END,
> > SVN_ERR_SVNDIFF_CATEGORY_START + 4,
> > "Svndiff data ends unexpectedly")
> > +
> > + SVN_ERRDEF (SVN_ERR_SVNDIFF_INVALID_VERSION,
> > + SVN_ERR_SVNDIFF_CATEGORY_START + 5,
> > + "Svndiff version greater than known max")
>
> That error code doesn't seem to be used.

Thanks for reminding me, i'll put the check back in the txdelta2
function.

>
> > +
> > + SVN_ERRDEF (SVN_ERR_SVNDIFF_INVALID_COMPRESSED_DATA,
> > + SVN_ERR_SVNDIFF_CATEGORY_START + 6,
> > + "Svndiff compressed data is invalid")
> > +
> >
> > Index: subversion/include/svn_delta.h
> > ===================================================================
> > @@ -242,6 +242,16 @@
> > svn_txdelta_stream_t *stream,
> > apr_pool_t *pool);
> >
> > +/** Set @a *window to a pointer to the next window from the delta stream
> > + * @a stream. When we have completely reconstructed the target string,
> > + * set @a *window to zero.
> > + *
> > + * The window will be allocated in @a pool.
> > + */
> > +svn_error_t *svn_txdelta_next_window (svn_txdelta_window_t **window,
> > + svn_txdelta_stream_t *stream,
> > + apr_pool_t *pool);
> > +
>
> Oops - you accidentally added this second copy of a prototype that already
> exists in this file.

Fixed.

>
> >
> > /** Return the @a md5 digest for the complete fulltext deltified by
> > * @a stream, or @c NULL if @a stream has not yet returned its final
> > @@ -355,6 +365,19 @@
> > * @a output is a writable generic stream to write the svndiff data to.
> > * Allocation takes place in a sub-pool of @a pool. On return, @a *handler
> > * is set to a window handler function and @a *handler_baton is set to
> > + * the value to pass as the @a baton argument to @a *handler. The svndiff
> > + * version is @a version.
> > + */
> > +
> > +void svn_txdelta_to_svndiff2 (svn_stream_t *output,
> > + apr_pool_t *pool,
> > + svn_txdelta_window_handler_t *handler,
> > + void **handler_baton, unsigned int version);
>
> It's better not to use modifiers like "unsigned" in APIs, even though it feels
> like it's making a useful statement about the valid range of the data, as it
> tends to do more harm than good. Please use plain "int" here.

Can you please explain why?

In particular,
1. I don't believe specifying the proper range of values does more harm
than good.
2. It *is* making a useful statement about the valid range of data.

I actually don't care about changing it, i just want to know why you
think it does more harm than good (particularly when signed and unsigned
types have very different semantics in C)

>
> > +
> > +/** Prepare to produce an svndiff-format diff from text delta windows.
> > + * @a output is a writable generic stream to write the svndiff data to.
> > + * Allocation takes place in a sub-pool of @a pool. On return, @a *handler
> > + * is set to a window handler function and @a *handler_baton is set to
> > * the value to pass as the @a baton argument to @a *handler.
> > */
> > void svn_txdelta_to_svndiff (svn_stream_t *output,
>
> Deprecate svn_txdelta_to_svndiff() by changing the description to something
> like "Similar to svn_txdelta_to_svndiff(), but always using svndiff version 0."
> and adding a line "@deprecated Provided for backward compatibility with the 1.3
> API." (verbatim).

Fixed.

>
>
> > Index: subversion/mod_dav_svn/repos.c
> > ===================================================================
> > @@ -1241,6 +1242,182 @@
> > return NULL;
> > }
> >
> > +/* --------------- Borrowed from httpd's mod_negotiation.c -------------- */
> > +
> > +typedef struct accept_rec {
> > + char *name; /* MUST be lowercase */
> > + float quality;
> > +} accept_rec;
> > +
> > +/*
> > + * Get a single mime type entry --- one media type and parameters;
> > + * enter the values we recognize into the argument accept_rec
> > + */
>
> What has this got to do with getting a MIME type entry? Even if this is copied
> verbatim from httpd, the comment needs to be clarified so as not to confuse me.

The whole thing is cribbed from the l10n branch, which in turn was
cribbed from httpd's mod_negotiation, or so i was told.

>
> > +
> > +static const char *get_entry(apr_pool_t *p, accept_rec *result,
> > + const char *accept_line)
>
> I trust I can ignore the implementation, though, on the basis that it's already
> tested.

Yes. Although i think it's grody as heck, i don't like tinkering with
mod_dav_svn, so i only made the minimum changes necessary. :(

>
> > +{
> > + result->quality = 1.0f;
> > +
> > + /*
> > + * Note that this handles what I gather is the "old format",
> [...]
> > + return accept_line;
> > +}
> > +
> > +/* @a accept_line is the Accept-Encoding header, which is of the
> > + format:
> > +
> > + Accept-Language: name; q=N;
> > +*/
>
> "Accept-Language" -> "Accept-Encoding".

Yup.

>
> Please could the doc string also say what this function does: what is the array
> that it returns?

>
> > +static apr_array_header_t *do_header_line(apr_pool_t *p,
> > + const char *accept_line)
> [...]
> > +/* ---------------------------------------------------------------------- */
> > +
> > +
> > +/* qsort implementation for the quality field of the accept_rec
>
> This isn't a qsort implementation. Say "qsort comparison function" or
> something like that instead.

>
> > + structure */
> > +static int sort_encoding_pref(const void *accept_rec1, const void *accept_rec2)
> > +{
> > + float diff = ((const accept_rec *) accept_rec1)->quality -
> > + ((const accept_rec *) accept_rec2)->quality;
> > + return (diff == 0 ? 0 : (diff > 0 ? -1 : 1));
> > +}
> > +
> > +/* It would be nice if this function could be unit-tested. Paul
> > + Querna suggested
> > + http://svn.apache.org/repos/asf/httpd/httpd/trunk/server/request.c's
> > + make_sub_request(), and noted that httpd manually constructs
> > + request_rec's in a few spots. */
>
> Please could the doc string first say what the function does.
>
> > +
> > +static void
> > +svn_dav__negotiate_encoding_prefs(request_rec *r,
> > + int *svndiff_version)
> > +{
> > + /* It would be nice if mod_negotiation
> > + <http://httpd.apache.org/docs-2.1/mod/mod_negotiation.html> could
> > + handle the Accept-Encoding header parsing for us. Sadly, it
> > + looks like its data structures and routines are private (see
> > + httpd/modules/mappers/mod_negotiation.c). Thus, we duplicate the
> > + necessary ones in this file. */
> > + size_t i;
> > + const apr_array_header_t *encoding_prefs;
> > + encoding_prefs = do_header_line(r->pool,
> > + apr_table_get(r->headers_in,
> > + "Accept-Encoding"));
> > +
> > + if (encoding_prefs && apr_is_empty_array(encoding_prefs))
>
> I think that should be:
>
> if (!encoding_prefs || apr_is_empty_array(encoding_prefs))
>
> otherwise the code will go on to call qsort on the null pointer.

fixed.

>
> > + {
> > + *svndiff_version = 0;
> > + return;
> > + }
> > + *svndiff_version = 0;
> > + qsort(encoding_prefs->elts, (size_t) encoding_prefs->nelts,
> > + sizeof(accept_rec), sort_encoding_pref);
> > + for (i = 0; i < encoding_prefs->nelts; i++)
> > + {
> > + struct accept_rec rec = APR_ARRAY_IDX (encoding_prefs, i,
> > + struct accept_rec);

> Indentation. (Sorry :-)

You'll note the indentation and style is wrong on a lot of this code.

I asked whether i should fix it, and was told as long as it matches the
surrounding code (which it sadly does), to leave it be.

>
> > + if (strcmp (rec.name, "svndiff1") == 0)
> > + {
> > + *svndiff_version = 1;
> > + break;
> > + }
> > + else if (strcmp (rec.name, "svndiff") == 0)
> > + {
> > + *svndiff_version = 0;
> > + break;
> > + }
> > + }
> > +}
> >
> >
> > static dav_error * dav_svn_get_resource(request_rec *r,
> > @@ -1336,6 +1513,7 @@
> > && strcmp(ct, SVN_SVNDIFF_MIME_TYPE) == 0;
> > }
> >
> > + svn_dav__negotiate_encoding_prefs (r, &comb->priv.svndiff_version);
>
> A blank line would be good here, since this doesn't appear to be logically
> connected to the following line.
>
> > /* ### and another hack for computing diffs to send to the client */
> > comb->priv.delta_base = apr_table_get(r->headers_in,
> > SVN_DAV_DELTA_BASE_HEADER);
> > @@ -2627,7 +2805,8 @@
> > svn_stream_set_close(o_stream, dav_svn_close_filter);
> >
> > /* get a handler/baton for writing into the output stream */
> > - svn_txdelta_to_svndiff(o_stream, resource->pool, &handler, &h_baton);
> > + svn_txdelta_to_svndiff2(o_stream, resource->pool, &handler, &h_baton,
> > + resource->info->svndiff_version);
> >
> > /* got everything set up. read in delta windows and shove them into
> > the handler, which pushes data into the output stream, which goes
> > Index: subversion/libsvn_delta/svndiff.c
> > ===================================================================
> > @@ -24,7 +24,11 @@
> > #include "delta.h"
> > #include "svn_pools.h"
> > #include "svn_private_config.h"
> > +#include <zlib.h>
> >
> > +/* For svndiff1, address/instruction/new data under this size will not
> > + be compressed using zlib as a secondary compressor. */
> > +#define MIN_COMPRESS_SIZE 512
> > #define NORMAL_BITS 7
> > #define LENGTH_BITS 5
> >
> > @@ -37,10 +41,10 @@
> > struct encoder_baton {
> > svn_stream_t *output;
> > svn_boolean_t header_done;
> > + unsigned int version;
>
> (This isn't an API so it doesn't matter much, but it might be neater to use
> plain "int" to be consistent with the Subversion API.)
>
> > apr_pool_t *pool;
> > };
> >
> > -
> > /* Encode VAL into the buffer P using the variable-length svndiff
> > integer format. Return the incremented value of P after the
> > encoded bytes have been written.
> > @@ -99,6 +103,41 @@
> > svn_stringbuf_appendbytes (header, buf, p - buf);
> > }
> >
> > +static svn_error_t *
> > +zlib_encode (svn_stringbuf_t *in, svn_stringbuf_t *out)
>
> Doc string, please, even if it seems obvious what it does. It's not obvious:
> e.g. is it intended to be able to append to a non-empty 'out' string?
>
> > +{
> > + unsigned long endlen;
> > + unsigned int intlen;
>
> Looks like these are only needed in the "else" clause below.
>
> > +
> > + append_encoded_int (out, in->len, NULL);
>
> Oof! I don't like to see NULL for 'pool'. You'll have observed that the
> current implementation of the function doesn't use the pool, and that means
> (since it's local) that we can get rid of that parameter. Done in r17756.
>
> > + intlen = out->len;
> > +
> > + if (in->len < MIN_COMPRESS_SIZE)
> > + {
> > + svn_stringbuf_appendstr (out, in);
> > + }
> > + else
> > + {
> > + svn_stringbuf_ensure (out, compressBound (in->len) + intlen);
> > + endlen = out->blocksize;
>
> Should be "endlen = out->blocksize - intlen", or "endlen = compressBound
> (in->len)" and then use it in the previous line.

>
> > +
> > + if (compress2 ((unsigned char *)out->data + intlen, &endlen,
> > + (const unsigned char *)in->data, in->len, 5) != Z_OK)
>
> The constant '5' should probably have a name - SVNDIFF1_COMPRESS_LEVEL?

Sure.
>
> > +
> > + return svn_error_create (SVN_ERR_SVNDIFF_INVALID_COMPRESSED_DATA,
> > + NULL,
> > + _("compression of svndiff data failed"));
> > +
> > + /* Compression didn't help :(, just append the original text */
> > + if (endlen > in->len)
> > + {
> > + svn_stringbuf_appendstr (out, in);
> > + return SVN_NO_ERROR;
> > + }
> > + out->len = endlen + intlen;
> > + }
> > + return SVN_NO_ERROR;
> > +}
> >
> > static svn_error_t *
> > window_handler (svn_txdelta_window_t *window, void *baton)
> > @@ -106,17 +145,23 @@
> > struct encoder_baton *eb = baton;
> > apr_pool_t *pool = svn_pool_create (eb->pool);
> > svn_stringbuf_t *instructions = svn_stringbuf_create ("", pool);
> > + svn_stringbuf_t *i1 = svn_stringbuf_create ("", pool);
> > svn_stringbuf_t *header = svn_stringbuf_create ("", pool);
> > + svn_stringbuf_t *newdata = svn_stringbuf_create ("", pool);
> > char ibuf[128], *ip;
> > + char abuf[128], *ap;
> > const svn_txdelta_op_t *op;
> > - svn_error_t *err;
> > apr_size_t len;
> > + apr_size_t lastoffset = 0;
> > + apr_size_t bits = 0;
> >
> > /* Make sure we write the header. */
> > if (eb->header_done == FALSE)
> > {
> > + char svnver[4] = "SVN\0";
> > len = 4;
> > - SVN_ERR (svn_stream_write (eb->output, "SVN\0", &len));
> > + svnver[3] = eb->version;
> > + SVN_ERR (svn_stream_write (eb->output, svnver, &len));
> > eb->header_done = TRUE;
> > }
> >
> > @@ -146,6 +191,7 @@
> > {
> > /* Encode the action code and length. */
> > ip = ibuf;
> > + ap = abuf;
> > switch (op->action_code)
> > {
> > case svn_txdelta_source: *ip = (char)0; break;
> > @@ -156,8 +202,11 @@
> > *ip++ |= op->length;
> > else
> > ip = encode_int (ip + 1, op->length);
> > +
>
> OK.
>
> > if (op->action_code != svn_txdelta_new)
> > - ip = encode_int (ip, op->offset);
> > + {
> > + ip = encode_int (ip, op->offset);
> > + }
>
> Indentation. Come to think of it, it looks like a spurious change where you
> probably put more statements in and later removed them, so just return it to
> how it was.

Yes, it is, i've reverted it.

>
> > svn_stringbuf_appendbytes (instructions, ibuf, ip - ibuf);
> > }
> >
> > @@ -165,25 +214,58 @@
> > append_encoded_int (header, window->sview_offset, pool);
> > append_encoded_int (header, window->sview_len, pool);
> > append_encoded_int (header, window->tview_len, pool);
> > + if (eb->version == 1)
> > + {
> > + SVN_ERR (zlib_encode (instructions, i1));
> > + instructions = i1;
> > + }
> > append_encoded_int (header, instructions->len, pool);
> > + if (eb->version == 1)
> > + {
> > + svn_stringbuf_t *temp;
> > + temp = svn_stringbuf_create_from_string (window->new_data, pool);
> > + SVN_ERR (zlib_encode (temp, newdata));
> > + window->new_data = svn_string_create_from_buf (newdata, pool);
> > + }
> > +
>
> Unwanted blank line?
yup.

>
> > append_encoded_int (header, window->new_data->len, pool);
> >
> > /* Write out the window. */
> > len = header->len;
> > - err = svn_stream_write (eb->output, header->data, &len);
> > - if (err == SVN_NO_ERROR && instructions->len > 0)
> > + SVN_ERR (svn_stream_write (eb->output, header->data, &len));
> > + if (instructions->len > 0)
> > {
> > len = instructions->len;
> > - err = svn_stream_write (eb->output, instructions->data, &len);
> > + SVN_ERR (svn_stream_write (eb->output, instructions->data, &len));
> > }
> > - if (err == SVN_NO_ERROR && window->new_data->len > 0)
> > + if (window->new_data->len > 0)
> > {
> > len = window->new_data->len;
> > - err = svn_stream_write (eb->output, window->new_data->data, &len);
> > + SVN_ERR (svn_stream_write (eb->output, window->new_data->data, &len));
> > }
> >
> > svn_pool_destroy (pool);
> > - return err;
> > + return SVN_NO_ERROR;
> > +}
> > +
> > +void
> > +svn_txdelta_to_svndiff2 (svn_stream_t *output,
> > + apr_pool_t *pool,
> > + svn_txdelta_window_handler_t *handler,
> > + void **handler_baton,
> > + unsigned int version)
> > +{
> > + apr_pool_t *subpool = svn_pool_create (pool);
> > + struct encoder_baton *eb;
> > +
> > + eb = apr_palloc (subpool, sizeof (*eb));
> > + eb->output = output;
> > + eb->header_done = FALSE;
> > + eb->pool = subpool;
> > + eb->version = version;
> > +
> > + *handler = window_handler;
> > + *handler_baton = eb;
> > }
> >
> > void
> > @@ -199,6 +281,7 @@
> > eb->output = output;
> > eb->header_done = FALSE;
> > eb->pool = subpool;
> > + eb->version = 0;
> >
> > *handler = window_handler;
> > *handler_baton = eb;
>
> To avoid code duplication, reimplement this svn_txdelta_to_svndiff() as a
> simple forwarding call to svn_txdelta_to_svndiff2(..., 0).

done.

>
> > @@ -240,6 +323,10 @@
> > not transmit the whole svndiff data stream, you will want this to
> > be FALSE. */
> > svn_boolean_t error_on_early_close;
> > +
> > + /* svndiff version in use by delta. */
> > + unsigned char version;
> > +
> > };
> >
> >
> > @@ -283,6 +370,48 @@
> > return NULL;
> > }
> >
> > +static svn_error_t *
> > +zlib_decode (svn_stringbuf_t *in, svn_stringbuf_t *out)
>
> Doc string, please.
done.

>
> > +{
> > + apr_size_t len;
> > + unsigned long zliblen;
> > + unsigned long origlen;
>
> Those two are only used in the "else" clause so should be there.
done.

>
> > + char *oldplace = in->data;
> > +
> > + assert (sizeof (zliblen) >= sizeof (len));
>
> I don't think that's a reasonable assertion: (a) I don't think there's any
> reason to believe a window could be too long for "unsigned long", and (b) we're
> not in the habit of asserting that our data types are size-compatible.

Fine.

>
> > + /* First thing in the string is the original length. */
> > + in->data = (char *)decode_size (&len, (unsigned char *)in->data,
> > + (unsigned char *)in->data+in->len);
> > + /* We need to subtract the size of the encoded original length off the
> > + * still remaining input length. */
> > + in->len -= (in->data - oldplace);
> > + if (in->len == len)
> > + {
> > + svn_stringbuf_appendstr (out, in);
>
> I wonder if we could easily avoid copying all the data here. It seems a shame
> to be doing so. Maybe change this function's interface to output a pointer to
> the "out" string, so that it can output the "in" string without copying it.

IMHO, it's not worth it right now. I'd wait till it was a problem to
muck up the code so it has to do it's own allocation in the common case,
so that it can avoid copying in the *uncommon* case.

> (It would be more difficult for zlib_encode() because that would need to extend
> the beginning of the string, whereas this only needs to contract it.)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Thu Dec 15 03:50:19 2005

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.