This patch differs from the previous one in a few ways.
1. The commenting isn't running commentary anymore, i actually went back
and wrote the comments (for most places, anyway).
2. Address data is now seperated out into a seperate section, like
instructions and new data is. Address data is, on average, 2x the size of
the instructions, and always dominates the size of the deltas. Seperating
it out allows us to compress it better (i'll get to this in a moment).
3. The version number is now stored properly in the DB. I put it between
the svndiff marker and the offset. It didn't seem right to put it near
delta, since the version is a property of svndiff.
This is done in a backwards compatible way, such that if it's missing, we
assume it's version 0.
Because the svndiff header appears per-window, i had to move where we send
the header through down into the loop, and we just take care to avoid
sending it more than once.
The advantage of this is that we can (but don't yet) check to make sure we
don't have mixed versions of svndiff data in one delta, and do something smart if
we do.
4. We now range encode (almost-arithmetic encoding, but not patented) the
instructions/new data/address data. This buys us quite a bit for almost
no cost. We now beat gzip consistently. We actually usually beat bzip2 at
the same blocksize as our SVN_STREAM_CHUNK_SIZE. For those curious why i
didn't take a Public Domain implementation of adaptive huffman or
something, it's because range encoding is general .1% worse than arithmetic encoding, but
twice as fast. By comparison, adaptive huffman is notoriously slow in
practice, and won't beat range encoding anyway (since huffman uses an
integral number of bits). The implementation is a modified version of a
public domain implementation. I specifically chose range encoding
because it's not patented (and it was introduced in 197x),
like arithmetic encoding is, and because it's fast.
Profiling shows the range encoding is less than 2% of the time spent.
Fer instance, on a 15 meg changelog svndiff'd against an empty file,
vdelta takes 19 seconds, and the range encoder .68 seconds.
For those who think this is overkill, vcodex compresses the
instructions/data/new data as well.
There are still a few things I need to do before i'd like to see this
officially reviewed, i'm just posting it here so it's here, more than
anything else.
:)
--Dan
Index: ./subversion/include/svn_delta.h
===================================================================
--- ./subversion/include/svn_delta.h
+++ ./subversion/include/svn_delta.h Mon Feb 25 01:38:41 2002
@@ -242,7 +242,7 @@
void svn_txdelta_to_svndiff (svn_stream_t *output,
apr_pool_t *pool,
svn_txdelta_window_handler_t *handler,
- void **handler_baton);
+ void **handler_baton, unsigned char version);
/* Return a writable generic stream which will parse svndiff-format
data into a text delta, invoking HANDLER with HANDLER_BATON
Index: ./subversion/libsvn_fs/reps-strings.c
===================================================================
--- ./subversion/libsvn_fs/reps-strings.c
+++ ./subversion/libsvn_fs/reps-strings.c Mon Feb 18 18:20:39 2002
@@ -147,6 +147,7 @@
delta_string_keys (apr_array_header_t **keys, skel_t *rep, apr_pool_t *pool)
{
const char *key;
+ unsigned char version;
skel_t *window = rep->children->next;
int num_windows = 0;
@@ -173,7 +174,12 @@
return svn_error_create
(SVN_ERR_FS_CORRUPT, 0, NULL, pool,
"string_key: delta rep uses unknown diff format");
-
+ /* If it has a version number, ignore it for now. */
+ if (diff->next->next != NULL)
+ {
+ version = diff->next->data[0];
+ diff = diff->next;
+ }
key = apr_pstrndup (pool, diff->next->data, diff->next->len);
(*((const char **)(apr_array_push (*keys)))) = key;
window = window->next;
@@ -547,6 +553,8 @@
char chunk[4096]; /* chunk of svndiff data */
apr_size_t off; /* offset into svndiff data */
apr_size_t amt; /* how much svndiff data to/was read */
+ unsigned char version = 0;
+ int wrote_header = 0;
/* Initialize THIS_WINDOW to the first (OFFSET WINDOW) skel. */
this_window = rep->children->next;
@@ -565,16 +573,6 @@
/* Set up a window handling stream for the svndiff data. */
wstream = svn_txdelta_parse_svndiff (window_handler, &wb,
FALSE, subpool);
-
- /* First things first: send the "SVN\0" header through the
- stream. */
- chunk[0] = 'S';
- chunk[1] = 'V';
- chunk[2] = 'N';
- chunk[3] = '\0';
- amt = 4;
- SVN_ERR (svn_stream_write (wstream, chunk, &amt));
-
/* Now, for each window, decide if the window is relevant. That
is, do we need to use to reconstruct data in the range
requested by the caller? */
@@ -606,12 +604,23 @@
we're definitely done. */
if (this_off > (offset + *len))
break;
-
- /* Get this string key which holds this window's data.
- ### todo: make sure this is an `svndiff' DIFF skel here. */
- str_key = apr_pstrndup (subpool,
- wnd_skel->children->children->next->data,
- wnd_skel->children->children->next->len);
+ if (wnd_skel->children->children->next->next != NULL)
+ {
+ version = wnd_skel->children->children->next->data[0];
+ /* Get this string key which holds this window's data.
+ ### todo: make sure this is an `svndiff' DIFF skel here. */
+ str_key = apr_pstrndup (subpool,
+ wnd_skel->children->children->next->next->data,
+ wnd_skel->children->children->next->next->len);
+ }
+ else
+ {
+ version = 0;
+ str_key = apr_pstrndup (subpool,
+ wnd_skel->children->children->next->data,
+ wnd_skel->children->children->next->len);
+ }
+
/* Finish initializing our baton with window-specific
stuff. */
@@ -623,6 +632,18 @@
/* Run through the svndiff data, at least as far as necessary. */
off = 0;
+ if (!wrote_header)
+ {
+ /* First things first: send the "SVN<version>" header through the
+ stream. */
+ chunk[0] = 'S';
+ chunk[1] = 'V';
+ chunk[2] = 'N';
+ chunk[3] = version;
+ amt = 4;
+ SVN_ERR (svn_stream_write (wstream, chunk, &amt));
+ wrote_header = 1;
+ }
do {
amt = sizeof (chunk);
SVN_ERR (svn_fs__string_read (fs, str_key, chunk,
@@ -1399,6 +1420,9 @@
/* pool for holding the windows */
apr_pool_t *wpool;
+ /* svndiff version to use */
+ unsigned char svndiff_ver = 1;
+
/* Paranoia: never allow a rep to be deltified against itself,
because then there would be no fulltext reachable in the delta
chain, and badness would ensue. */
@@ -1439,7 +1463,7 @@
/* Setup a stream to convert the textdelta data into svndiff windows. */
svn_txdelta (&txdelta_stream, source_stream, target_stream, trail->pool);
svn_txdelta_to_svndiff (new_target_stream, trail->pool,
- &new_target_handler, &new_target_handler_baton);
+ &new_target_handler, &new_target_handler_baton, svndiff_ver);
/* subpool for the windows */
wpool = svn_pool_create (trail->pool);
@@ -1564,6 +1588,7 @@
/* The diff. */
svn_fs__prepend (svn_fs__str_atom (ww->key, trail->pool), diff);
+ svn_fs__prepend (svn_fs__mem_atom (&svndiff_ver, 1, trail->pool), diff);
svn_fs__prepend (svn_fs__str_atom ("svndiff", trail->pool), diff);
/* The checksum. */
Index: ./subversion/mod_dav_svn/repos.c
===================================================================
--- ./subversion/mod_dav_svn/repos.c
+++ ./subversion/mod_dav_svn/repos.c Wed Feb 20 16:40:56 2002
@@ -1672,7 +1672,7 @@
svn_stream_set_close(o_stream, dav_svn_close_filter);
/* get a handler/baton for writing into the output stream */
- svn_txdelta_to_svndiff(o_stream, resource->pool, &handler, &h_baton);
+ svn_txdelta_to_svndiff(o_stream, resource->pool, &handler, &h_baton, 0);
/* got everything set up. read in delta windows and shove them into
the handler, which pushes data into the output stream, which goes
Index: ./subversion/tests/libsvn_delta/svndiff-test.c
===================================================================
--- ./subversion/tests/libsvn_delta/svndiff-test.c
+++ ./subversion/tests/libsvn_delta/svndiff-test.c Tue Feb 19 17:36:06 2002
@@ -55,7 +55,7 @@
#else
encoder = svn_base64_encode (svn_stream_from_stdio (stdout, pool), pool);
#endif
- svn_txdelta_to_svndiff (encoder, pool, &svndiff_handler, &svndiff_baton);
+ svn_txdelta_to_svndiff (encoder, pool, &svndiff_handler, &svndiff_baton, 1);
svn_txdelta_send_txstream (txdelta_stream,
svndiff_handler,
svndiff_baton,
Index: ./subversion/tests/libsvn_delta/delta-combine-test.c
===================================================================
--- ./subversion/tests/libsvn_delta/delta-combine-test.c
+++ ./subversion/tests/libsvn_delta/delta-combine-test.c Mon Feb 18 18:29:44 2002
@@ -117,7 +117,7 @@
/* Note that we want our txdelta's converted to svndiff data,
and sent to OUTSTREAM. */
svn_txdelta_to_svndiff (out_stream, pool,
- &svndiff_handler, &svndiff_baton);
+ &svndiff_handler, &svndiff_baton, 1);
/* Now do the conversion. */
svn_txdelta_send_txstream (txdelta_stream, svndiff_handler,
Index: ./subversion/tests/libsvn_delta/random-test.c
===================================================================
--- ./subversion/tests/libsvn_delta/random-test.c
+++ ./subversion/tests/libsvn_delta/random-test.c Tue Feb 19 13:35:54 2002
@@ -26,7 +26,7 @@
#include "svn_pools.h"
#include "svn_error.h"
-#define DEFAULT_ITERATIONS 30
+#define DEFAULT_ITERATIONS 50
#define DEFAULT_MAXLEN (100 * 1024)
#define SEEDS 50
#define MAXSEQ 100
@@ -205,7 +205,7 @@
delta_pool);
/* Make stage 2: encode the text delta in svndiff format. */
- svn_txdelta_to_svndiff (stream, delta_pool, &handler, &handler_baton);
+ svn_txdelta_to_svndiff (stream, delta_pool, &handler, &handler_baton, 1);
/* Make stage 1: create the text delta. */
svn_txdelta (&txdelta_stream,
Index: ./subversion/libsvn_delta/svndiff.c
===================================================================
--- ./subversion/libsvn_delta/svndiff.c
+++ ./subversion/libsvn_delta/svndiff.c Mon Feb 25 18:07:06 2002
@@ -1,3 +1,4 @@
+
/*
* svndiff.c -- Encoding and decoding svndiff-format deltas.
*
@@ -29,6 +30,30 @@
/* ----- Text delta to svndiff ----- */
+/* Structure of the svndiff v1 cache.
+ It contains a near table, a same table, a pool, and a decode start
+ position.
+
+ The near table is a table of the last n addresses to be process,
+ excluding duplicates.
+
+ The same table is a hash table with an always overwrite policy.
+ It has 256 * same_size entries, and the hash key is simply the
+ address % number of entries.
+
+ The pos is the current start position while decoding. */
+
+#define SVNDIFF1_NEAR_SIZE 5
+#define SVNDIFF1_SAME_SIZE 1
+struct svndiff_cache
+{
+ apr_off_t near[SVNDIFF1_NEAR_SIZE];
+ int near_index;
+
+ apr_off_t same[SVNDIFF1_SAME_SIZE * 256];
+
+ apr_off_t pos;
+};
/* We make one of these and get it passed back to us in calls to the
window handler. We only use it to record the write function and
@@ -36,9 +61,66 @@
struct encoder_baton {
svn_stream_t *output;
svn_boolean_t header_done;
- apr_pool_t *pool;
+ unsigned char version;
+ apr_pool_t *pool;
+ struct svndiff_cache *cache;
};
+/* Encoding modes for svndiff v1. The whole list of encoding modes
+ can't be an enum since it needs to take into account the near and
+ same modes, which can vary in number. */
+#define SVNDIFF_SELF 0
+#define SVNDIFF_HERE 1
+/* Near modes are SVNDIFF_HERE .. SVNDIFF1_NEAR_SIZE + 1.
+ Same modes are SVNDIFF1_NEAR_SIZE + 2 .. SVNDIFF1_NEAR_SIZE +
+ SVNDIFF1_SAME_SIZE +1.
+ Currently, we exploit the fact that we use cache sizes of 4 and 2,
+ to encode the mode into 3 bits. */
+
+/* We need 3 bits for the mode right now. */
+#define SVNDIFF_MODE_BITS 3
+#define SVNDIFF_MODE_MASK (1 << 2 | 1 << 1 | 1 << 0)
+
+/* Clear out the current data in the cache, resetting it to the
+ initial state. */
+static void
+svndiff_cache_clear (struct svndiff_cache *cache)
+{
+ int i;
+ for (i = 0; i < SVNDIFF1_NEAR_SIZE; i++)
+ cache->near[i] = 0;
+ for (i = 0; i < SVNDIFF1_SAME_SIZE * 256; i++)
+ cache->same[i] = 0;
+ cache->near_index = 0;
+ cache->pos = 0;
+}
+
+/* Update the cache with a new address. */
+static void svndiff_cache_update (struct svndiff_cache *cache, apr_off_t address)
+{
+ /* Put the address into the next near slot. When we run out of near
+ slots, we start back at the first one. */
+ if (SVNDIFF1_NEAR_SIZE > 0)
+ {
+ int i;
+ int foundit = 0;
+ /* To increase hit percentage, don't put duplicate addresses in
+ the cache. */
+ for (i = 0; i < SVNDIFF1_NEAR_SIZE; i++)
+ if (cache->near[i] == address)
+ foundit = 1;
+ if (!foundit)
+ {
+ cache->near[cache->near_index] = address;
+ cache->near_index = (cache->near_index + 1) % SVNDIFF1_NEAR_SIZE;
+ }
+ }
+ /* Put the address into the same table at the place corresponding
+ to address % table_slots. */
+ if (SVNDIFF1_SAME_SIZE > 0)
+ cache->same[address % (SVNDIFF1_SAME_SIZE * 256)] = address;
+}
+
/* Encode VAL into the buffer P using the variable-length svndiff
integer format. Return the incremented value of P after the
@@ -85,7 +167,82 @@
return p;
}
+/* Encode an address using the current position, and cache, to find
+ the best encoding we can. Return the mode of the best encoding (IE
+ how we did it), in the mode parameter. Return the new encoded
+ address. */
+static apr_off_t
+svndiff_encode_addr (struct svndiff_cache *cache, apr_off_t address,
+ apr_off_t curpos, int *mode)
+{
+ int i;
+ apr_off_t diff;
+ apr_off_t bestenc;
+ int bestmode;
+
+ /* Short circuit the 1 byte cases. Pointless to try to find a
+ better encoding if it already is < 1 byte. */
+ if ((address << SVNDIFF_MODE_BITS) < 1 << 7)
+ {
+ svndiff_cache_update (cache, address);
+ *mode = SVNDIFF_SELF;
+ return address;
+ }
+
+ /* Start with the assumption that the best encoding is just the
+ address itself. */
+
+ bestenc = address;
+ bestmode = SVNDIFF_SELF;
+
+ /* Try encoding based on the offset from the current
+ position. */
+ diff = curpos - address;
+ if (diff >= 0 && diff < bestenc)
+ {
+ bestenc = diff;
+ bestmode = SVNDIFF_HERE;
+ }
+
+ /* Try encoding using the near table. */
+ for (i = 0; i < SVNDIFF1_NEAR_SIZE; ++i)
+ {
+ diff = address - cache->near[i];
+ if (diff >= 0 && diff < bestenc)
+ {
+ bestenc = diff;
+ bestmode = i+2;
+ }
+ }
+ /* Try encoding using the same table. */
+ if (SVNDIFF1_SAME_SIZE > 0)
+ {
+ diff = address % (SVNDIFF1_SAME_SIZE * 256);
+ if (cache->same[diff] == address && diff < bestenc)
+ {
+ bestenc = diff % 256;
+ bestmode = SVNDIFF1_NEAR_SIZE + 2 + (diff / 256);
+ }
+ }
+
+ /* Update the cache with the address we just processed. */
+ svndiff_cache_update (cache, address);
+
+ *mode = bestmode;
+ return bestenc;
+}
+int compare_apr_off_t (const void *first, const void *second)
+{
+ apr_off_t *foff = (apr_off_t *)first;
+ apr_off_t *soff = (apr_off_t *)second;
+ if (*foff < *soff)
+ return -1;
+ else if (*foff > *soff)
+ return 1;
+ return 0;
+
+}
/* Append an encoded integer to a string. */
static void
@@ -97,24 +254,88 @@
svn_stringbuf_appendbytes (header, buf, p - buf);
}
+#include "range.h"
+static void
+range_encode (svn_stringbuf_t *in, svn_stringbuf_t *out)
+{
+ unsigned int i, l;
+ int sym;
+ unsigned int ftbl[256];
+ rc_model rcm;
+ rc_encoder rc;
+ append_encoded_int (out, in->len, NULL);
+ i = model_init (&rcm, 256, ftbl, NULL, 2, (unsigned int) 1 << 14, ADAPT);
+ start_encode (&rc, out);
+
+ for (l = 0; l < in->len; l++)
+ {
+ sym = in->data[l];
+ encode_symbol (&rc, &rcm, sym);
+ }
+ finish_encode(&rc);
+}
+static const unsigned char *
+decode_int (apr_off_t *val,
+ const unsigned char *p,
+ const unsigned char *end);
+static void
+range_decode (svn_stringbuf_t *in, svn_stringbuf_t *out)
+{
+ unsigned int i, l;
+ char sym;
+ unsigned int ftbl[256];
+ rc_model rcm;
+ rc_decoder rd;
+ apr_off_t len;
+ /* First thing in the string is the original length. */
+ in->data = (unsigned char *)decode_int (&len, in->data, in->data+in->len);
+ i = model_init (&rcm, 256, ftbl, NULL, 2, (unsigned int) 1 << 14, ADAPT);
+ start_decode (&rd, in);
+ for (l = 0; l < len; l++)
+ {
+ sym = decode_symbol (&rd, &rcm);
+ svn_stringbuf_appendbytes (out, &sym, 1);
+ }
+}
+#define RANGE 1
static svn_error_t *
window_handler (svn_txdelta_window_t *window, void *baton)
{
struct encoder_baton *eb = baton;
apr_pool_t *pool = svn_pool_create (eb->pool);
svn_stringbuf_t *instructions = svn_stringbuf_create ("", pool);
+ svn_stringbuf_t *i1 = svn_stringbuf_create ("", pool);
+ svn_stringbuf_t *i2 = svn_stringbuf_create ("", pool);
+ svn_stringbuf_t *i3 = svn_stringbuf_create ("", pool);
svn_stringbuf_t *header = svn_stringbuf_create ("", pool);
+ svn_stringbuf_t *addresses = svn_stringbuf_create ("", pool);
+ svn_stringbuf_t *add1 = svn_stringbuf_create ("", pool);
+ svn_stringbuf_t *add2 = svn_stringbuf_create ("", pool);
+ svn_stringbuf_t *add3 = svn_stringbuf_create ("", pool);
+ svn_stringbuf_t *newdata = svn_stringbuf_create ("", pool);
+ svn_stringbuf_t *nd1 = svn_stringbuf_create ("", pool);
+ svn_stringbuf_t *nd2 = svn_stringbuf_create ("", pool);
+ svn_stringbuf_t *nd3 = svn_stringbuf_create ("", pool);
+
char ibuf[128], *ip;
+ char abuf[128], *ap;
const svn_txdelta_op_t *op;
svn_error_t *err;
apr_size_t len;
+ apr_off_t lastoffset = 0;
+ apr_off_t bits=0;
+ /* Version 1 needs a cache. */
+ if (eb->version == 1)
+ eb->cache = apr_pcalloc (pool, sizeof (struct svndiff_cache));
/* Make sure we write the header. */
if (eb->header_done == FALSE)
{
+ char svnver[4] = "SVN\0";
len = 4;
- err = svn_stream_write (eb->output, "SVN\0", &len);
+ svnver[3] = eb->version;
+ err = svn_stream_write (eb->output, svnver, &len);
if (err != SVN_NO_ERROR)
return err;
eb->header_done = TRUE;
@@ -140,12 +361,12 @@
return svn_stream_close (output);
}
-
/* Encode the instructions. */
for (op = window->ops; op < window->ops + window->num_ops; op++)
{
/* Encode the action code and length. */
ip = ibuf;
+ ap = abuf;
switch (op->action_code)
{
case svn_txdelta_source: *ip = (char)0; break;
@@ -157,7 +378,38 @@
else
ip = encode_int (ip + 1, op->length);
if (op->action_code != svn_txdelta_new)
- ip = encode_int (ip, op->offset);
+ {
+ /* For version 0, it's just the offset.
+ Version 1 tries to encode the address better using the
+ cache and current position. */
+ if (eb->version == 0)
+ ip = encode_int (ip, op->offset);
+ else if (eb->version == 1)
+ {
+ int mode;
+ apr_off_t offset;
+ offset = svndiff_encode_addr (eb->cache, op->offset,
+ /* window->sview_offset + window->sview_len*/lastoffset, &mode);
+ /* If the new encoding, offset, is greater than bits(apr_off_t) -
+ SVNDIFF_MODE_BITS, we return an error.
+
+ Why not handle it as a special case?
+ Because I don't think it's actually possible, given
+ that we can encode based on current position, which is
+ set to window->sview_offset + window->sview_len.
+
+ This means you'd need a window > ~46 bits long to
+ even approach this error. */
+ if (((offset << SVNDIFF_MODE_BITS) >> SVNDIFF_MODE_BITS) != offset)
+ return svn_error_create (SVN_ERR_SVNDIFF_INVALID_OPS, 0, NULL,
+ pool,
+ "offset too large, would cause error decoding");
+ ap = encode_int (ap, ((offset << SVNDIFF_MODE_BITS) | mode));
+ lastoffset = op->offset;
+
+ svn_stringbuf_appendbytes (addresses, abuf, ap - abuf);
+ }
+ }
svn_stringbuf_appendbytes (instructions, ibuf, ip - ibuf);
}
@@ -165,9 +417,31 @@
append_encoded_int (header, window->sview_offset, pool);
append_encoded_int (header, window->sview_len, pool);
append_encoded_int (header, window->tview_len, pool);
+ if (eb->version == 1)
+ {
+#ifdef RANGE
+ range_encode (instructions, i1);
+ instructions = i1;
+#endif
+ }
append_encoded_int (header, instructions->len, pool);
+ if (eb->version == 1)
+ {
+#ifdef RANGE
+ range_encode (svn_stringbuf_create_from_string (window->new_data, pool), newdata);
+ window->new_data = svn_string_create_from_buf (newdata,pool);
+#endif
+ }
append_encoded_int (header, window->new_data->len, pool);
-
+ if (eb->version == 1)
+ {
+#ifdef RANGE
+ range_encode (addresses, add2);
+ addresses = add2;
+#endif
+ }
+ if (eb->version == 1)
+ append_encoded_int (header, addresses->len, pool);
/* Write out the window. */
len = header->len;
err = svn_stream_write (eb->output, header->data, &len);
@@ -181,6 +455,12 @@
len = window->new_data->len;
err = svn_stream_write (eb->output, window->new_data->data, &len);
}
+ if (eb->version == 1 && err == SVN_NO_ERROR && addresses->len > 0)
+ {
+ len = addresses->len;
+ err = svn_stream_write (eb->output, addresses->data, &len);
+ }
+
svn_pool_destroy (pool);
return err;
@@ -190,7 +470,7 @@
svn_txdelta_to_svndiff (svn_stream_t *output,
apr_pool_t *pool,
svn_txdelta_window_handler_t *handler,
- void **handler_baton)
+ void **handler_baton, unsigned char version)
{
apr_pool_t *subpool = svn_pool_create (pool);
struct encoder_baton *eb;
@@ -199,7 +479,8 @@
eb->output = output;
eb->header_done = FALSE;
eb->pool = subpool;
-
+ eb->version = version;
+ eb->cache = NULL;
*handler = window_handler;
*handler_baton = eb;
}
@@ -240,6 +521,11 @@
not transmit the whole svndiff data stream, you will want this to
be FALSE. */
svn_boolean_t error_on_early_close;
+
+ /* svndiff version in use by delta. */
+ unsigned char version;
+ /* svndiff cache */
+ struct svndiff_cache *cache;
};
@@ -264,6 +550,48 @@
return NULL;
}
+/* Decode the address for svndiff version 1, where it can be encoded
+ in different ways using the cache/current position. Return the
+ buffer positioned just after the address. Set val to the address
+ value. */
+static const unsigned char *
+svndiff_decode_addr (struct svndiff_cache *cache, apr_off_t *val,
+ const unsigned char *buffer,
+ const unsigned char *end)
+{
+ apr_off_t address;
+ int m;
+ apr_off_t currpos = cache->pos;
+ int mode;
+ /* Decode the address and mode. */
+ buffer = decode_int (&address, buffer, end);
+ mode = (address & SVNDIFF_MODE_MASK);
+ address >>= SVNDIFF_MODE_BITS;
+
+ /* For the self mode, we don't need to do anything further. The
+ address is encoded as itself.
+ For here mode, we subtract the address from the current
+ position.
+ For near modes, address is the (value of slot mode in the table
+ + address).
+ For same modes, address is the value of slot (mode * 256 +
+ address) in the table. */
+ if (mode == SVNDIFF_SELF) ;
+ else if (mode == SVNDIFF_HERE)
+ address = currpos - address;
+ else if ((m = mode - (SVNDIFF_HERE + 1)) >= 0 && m < SVNDIFF1_NEAR_SIZE)
+ address = cache->near[m] + address;
+ else if ((m = mode - (SVNDIFF_HERE + 1 + SVNDIFF1_NEAR_SIZE)) >= 0
+ && m < SVNDIFF1_SAME_SIZE)
+ address = cache->same[address + m * 256 ];
+ else
+ return NULL;
+
+ /* Update the cache with the address we just decoded. */
+ svndiff_cache_update (cache, address);
+ *val = address;
+ return buffer;
+}
/* Decode an instruction into OP, returning a pointer to the text
after the instruction. Note that if the action code is
@@ -272,7 +600,10 @@
static const unsigned char *
decode_instruction (svn_txdelta_op_t *op,
const unsigned char *p,
- const unsigned char *end)
+ const unsigned char *end,
+ const unsigned char **addrp,
+ const unsigned char **addrend,
+ struct svndiff_cache *cache)
{
apr_off_t val;
@@ -299,15 +630,31 @@
}
if (op->action_code != svn_txdelta_new)
{
- p = decode_int (&val, p, end);
- if (p == NULL)
- return NULL;
- op->offset = val;
+ /* If we have a cache, we must be version 1+, and need to decode
+ the address. Would be nice if we had the baton or something,
+ so we could be sure what version it was.
+ If we have no cache, it's gotta be version 0. */
+ if (cache != NULL)
+ {
+ *addrp = svndiff_decode_addr (cache, &val, *addrp, *addrend);
+ if (*addrp == NULL)
+ return NULL;
+ op->offset = val;
+ }
+ else
+ {
+ p = decode_int (&val, p, end);
+ if (p == NULL)
+ return NULL;
+ op->offset = val;
+ }
}
return p;
}
+
+
/* Count the instructions in the range [P..END-1] and make sure they
are valid for the given window lengths. Return -1 if the
instructions are invalid; otherwise return the number of
@@ -315,9 +662,12 @@
static int
count_and_verify_instructions (const unsigned char *p,
const unsigned char *end,
+ const unsigned char **addrp,
+ const unsigned char **addrend,
apr_size_t sview_len,
apr_size_t tview_len,
- apr_size_t new_len)
+ apr_size_t new_len,
+ struct svndiff_cache *cache)
{
int n = 0;
svn_txdelta_op_t op;
@@ -325,7 +675,7 @@
while (p < end)
{
- p = decode_instruction (&op, p, end);
+ p = decode_instruction (&op, p, end, addrp, addrend, cache);
if (p == NULL || op.length < 0 || op.length > tview_len - tpos)
return -1;
switch (op.action_code)
@@ -333,10 +683,14 @@
case svn_txdelta_source:
if (op.offset < 0 || op.length > sview_len - op.offset)
return -1;
+ if (cache)
+ cache->pos = op.offset;
break;
case svn_txdelta_target:
if (op.offset < 0 || op.offset >= tpos)
return -1;
+ if (cache)
+ cache->pos = op.offset;
break;
case svn_txdelta_new:
if (op.length > new_len - npos)
@@ -361,22 +715,34 @@
{
struct decode_baton *db = (struct decode_baton *) baton;
const unsigned char *p, *end;
+ const unsigned char *addrp = NULL, *addrend = NULL;
+ const unsigned char *savedp;
apr_off_t val, sview_offset;
- apr_size_t sview_len, tview_len, inslen, newlen, remaining, npos;
+ apr_size_t sview_len, tview_len, inslen, addrlen, newlen, remaining, npos;
+ apr_size_t saved_inslen, saved_addrlen, saved_newlen;
svn_error_t *err;
svn_txdelta_op_t *op;
+ svn_stringbuf_t *instin, *instout;
+ svn_stringbuf_t *addrin, *addrout;
+ svn_stringbuf_t *ndin, *ndout;
int ninst;
/* Chew up four bytes at the beginning for the header. */
if (db->header_bytes < 4)
{
+ char compto[4] = "SVN\0";
apr_size_t nheader = 4 - db->header_bytes;
if (nheader > *len)
nheader = *len;
- if (memcmp (buffer, "SVN\0" + db->header_bytes, nheader) != 0)
+ /* It doesn't matter what the last digit is, but we do need to record it.
+ IOW, this whole thing is pointless if nheader == 1. */
+ compto[3] = buffer[nheader-1];
+ if (memcmp (buffer, compto + db->header_bytes, nheader) != 0)
return svn_error_create (SVN_ERR_SVNDIFF_INVALID_HEADER,
0, NULL, db->pool,
"svndiff has invalid header");
+ db->version = compto[3];
+
*len -= nheader;
buffer += nheader;
db->header_bytes += nheader;
@@ -427,18 +793,32 @@
if (p == NULL)
return SVN_NO_ERROR;
inslen = val;
-
p = decode_int (&val, p, end);
if (p == NULL)
return SVN_NO_ERROR;
newlen = val;
+ if (db->version == 1)
+ {
+ p = decode_int (&val, p, end);
+ if (p == NULL)
+ return SVN_NO_ERROR;
+ addrlen = val;
+ }
+ else
+ {
+ addrlen = 0;
+ }
+ savedp = p;
+ saved_addrlen = addrlen;
+ saved_inslen = inslen;
+ saved_newlen = newlen;
/* Check for integer overflow (don't want to let the input trick
us into invalid pointer games using negative numbers). */
/* FIXME: Some of these are apr_size_t, which is
unsigned. Should they be apr_ptrdiff_t instead? --xbc */
if (sview_offset < 0 || sview_len < 0 || tview_len < 0 || inslen < 0
- || newlen < 0 || inslen + newlen < 0 || sview_offset + sview_len < 0)
+ || addrlen < 0 || newlen < 0 || inslen + addrlen + newlen < 0 || sview_offset + sview_len < 0)
return svn_error_create (SVN_ERR_SVNDIFF_CORRUPT_WINDOW, 0, NULL,
db->pool,
"svndiff contains corrupt window header");
@@ -453,13 +833,49 @@
/* Wait for more data if we don't have enough bytes for the
whole window. */
- if ((apr_size_t) (end - p) < inslen + newlen)
+ if ((apr_size_t) (end - p) < inslen + addrlen + newlen)
return SVN_NO_ERROR;
/* Count the instructions and make sure they are all valid. */
end = p + inslen;
- ninst = count_and_verify_instructions (p, end, sview_len,
- tview_len, newlen);
+ /* For version 1, initialize a new cache with a same table size
+ of 2, and a near table size of 4, and set the current
+ position. Also init the address table pointer */
+ if (db->version == 1)
+ {
+ db->cache = apr_pcalloc (db->subpool, sizeof (struct svndiff_cache));
+ db->cache->pos = 0;
+ instin = svn_stringbuf_ncreate (p, end - p, db->subpool);
+ instout = svn_stringbuf_create ("", db->subpool);
+ range_decode (instin, instout);
+ addrin = svn_stringbuf_ncreate (end + newlen, addrlen, db->subpool);
+ addrout = svn_stringbuf_create ("", db->subpool);
+ range_decode (addrin, addrout);
+ ndin = svn_stringbuf_ncreate (end, newlen, db->subpool);
+ ndout = svn_stringbuf_create ("", db->subpool);
+ range_decode (ndin, ndout);
+ newlen = ndout->len;
+ addrp = addrout->data;
+ addrlen = addrout->len;
+ addrend = addrp + addrlen;
+ p = instout->data;
+ end = instout->data + instout->len;
+
+ }
+ ninst = count_and_verify_instructions (p, end, &addrp, &addrend, sview_len,
+ tview_len, newlen,
+ db->cache);
+
+ /* Reset the cache if we have one, so we can actually do the
+ decoding. Also reset the address table pointer, since it was
+ just overwritten. */
+ if (db->cache)
+ {
+ svndiff_cache_clear (db->cache);
+ addrp = addrout->data;
+ addrend = addrp + addrlen;
+ }
+
if (ninst == -1)
return svn_error_create (SVN_ERR_SVNDIFF_INVALID_OPS, 0, NULL,
db->pool,
@@ -472,19 +888,28 @@
ops = apr_palloc (db->subpool, ninst * sizeof (*ops));
npos = 0;
+
+ if (db->cache)
+ db->cache->pos = 0;
for (op = ops; op < ops + ninst; op++)
{
- p = decode_instruction (op, p, end);
+ p = decode_instruction (op, p, end, &addrp, &addrend, db->cache);
if (op->action_code == svn_txdelta_new)
{
op->offset = npos;
npos += op->length;
}
+ else if (db->version == 1)
+ {
+ db->cache->pos = op->offset;
+ }
}
window.num_ops = ninst;
window.ops = ops;
-
- new_data.data = (const char *)p;
+ if (db->version == 1)
+ new_data.data = (const char *)ndout->data;
+ else
+ new_data.data = (const char *)p;
new_data.len = newlen;
window.new_data = &new_data;
@@ -494,7 +919,16 @@
/* Make a new subpool and buffer, saving aside the remaining
data in the old buffer. */
newpool = svn_pool_create (db->pool);
- p += newlen;
+ /* Restored saved p pointer, and add in real lengths. This is
+ because if instructions/addresses/new data was compressed, the
+ current values of these parameters will *not* match the saved
+ values (since it was expanded), and p will point to a place
+ after the decompressed instructions, rather than to the place
+ after the compressed instructions. */
+ p = savedp;
+ p += saved_inslen;
+ p += saved_newlen;
+ p += saved_addrlen;
remaining = db->buffer->data + db->buffer->len - (const char *) p;
db->buffer =
svn_stringbuf_ncreate ((const char *) p, remaining, newpool);
@@ -554,6 +988,8 @@
db->last_sview_offset = 0;
db->last_sview_len = 0;
db->header_bytes = 0;
+ db->version = 0;
+ db->cache = NULL;
db->error_on_early_close = error_on_early_close;
stream = svn_stream_create (db, pool);
svn_stream_set_write (stream, write_handler);
Index: ./subversion/libsvn_delta/xml_output.c
===================================================================
--- ./subversion/libsvn_delta/xml_output.c
+++ ./subversion/libsvn_delta/xml_output.c Fri Feb 22 20:16:06 2002
@@ -596,7 +596,7 @@
#else
encoder = svn_base64_encode (output, eb->pool);
#endif
- svn_txdelta_to_svndiff (encoder, eb->pool, handler, handler_baton);
+ svn_txdelta_to_svndiff (encoder, eb->pool, handler, handler_baton, 1);
return err;
}
Index: ./subversion/libsvn_delta/range.h
===================================================================
--- ./subversion/libsvn_delta/range.h
+++ ./subversion/libsvn_delta/range.h Sun Feb 24 15:49:13 2002
@@ -0,0 +1,63 @@
+/* Mikael's Version 30 October 2001 */
+
+#ifndef RANGE_HEADER
+#define RANGE_HEADER
+
+#include <limits.h>
+#include "svn_string.h"
+/* Static or adaptive model? */
+#define STATIC 0U
+#define ADAPT 1U
+
+/* Error codes */
+#define RC_OK 0U
+#define RC_ERROR 1U
+#define RC_IO_ERROR 2U
+
+#define TOP ((unsigned int)1 << 24)
+#define BOT ((unsigned int)1 << 16)
+
+typedef struct {
+ unsigned int low;
+ unsigned int range;
+ unsigned int passed;
+ unsigned int error;
+ svn_stringbuf_t *out;
+} rc_encoder;
+
+typedef struct {
+ unsigned int low;
+ unsigned int code;
+ unsigned int range;
+ unsigned int passed;
+ unsigned int error;
+ svn_stringbuf_t *in;
+ unsigned int place;
+} rc_decoder;
+
+typedef struct {
+ unsigned int *freq;
+ unsigned int tot_freq;
+ unsigned int incr;
+ unsigned int max_freq;
+ unsigned int adapt;
+ unsigned int half_freq;
+ unsigned int first_freq;
+ unsigned int last_freq;
+ unsigned int last_sym;
+ unsigned int last_cum_freq;
+ unsigned int nsym;
+ unsigned int nsym2;
+ unsigned int nsym3;
+ unsigned int nsym4;
+ unsigned int nsym23;
+} rc_model;
+
+unsigned int model_init (rc_model *, unsigned int, unsigned int *, unsigned int *, unsigned int, unsigned int, unsigned char);
+void start_encode (rc_encoder *, svn_stringbuf_t *);
+void finish_encode (rc_encoder *);
+void start_decode (rc_decoder *, svn_stringbuf_t *);
+signed int encode_symbol (rc_encoder *, rc_model *, unsigned int);
+signed int decode_symbol (rc_decoder *, rc_model *);
+
+#endif
Index: ./subversion/libsvn_delta/range.c
===================================================================
--- ./subversion/libsvn_delta/range.c
+++ ./subversion/libsvn_delta/range.c Mon Feb 25 10:23:18 2002
@@ -0,0 +1,273 @@
+/* Carryless rangecoder (c) 1999 by Dmitry Subbotin */
+/* Modified into C and extended 2001 by Mikael Lundqvist */
+/* Modified to fit Subversion needs by Daniel Berlin */
+/* Mikael's Version 30 October 2001 */
+
+#include <stdio.h>
+#include <assert.h>
+#include "range.h"
+
+/* The two IO functions */
+static inline void
+out_byte (rc_encoder *rc, unsigned char c)
+{
+ svn_stringbuf_appendbytes (rc->out, &c, 1);
+ rc->passed++;
+}
+
+static inline unsigned char
+in_byte (rc_decoder *rd)
+{
+ int sym;
+ if (rd->place < rd->in->len)
+ {
+ rd->passed++;
+ return rd->in->data[rd->place++];
+ }
+ return 0;
+}
+
+/* Rangecoder start and end functions */
+void
+start_encode (rc_encoder *rc, svn_stringbuf_t *outstr)
+{
+ rc->passed = rc->low = 0;
+ rc->range = (unsigned int) -1;
+ rc->out = outstr;
+ rc->error = RC_OK;
+}
+
+void
+finish_encode (rc_encoder *rc)
+{
+ out_byte(rc, rc->low>>24), rc->low<<=8;
+ out_byte(rc, rc->low>>24), rc->low<<=8;
+ out_byte(rc, rc->low>>24), rc->low<<=8;
+ out_byte(rc, rc->low>>24), rc->low<<=8;
+}
+
+void
+start_decode (rc_decoder *rd, svn_stringbuf_t *instr)
+{
+ rd->passed = rd->low = rd->code = 0;
+ rd->range = (unsigned int) -1;
+ rd->in = instr;
+ rd->error = RC_OK;
+ rd->place = 0;
+ rd->code = rd->code<<8 | in_byte(rd);
+ rd->code = rd->code<<8 | in_byte(rd);
+ rd->code = rd->code<<8 | in_byte(rd);
+ rd->code = rd->code<<8 | in_byte(rd);
+}
+
+/* Rangecoder frequency encoding and decoding */
+static inline void
+encode (rc_encoder *rc, rc_model *rcm, unsigned int cum_freq, unsigned int sym)
+{
+ rc->low += cum_freq * (rc->range /= rcm->tot_freq);
+ rc->range *= rcm->freq[sym];
+ while ( (rc->low ^ (rc->low+rc->range)) < TOP
+ || (rc->range < BOT && ((rc->range = -rc->low & (BOT-1)), 1)) )
+ {
+ out_byte(rc, rc->low>>24);
+ rc->range <<= 8;
+ rc->low <<= 8;
+ }
+}
+
+static inline void
+decode (rc_decoder *rd, rc_model *rdm, unsigned int cum_freq, unsigned int sym)
+{
+ rd->low += cum_freq * rd->range;
+ rd->range *= rdm->freq[sym];
+ while ( (rd->low ^ (rd->low+rd->range)) < TOP
+ || (rd->range < BOT && ((rd->range = -rd->low & (BOT-1)),1)) )
+ {
+ rd->code = rd->code<<8 | in_byte(rd);
+ rd->range <<= 8;
+ rd->low <<= 8;
+ }
+}
+
+/* _symbol encode and decode */
+signed int
+encode_symbol(rc_encoder *rc, rc_model *rcm, unsigned int sym)
+{
+ unsigned int i, cum_freq;
+
+ if (sym >= rcm->nsym)
+ {
+ rc->error = RC_ERROR;
+ return -1;
+ }
+
+ /* Calculate cum_freq choosing the shortest path possible */
+ if (sym >= rcm->last_sym)
+ if (rcm->nsym - sym > sym - rcm->last_sym)
+ {
+ for (i = rcm->last_sym, cum_freq = rcm->last_cum_freq; i < sym; cum_freq += rcm->freq[i++]) ;
+ rcm->last_cum_freq = cum_freq;
+ rcm->last_sym = sym;
+ }
+ else
+ for (i = sym, cum_freq = rcm->tot_freq; i < rcm->nsym; cum_freq -= rcm->freq[i++]) ;
+ else
+ if (rcm->last_sym >= sym + sym)
+ {
+ for (cum_freq = i = 0; i < sym; cum_freq += rcm->freq[i++]) ;
+ if (rcm->adapt) rcm->last_cum_freq += rcm->incr;
+ }
+ else
+ {
+ for (i = sym, cum_freq = rcm->last_cum_freq; i < rcm->last_sym; cum_freq -= rcm->freq[i++]) ;
+ rcm->last_cum_freq = cum_freq;
+ rcm->last_sym = sym;
+ }
+
+ encode(rc, rcm, cum_freq, sym);
+
+ /* Update statistics */
+ if (rcm->adapt)
+ {
+ rcm->freq[sym] += rcm->incr;
+ rcm->tot_freq += rcm->incr;
+ if (rcm->tot_freq > rcm->max_freq)
+ {
+ for (i = rcm->tot_freq = 0; i < rcm->nsym; i++)
+ rcm->tot_freq += (rcm->freq[i] -= (rcm->freq[i] >> 1));
+ rcm->last_cum_freq = 0;
+ rcm->last_sym = 0;
+ }
+ }
+
+ return 0;
+}
+
+signed int
+decode_symbol(rc_decoder *rd, rc_model *rdm)
+{
+ unsigned int i, sym, cum_freq, rfreq;
+
+ rfreq = (rd->code - rd->low) / (rd->range /= rdm->tot_freq); /* Get_freq */
+ if (rfreq >= rdm->tot_freq)
+ {
+ rd->error = RC_ERROR;
+ return -1;
+ }
+
+ /* Find symbol choosing the shortest possible path */
+ if (rdm->last_cum_freq > rfreq)
+ if (rfreq > rdm->half_freq || (rdm->last_sym <= rdm->nsym23 && rfreq >= rdm->first_freq))
+ {
+ for (sym = rdm->last_sym-1, cum_freq = rdm->last_cum_freq-rdm->freq[sym];;)
+ {
+ if (cum_freq <= rfreq) break;
+ cum_freq -= rdm->freq[--sym];
+ }
+ rdm->last_cum_freq = cum_freq;
+ rdm->last_sym = sym;
+ }
+ else
+ { /* if rfreq < half_freq && (last_sym > nsym23 || rfreq < first_freq) */
+ for (sym = 0, cum_freq = rdm->freq[0];;)
+ {
+ if (cum_freq > rfreq) break;
+ cum_freq += rdm->freq[++sym];
+ }
+ cum_freq -= rdm->freq[sym];
+ if (rdm->adapt) rdm->last_cum_freq += rdm->incr;
+ }
+ else /* last_cum_freq <= rfreq */
+ if (rfreq < rdm->half_freq || (rdm->last_sym >= rdm->nsym3 && rfreq <= rdm->last_freq))
+ {
+ for (sym = rdm->last_sym, cum_freq = rdm->last_cum_freq + rdm->freq[sym];;)
+ {
+ if (cum_freq > rfreq) break;
+ cum_freq += rdm->freq[++sym];
+ }
+ cum_freq -= rdm->freq[sym];
+ rdm->last_cum_freq = cum_freq;
+ rdm->last_sym = sym;
+ }
+ else /* if rfreq > half_freq && (rdm->last_sym < rdm->nsym3 || rfreq > last_freq) */
+ for (sym = rdm->nsym-1, cum_freq = rdm->tot_freq-rdm->freq[sym];;)
+ {
+ if (cum_freq <= rfreq) break;
+ cum_freq -= rdm->freq[--sym];
+ }
+
+ decode(rd, rdm, cum_freq, sym);
+
+ /* Update statistics */
+ if (rdm->adapt)
+ {
+ rdm->freq[sym] += rdm->incr;
+ if (sym < rdm->nsym23)
+ {
+ rdm->last_freq += rdm->incr;
+ if (sym < rdm->nsym2)
+ {
+ rdm->half_freq += rdm->incr;
+ if (sym < rdm->nsym4)
+ rdm->first_freq += rdm->incr;
+ }
+ }
+ rdm->tot_freq += rdm->incr;
+ if (rdm->tot_freq > rdm->max_freq)
+ {
+ for (i = rdm->tot_freq = 0; i < rdm->nsym; i++)
+ rdm->tot_freq += (rdm->freq[i] -= (rdm->freq[i] >> 1));
+ rdm->half_freq /= 2;
+ rdm->first_freq /= 2;
+ rdm->last_freq /= 2;
+ rdm->last_cum_freq = 0;
+ rdm->last_sym = 0;
+ }
+ }
+
+ return sym;
+}
+
+/* Initializing the model */
+unsigned int
+model_init(rc_model *rcm, unsigned int nsym, unsigned int *ftbl, unsigned int *ifreq,
+ unsigned int incr, unsigned int max_freq, unsigned char adapt)
+{
+ unsigned int i;
+
+ rcm->nsym = nsym; /* Nr of symbols */
+ /* nsym2 - nsym23 are just constants designed to speed things up */
+ rcm->nsym2 = nsym/2;
+ rcm->nsym3 = nsym/3;
+ rcm->nsym4 = nsym*5/22; /* For first_freq */
+ rcm->nsym23 = nsym*2/3; /* For last_freq */
+
+ rcm->freq = ftbl;
+ rcm->tot_freq = 0;
+ if (ifreq != NULL)
+ for (i = 0; i < nsym; i++)
+ rcm->tot_freq += (ftbl[i] = ifreq[i]);
+ else
+ for (i = 0; i < nsym; i++)
+ rcm->tot_freq += (ftbl[i] = 1);
+
+ /* Sums to help decode_symbol make better decisions */
+ for (rcm->first_freq = i = 0; i < rcm->nsym4; i++)
+ rcm->first_freq += rcm->freq[i];
+ for (rcm->half_freq = rcm->first_freq, i = rcm->nsym4; i < rcm->nsym2; i++)
+ rcm->half_freq += rcm->freq[i];
+ for (rcm->last_freq = rcm->half_freq, i = rcm->nsym2; i < rcm->nsym23; i++)
+ rcm->last_freq += rcm->freq[i];
+
+ rcm->incr = incr;
+ rcm->max_freq = max_freq;
+ rcm->adapt = adapt;
+ rcm->last_cum_freq = 0;
+ rcm->last_sym = 0;
+
+ /* Total frequency MUST be kept below 1 << 16 (= 65536) */
+ if (rcm->max_freq > BOT || rcm->tot_freq > BOT)
+ return RC_ERROR;
+
+ return RC_OK;
+}
Index: ./subversion/libsvn_ra_dav/commit.c
===================================================================
--- ./subversion/libsvn_ra_dav/commit.c
+++ ./subversion/libsvn_ra_dav/commit.c Fri Feb 22 20:16:07 2002
@@ -1026,7 +1026,7 @@
svn_stream_set_write(stream, commit_stream_write);
svn_stream_set_close(stream, commit_stream_close);
- svn_txdelta_to_svndiff(stream, subpool, handler, handler_baton);
+ svn_txdelta_to_svndiff(stream, subpool, handler, handler_baton, 0);
return NULL;
}
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Oct 21 14:37:09 2006