I started out with a perl bindings bug and found a nasty bug in the our
interface. John Peacock had reported trouble running the test suite for
my pending perl bindings patch. It would segfault on him.
We tracked it down to the blame receiver callback. The C thunk seemed
to be getting its parameters wrong. Yet I checked and the parameters
were correct coming from the client lib.
What I eventually discovered was a size mismatch for the apr_off_t type
between the bindings and the client lib. The apr_off_t type is used for
the line_no parameter to the blame receiver.
apr_off_t is just a typedef to off_t. Which on Linux is defined in
# ifndef __USE_FILE_OFFSET64
typedef __off_t off_t;
typedef __off64_t off_t;
The problem is pretty clear. Some compliations might have be a 64-bit
int and some might be a 32-bit int. In my case perl bindings are
picking up the 64-bit off_t form perl via the SWIG_PL_INCLUDES variable.
libsvn_client sees it as 32-bit, the bindings see it as 64-bit. This is
Looking at the headers we're using apr_off_t in svn_diff, svn_io and
svn_client (for the blame receiver). By only have a 32-bit apr_off_t
we're crippling ourselves in the long run for large file support and
making interoperating with the growing number of libraries that do have
Unfortunately, APR doesn't turn it on. Nor does apr provide us with an
apr_off64_t. I would guess it doesn't try to turn it on because it
can't be reliably done on all platforms.
We can't just ignore the issue because we have an ambiguous data type in
our interface. Even if APR adds support apr_off_t will still probably
be ambiguous. Only a new type of apr_off32_t and apr_off64_t would be
unambiguous. So if/when APR fixes this we'd be left with an API change.
We can turn on the 64-bit off_t in our compilation but then we won't
necessarily match the APR lib we build against. This shouldn't be a
huge problem, unless we run into a large file. Then APR wouldn't be
able to handle it and we might get bizzare behavior, which I don't think
So I guess my suggestion at this point is to do the following:
a) For exported interfaces do not use apr_off_t. Use apr_int64_t.
b) At build time check to see if APR is using a 32-bit off_t or a 64-bit
off_t (it could be 64-bit on some platforms). We can check this by
looking at the APR_OFF_T_FMT define. If it's 32-bit enable bounds
checking so we can throw an error if the value would surpass the bounds
of what APR can handle. I don't think there'd be very many places we'd
have to check for this. Only right before we passed one of these vars
to an APR function. (Which as far as I can tell is only apr_file_seek
c) Don't use apr_off_t for the line_no parameter of the blame receiver.
I figure either unsigned long or apr_uint32_t. The former gets the
advantage of scaling with 64-bit archs. Not that I think anyone is ever
going to want to run blame on a file with more than 4 billion lines.
Since we're mostly only using apr_off_t as an output variable as opposed
to an input variable this ought to be pretty safe. The one exception is
svn_io_file_seek. Which would require the bounds checking to match APR.
However, we might be able to ignore the bounds checking and just drop
apr_io_file_seek, I don't see it being used anywhere.
If nobody has any better ideas, I'll start coding up a patch to
implement this. I strongly think if we're going to do this we should do
it before 1.0.
Ben Reser <firstname.lastname@example.org>
"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken
To unsubscribe, e-mail: email@example.com
For additional commands, e-mail: firstname.lastname@example.org
Received on Mon Jan 12 11:11:02 2004