On Mon, May 12, 2008 at 11:29 AM, Daniel Shahaf <d.s_at_daniel.shahaf.co.il> wrote:
> Troy Curtis Jr wrote on Mon, 12 May 2008 at 06:49 -0500:
>> On Mon, May 12, 2008 at 5:52 AM, Stefan Sperling <stsp_at_elego.de> wrote:
>> >> + if (utf8_target[j] == '@')
>> >> + {
>> >> + peg_start = utf8_target + j;
>> >
>> > Stylistic nit:
>> >
>> > With casting added, this becomes:
>> >
>> > peg_start = utf8_target + (const char *)j;
>> >
>> > This made me scratch my head at first, I hadn't seen pointers
>> > and ints being mixed like that in a while.
>> >
>> > I bet there's tons of code out there that does this, but
>>
>> In fact I consolidated code that did this in three different places,
>> into one. I tend not to use pointer arithmetic in my own newly
>> generated code because it does seem kind of confusing to me too!
>>
>> > I myself would tend to use an apr_size_t variable initialised
>> > to strlen(utf8_target), and decrement that in the loop instead of
>> > adding pointers and ints. That is, I'd try not to rely on automatic
>> > type conversion too much. See below for an example.
>> >
>
> I'd just do s/utf8_target + j/&utf8_target[j]/ and move on :)
>
Where's the fun in that? Too easy, something must be wrong with it!
>> > Note also how the exit condition of the loop ("j >= 0") also depends on
>> > j being a signed type (int), even though we're actually trying to
>> > determine an apr_size_t (which is unsigned).
>> >
>>
>> Though admittedly my reasons for not liking pointer arithmetic aren't
>> so methodical. It just makes me think too hard :).
>>
>> >> + break;
>> >> + }
>> >> + }
>> >> +
>> >> + if (peg_start)
>> >> + {
>> >> + *true_target = apr_pstrmemdup(pool,
>> >> + utf8_target,
>> >> + peg_start - utf8_target);
>> >
>> > Again, why not do an explicit cast if we're mixing types in
>> > expressions like that? For example:
>> >
>> > *true_target = apr_pstrmemdup(pool, utf8_target,
>> > (apr_size_t)(peg_start - utf8_target));
>> >
>> >
>> > Let me try to sketch out how we could do this without adding
>> > together variables of 3 different types (const char*, int, apr_size_t).
>> >
>> > static svn_error_t *
>> > split_arg_at_peg_revision(const char **true_target,
>> > const char **peg_revision,
>> > const char *utf8_target,
>> > apr_pool_t *pool)
>> > {
>> > const char *peg_start = NULL; /* pointer to the peg revision, if any */
>> > /* XXX: delete this comment before commit
>> > * Here's the only place where we rely on automatic type conversion
>> > * (size_t -> apr_size_t): */
>> > apr_size_t target_len = strlen(utf8_target);
>> > apr_size_t j;
>> >
>> > for (j = target_len; j > 0; --j)
>> > {
>> > /* If we hit a path separator, stop looking. This is OK
>> > * only because our revision specifiers can't contain '/'. */
>> > if (utf8_target[j - 1] == '/')
>> > break;
>> >
>> > /* If we find a peg revision, remember where we found it,
>> > * and calculate the actual length of the target URL. */
>> > if (utf8_target[j - 1] == '@')
>> > {
>> > peg_start = &(utf8_target[j - 1]);
>> >
>> > /* XXX: Delete this comment before commit, I just added
>> > * it to convince myself that the code is correct.
>> > * There is no off-by-one here, since:
>> > * [@|\0] --> target_len = 1 - 1 == 0
>> > * [h|@|\0] --> target_len = 2 - 1 == 1
>> > * [h|t|@|\0] --> target_len = 3 - 1 == 2
>> > * [h|t|t|@|\0] --> target_len = 4 - 1 == 3
>> > * [h|t|t|p|@|\0] --> target_len = 5 - 1 == 4
>> > * ...
>> > */
>
> I think of it this way: The length of a '\0'-terminated string is the
> index of the '\0', therefore the length of a '@'-terminated string is
> the index of the '@'.
>
>> > target_len = j - 1;
>> >
>> > break;
>> > }
>> > }
>>
>> I like it, though i'll probably do j = target_len -1 in the
>> initialization as you must do [j-1] at every usage point, and the
>> [j-1] is needed because here you are translating a length "j" into an
>> index "j-1". I tend to think of "j" as always meaning an index.
>>
>> >
>> > /* XXX: delete this comment before commit
>> > *
>> > * Note how we previously didn't handle target_len == 0 here!
>> > * But we should probably throw an error or something,
>> > * in this case, right? This case didn't even occur to me while
>> > * thinking about the problem in terms of pointer arithmetic,
>> > * but now it's rather obvious. We might also want to extend the
>> > * function's docstring to mention that it will raise an error
>> > * if it cannot split off a non-zero length target from the URL.
>> > * Note that zero-length paths are considered canonical.
>> > *
>> > * Pseudo code example: */
>>
>> I'm still learning about all this 'canonical' stuff, but if a
>> zero-length path is considered canonical then should this function
>> support that? Will apr_pstrmemdup() not output an empty path?
>>
>
> Except the paths passed to this function are not canonical; they are user
> input. If we allow the empty string here, wouldn't it allow
>
> svn up ''
>
> as synonym for
>
> svn up '.'
>
Makes sense to me, especially because I think the '.' is typically
canonicalized to '' anyway, right? (Again don't be too hard on me,
I'm still learning all this canonical stuff).
> ?
>
> Daniel
>
>
Troy
--
"Beware of spyware. If you can, use the Firefox browser." - USA Today
Download now at http://getfirefox.com
Registered Linux User #354814 ( http://counter.li.org/)
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-05-13 04:56:56 CEST