Re: [PATCH] ISSUE #3193 Fix peg revision parsing for CLI repository root relative urls.

From: Daniel Shahaf <d.s_at_daniel.shahaf.co.il>
Date: Mon, 12 May 2008 19:29:02 +0300 (Jerusalem Daylight Time)

Troy Curtis Jr wrote on Mon, 12 May 2008 at 06:49 -0500:
> On Mon, May 12, 2008 at 5:52 AM, Stefan Sperling <stsp_at_elego.de> wrote:
> >> + if (utf8_target[j] == '@')
> >> + {
> >> + peg_start = utf8_target + j;
> >
> > Stylistic nit:
> >
> > With casting added, this becomes:
> >
> > peg_start = utf8_target + (const char *)j;
> >
> > This made me scratch my head at first, I hadn't seen pointers
> > and ints being mixed like that in a while.
> >
> > I bet there's tons of code out there that does this, but
>
> In fact I consolidated code that did this in three different places,
> into one. I tend not to use pointer arithmetic in my own newly
> generated code because it does seem kind of confusing to me too!
>
> > I myself would tend to use an apr_size_t variable initialised
> > to strlen(utf8_target), and decrement that in the loop instead of
> > adding pointers and ints. That is, I'd try not to rely on automatic
> > type conversion too much. See below for an example.
> >

I'd just do s/utf8_target + j/&utf8_target[j]/ and move on :)

> > Note also how the exit condition of the loop ("j >= 0") also depends on
> > j being a signed type (int), even though we're actually trying to
> > determine an apr_size_t (which is unsigned).
> >
>
> Though admittedly my reasons for not liking pointer arithmetic aren't
> so methodical. It just makes me think too hard :).
>
> >> + break;
> >> + }
> >> + }
> >> +
> >> + if (peg_start)
> >> + {
> >> + *true_target = apr_pstrmemdup(pool,
> >> + utf8_target,
> >> + peg_start - utf8_target);
> >
> > Again, why not do an explicit cast if we're mixing types in
> > expressions like that? For example:
> >
> > *true_target = apr_pstrmemdup(pool, utf8_target,
> > (apr_size_t)(peg_start - utf8_target));
> >
> >
> > Let me try to sketch out how we could do this without adding
> > together variables of 3 different types (const char*, int, apr_size_t).
> >
> > static svn_error_t *
> > split_arg_at_peg_revision(const char **true_target,
> > const char **peg_revision,
> > const char *utf8_target,
> > apr_pool_t *pool)
> > {
> > const char *peg_start = NULL; /* pointer to the peg revision, if any */
> > /* XXX: delete this comment before commit
> > * Here's the only place where we rely on automatic type conversion
> > * (size_t -> apr_size_t): */
> > apr_size_t target_len = strlen(utf8_target);
> > apr_size_t j;
> >
> > for (j = target_len; j > 0; --j)
> > {
> > /* If we hit a path separator, stop looking. This is OK
> > * only because our revision specifiers can't contain '/'. */
> > if (utf8_target[j - 1] == '/')
> > break;
> >
> > /* If we find a peg revision, remember where we found it,
> > * and calculate the actual length of the target URL. */
> > if (utf8_target[j - 1] == '@')
> > {
> > peg_start = &(utf8_target[j - 1]);
> >
> > /* XXX: Delete this comment before commit, I just added
> > * it to convince myself that the code is correct.
> > * There is no off-by-one here, since:
> > * [@|\0] --> target_len = 1 - 1 == 0
> > * [h|@|\0] --> target_len = 2 - 1 == 1
> > * [h|t|@|\0] --> target_len = 3 - 1 == 2
> > * [h|t|t|@|\0] --> target_len = 4 - 1 == 3
> > * [h|t|t|p|@|\0] --> target_len = 5 - 1 == 4
> > * ...
> > */

I think of it this way: The length of a '\0'-terminated string is the
index of the '\0', therefore the length of a '@'-terminated string is
the index of the '@'.

> > target_len = j - 1;
> >
> > break;
> > }
> > }
>
> I like it, though i'll probably do j = target_len -1 in the
> initialization as you must do [j-1] at every usage point, and the
> [j-1] is needed because here you are translating a length "j" into an
> index "j-1". I tend to think of "j" as always meaning an index.
>
> >
> > /* XXX: delete this comment before commit
> > *
> > * Note how we previously didn't handle target_len == 0 here!
> > * But we should probably throw an error or something,
> > * in this case, right? This case didn't even occur to me while
> > * thinking about the problem in terms of pointer arithmetic,
> > * but now it's rather obvious. We might also want to extend the
> > * function's docstring to mention that it will raise an error
> > * if it cannot split off a non-zero length target from the URL.
> > * Note that zero-length paths are considered canonical.
> > *
> > * Pseudo code example: */
>
> I'm still learning about all this 'canonical' stuff, but if a
> zero-length path is considered canonical then should this function
> support that? Will apr_pstrmemdup() not output an empty path?
>

Except the paths passed to this function are not canonical; they are user
input. If we allow the empty string here, wouldn't it allow

svn up ''

as synonym for

svn up '.'

?

Daniel

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-05-12 18:29:22 CEST

This message: [ Message body ]
Next message: Daniel Shahaf: "Re: [PATCH] Re: server-side enforcement of svn:date format?"
Previous message: Karl Fogel: "Re: Google Summer of Code project: sparse checkouts deselection interface."
In reply to: Troy Curtis Jr: "Re: [PATCH] ISSUE #3193 Fix peg revision parsing for CLI repository root relative urls."
Next in thread: Troy Curtis Jr: "Re: [PATCH] ISSUE #3193 Fix peg revision parsing for CLI repository root relative urls."
Reply: Troy Curtis Jr: "Re: [PATCH] ISSUE #3193 Fix peg revision parsing for CLI repository root relative urls."

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]