Re: RFC: date parser strawman

From: <kfogel_at_collab.net>
Date: 2003-12-30 21:08:21 CET

mark benedetto king <mbk@lowlatency.com> writes:
> +/** Convert a human-readable date @a value into an @c apr_time_t, using
> + * @a now as the current time and @a gmtoff as an indicator of
> + * seconds east of GMT, storing the result in @a val. Set @a matched
> + * to indicate whether or not @a value was parsed successfully. Perform
> + * any allocation in @a pool.
> */
> -time_t svn_parse_date (char *text, struct getdate_time *now);
> +svn_error_t *
> +svn_parse_date (svn_boolean_t *matched, apr_time_t *val, const char *value,
> + apr_time_t now, apr_int32_t gmtoff, apr_pool_t *pool);

A couple of interface comments:

It's confusing that the input is named "value" and the output "val".
Maybe choose clearer names?

The function returns boolean *and* error -- why both? (I'm sure
there's a good reason, just the doc string should explain the error
conditions and how they relate to 'matched'.)

And a documentation comment: Obviously, we'll need to update the book
and any other place where we document date formats. I understand this
patch is just about implementation, and that you're intentionally
waiting on the other stuff. I'm just noting it here so we don't
neglect it. (Glad you got the INSTALL file, though -- I'd totally
forgotten about that.)

Code comments below. It's extremely clean & well-designed, was really
a pleasure to read, by the way.

> Index: subversion/libsvn_subr/date.c
> ===================================================================
> --- subversion/libsvn_subr/date.c (revision 0)
> +++ subversion/libsvn_subr/date.c (revision 0)
> @@ -0,0 +1,228 @@
> +#include <apr_lib.h>
> +#include <svn_time.h>
> +
> +/* Valid rule actions */
> +enum rule_action {
> + ACCUM, /* Accumulate a decimal value */
> + MILLI, /* Accumulate milliseconds */
> + TZIND, /* Handle +, -, Z */
> + NOOP, /* Do nothing */
> + SKIPFROM, /* If at end-of-value, accept the match. Otherwise,
> + if the next template character matches the current
> + value character, continue processing as normal.
> + Otherwise, attempt to complete matching starting
> + immediately after the first subsequent occurrance of
> + ']' in the template. */
> + SKIP, /* Ignore this template character */
> + ACCEPT /* Accept the value */
> +};
> +
> +/* How to handle a particular character in a template */
> +typedef struct
> +{
> + char key; /* The template char that this rule matches */
> + const char *valid; /* String of valid chars for this rule */
> + enum rule_action action; /* What action to take when the rule is matched */
> + int offset; /* Where to store the any results of the action,
> + relative to the base of an apr_time_exp_t */
> +} rule;
> +
> +
> +/* The parsed values, before localtime/gmt processing */
> +typedef struct
> +{
> + apr_time_exp_t base;
> + apr_int32_t offhours;
> + apr_int32_t offminutes;
> +} match_state;
> +
> +#define DIGITS "0123456789"
> +
> +/* A declarative specification of how each template character
> + should be processed, using a rule for each valid symbol. */
> +static const rule
> +rules[] =
> +{
> + { 'Y', DIGITS, ACCUM, APR_OFFSETOF (match_state, base.tm_year) },
> + { 'M', DIGITS, ACCUM, APR_OFFSETOF (match_state, base.tm_mon) },
> + { 'D', DIGITS, ACCUM, APR_OFFSETOF (match_state, base.tm_mday) },
> + { 'h', DIGITS, ACCUM, APR_OFFSETOF (match_state, base.tm_hour) },
> + { 'm', DIGITS, ACCUM, APR_OFFSETOF (match_state, base.tm_min) },
> + { 's', DIGITS, ACCUM, APR_OFFSETOF (match_state, base.tm_sec) },
> + { 'u', DIGITS, MILLI, APR_OFFSETOF (match_state, base.tm_usec) },
> + { 'O', DIGITS, ACCUM, APR_OFFSETOF (match_state, offhours) },
> + { 'o', DIGITS, ACCUM, APR_OFFSETOF (match_state, offminutes) },
> + { '+', "-+", TZIND, 0 },
> + { 'Z', "Z", TZIND, 0 },
> + { ':', ":", NOOP, 0 },
> + { '-', "-", NOOP, 0 },
> + { 'T', "T", NOOP, 0 },
> + { ' ', " ", NOOP, 0 },

For space, should we match TAB (ascii 9) as well in the second field?

> + { '.', ".,", NOOP, 0 },
> + { '[', NULL, SKIPFROM, 0 },
> + { ']', NULL, SKIP, 0 },
> + { '\0', "", ACCEPT, 0 },
> +};

No "/" permitted, I see. Is it atypical to want that? (More on this
below.)

> +/* Return the rule associated with TCHAR, or NULL if there
> + is no such rule */
> +static const rule *
> +find_rule (char tchar)
> +{
> + int i = sizeof (rules)/sizeof (rules[0]);
> + while (i--)
> + if (rules[i].key == tchar)
> + return &rules[i];
> + return NULL;
> +}
> +
> +/* Attempt to match the date-string in VALUE to the provided TEMPLATE,
> + using the rules defined above. Use GMTOFF is to bias the results
> + when no timezone indicator is present. On successful match, store
> + the matched values in EXP, and return TRUE. Otherwise, return
> + FALSE. */

Second sentence of doc string unclear, extra word somewhere? Also,
store matched values in "*EXP", not "EXP", right? (I know that's
nitpicky, but might as well be consistent with elsewhere.)

> +static svn_boolean_t
> +template_match (apr_time_exp_t *exp, const char *template, const char *value,
> + apr_int32_t gmtoff)
> +{
> + int multiplier = 100000;
> + int tzind = 0;
> + match_state ms;
> + char *base = (char *)&ms;
> +
> + memset (&ms, 0, sizeof (ms));
> +
> + for (;;)
> + {
> + const rule *match = find_rule (*template++);
> + char vchar = *value++;
> + apr_int32_t *place;
> +
> + if (!match || (match->valid && !strchr (match->valid, vchar)))
> + return FALSE;
> +
> + place = (apr_int32_t *)(base + match->offset);

I'm a bit mystified by the casting of &ms to get base, and by the
pointer arithmetic here. Probably I could puzzle it out, but a
comment explaining what's going on might be good. My endianness bells
started ringing when I saw the assignment to place above... (I'm sure
you've thought of that and there's nothing to worry about, just wonder
how many other people's bells will go off.)

> + switch (match->action)
> + {
> + case ACCUM:
> + *place = *place * 10 + vchar - '0';
> + continue;
> + case MILLI:
> + *place += (vchar - '0') * multiplier;
> + multiplier /= 10;
> + continue;
> + case TZIND:
> + tzind = vchar;
> + continue;
> + case SKIP:
> + value--;
> + continue;
> + case NOOP:
> + continue;
> + case SKIPFROM:
> + if (!vchar)
> + break;
> + match = find_rule (*template);
> + if (!strchr (match->valid, vchar))
> + template = strchr (template, ']') + 1;
> + value--;
> + continue;
> + case ACCEPT:
> + break;
> + }

Am I correct in understanding that this permits repeats of NOOP
characters? For example, a template of "YYYY-MM" would allow
"2004----01", and "YYYY MM" would allow "2004 01"? (This is a
good thing, just want to make sure I understand correctly.)

> + break;
> + }
> +
> + switch (tzind)
> + {
> + case 0:
> + ms.base.tm_gmtoff = gmtoff;
> + break;
> + case '+':
> + ms.base.tm_gmtoff = ms.offhours * 3600 + ms.offminutes * 60;
> + break;
> + case '-':
> + ms.base.tm_gmtoff = -(ms.offhours * 3600 + ms.offminutes * 60);
> + break;
> + }
> +
> + *exp = ms.base;
> + return TRUE;
> +}
> +
> +static int
> +valid_days_by_month[] = {
> + 31, 29, 31, 30,
> + 31, 30, 31, 31,
> + 30, 31, 30, 31
> +};

Heh. I saw the "29" and wondered, do we want to automatically parse
for leap year violations, since it's a fixed formula? Then I searched
forward in the patch for "leap" and saw that you do :-).

>
>
> +
> +{
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
+svn_error_t * +svn_parse_date (svn_boolean_t *matched, apr_time_t *val, const char *value, apr_time_t now, apr_int32_t gmtoff, apr_pool_t *pool) apr_time_exp_t exp; apr_status_t apr_err; *matched = FALSE; if (template_match (&exp, /* try ISO-8601 extended, UTC */ "YYYY-MM-DD[Thh[:mm[:ss[.u[u[u[u[u[u][Z]", value, gmtoff) || template_match (&exp, /* try ISO-8601 extended, with offset */ "YYYY-MM-DD[Thh[:mm[:ss[.u[u[u[u[u[u]+oo[:oo]", value, gmtoff) || template_match (&exp, /* try ISO-8601 basic, UTC */ "YYYYMMDD[Thh[mm[ss[.u[u[u[u[u[u][Z]", value, gmtoff) || template_match (&exp, /* try ISO-8601 basic, with offset */ "YYYYMMDD[Thh[mm[ss[.u[u[u[u[u[u]+oo[oo]", value, gmtoff) || template_match (&exp, /* try the "svn log" format */ "YYYY-MM-DD hh:mm:ss[.u[u[u[u[u[u][ +oo[oo]", value, gmtoff))

Are we starting out deliberately very strict, with intent to loosen up
and allow more formats later?

I'm wondering about "YYYY/MM/DD", "YY/MM/DD", for example.

I realize you posted a long mail explaining about formats, and I
didn't bring this up then :-). I just figured it would be easy to
discuss once we had the patch, since it would obviously be a minor
code tweak (and clearly it is).

> + {
> + exp.tm_year -= 1900;
> + exp.tm_mon -= 1;
> + }
> + else if (template_match (&exp, /* try just a time */
> + "hh:mm[:ss[.u[u[u[u[u[u]",
> + value, gmtoff))
> + {
> + apr_time_exp_t expnow;
> + apr_err = apr_time_exp_tz (&expnow, now, gmtoff);
> + if (apr_err != APR_SUCCESS)
> + return svn_error_wrap_apr (apr_err, "Can't manipulate current date");
> + exp.tm_year = expnow.tm_year;
> + exp.tm_mon = expnow.tm_mon;
> + exp.tm_mday = expnow.tm_mday;
> + }
> + else
> + return SVN_NO_ERROR;
> +
> + /* range validation, allowing for leap seconds */
> + if (exp.tm_mon > 11
> + || exp.tm_mday > valid_days_by_month[exp.tm_mon]
> + || exp.tm_hour > 23
> + || exp.tm_min > 59
> + || exp.tm_sec > 60
> + || abs (exp.tm_gmtoff) > 86399
> + || (abs (exp.tm_gmtoff/60)%60) > 59)
> + return SVN_NO_ERROR;
> +
> + /* february/leap-year day checking. tm_year is bias-1900, so centuries
> + that equal 100 (mod 400) are multiples of 400. */
> + if (exp.tm_mon == 1
> + && exp.tm_mday == 29
> + && (exp.tm_year % 4 != 0
> + || (exp.tm_year % 100 == 0 && exp.tm_year % 400 != 100)))
> + return SVN_NO_ERROR;

I'd put more parens for precedence clarity in there, but there are
others who feel that's weak-kneed and lily-livered. Your call.

I'm not going to +1 it for 1.0, as you know I wasn't in favor of doing
this in 1.0. But I certainly won't veto it, and perhaps a small,
secret part of my heart hopes it goes in :-). It's a heck of a lot
cleaner and more manageable than what we've currently got.

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Tue Dec 30 22:01:06 2003

This message: [ Message body ]
Next message: kfogel_at_collab.net: "Re: svn commit: r8121 - branches/1.0-stabilization"
Previous message: kfogel_at_collab.net: "Re: ghudson's 'svn st' problem he mentioned on IRC..."
In reply to: mark benedetto king: "Re: RFC: date parser strawman"
Next in thread: mark benedetto king: "Re: RFC: date parser strawman"
Reply: mark benedetto king: "Re: RFC: date parser strawman"
Reply: Greg Hudson: "Re: RFC: date parser strawman"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]