[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] remove svn__strtol() and svn__strtoul()

From: Bert Huijben <bert_at_qqmail.nl>
Date: Wed, 6 Aug 2014 16:47:46 +0000

Some time ago I switched a few of the calls mentioned here around to get a huge performance improvement on Windows. The standard integer parse functions start by skipping whitespace, and only then parse the integer. But whatever is whitespace is locale dependent, and checking for that from the CTY is really slow on Windows.

(If I remember correctly these are exactly the parse id functions you are about to change here)


We shouldn't handle our storage as locale dependent data, as certainly our filesystems data should be completely locale independent. And certainly not if this would account for a 10-15% performance slowdown compared to the current locale independent conversions.





I'm not sure if this is exactly what you are about to revert here. But changing the parsing back to CPU/context switch bound, while it should really be IO bound is not simply removing a micro optimization.


Bert




Sent from Windows Mail





From: Ivan Zhakov
Sent: ‎Wednesday‎, ‎August‎ ‎6‎, ‎2014 ‎5‎:‎31‎ ‎PM
To: dev_at_subversion.apache.org, Stefan Sperling





On 6 August 2014 19:11, Stefan Sperling <stsp_at_elego.de> wrote:
> I'd like to remove svn__strtol() and svn__strtoul().
>
> Let's not get performance and micro-optimisation in the way of error checking.
> Even if the input does not come from the network or cannot for any other
> reason be inherently malicious it might be erroneous due to other
> problems such as disk corruption. I don't believe it's OK to willingly
> parse revision files without such checks. I think it's irresponsible
> to do so and makes diagnosing difficult problems unnecessarily harder.
>
I'm big +1 one this: we should not trade error checking for
performance. Especially for micro-optimization.

See my comments on patch inline.
> [[[
> Remove svn__strtol() and svn__strtoul(). They were introduced as a
> performance enhancement but don't provide sufficient error checking.
>
[...]

>
> return TRUE;
> @@ -93,8 +102,13 @@ static svn_boolean_t
> txn_id_parse(svn_fs_fs__id_part_t *txn_id,
> const char *data)
> {
> - const char *end;
> - txn_id->revision = svn__strtol(data, &end);
> + char *end;
> +
> + apr_set_os_error(0);
> + txn_id->revision = strtol(data, &end, 10);
> + if (data[0] == '\0' || apr_get_os_error() == ERANGE)
apr_get_os_error() uses GetLastError() on Windows, while strtol() uses
standard errno for error code.

> + return FALSE;

> +
> data = strchr(end, '-');
> if (data == NULL)
> return FALSE;
> @@ -478,7 +492,6 @@ svn_fs_fs__id_parse(char *data,
> if (str[0] == 'r')
> {
> apr_int64_t val;
> - const char *tmp;
> svn_error_t *err;
>
> /* This is a revision type ID */
> @@ -489,8 +502,15 @@ svn_fs_fs__id_parse(char *data,
> str = svn_cstring_tokenize("/", &data);
> if (str == NULL)
> return NULL;
> - id->private_id.rev_item.revision = svn__strtol(str, &tmp);
>
> + err = svn_cstring_atoi64(&val, str);
> + if (err)
> + {
> + svn_error_clear(err);
> + return NULL;
> + }
> + id->private_id.rev_item.revision = (svn_revnum_t)val;
> +
> err = svn_cstring_atoi64(&val, data);
> if (err)
> {
> Index: subversion/libsvn_subr/string.c
> ===================================================================
> --- subversion/libsvn_subr/string.c (revision 1616191)
> +++ subversion/libsvn_subr/string.c (working copy)
> @@ -1039,45 +1039,6 @@ svn_cstring_atoi(int *n, const char *str)
> return SVN_NO_ERROR;
> }
>
> -unsigned long
> -svn__strtoul(const char* buffer, const char** end)
> -{
> - unsigned long result = 0;
> -
> - /* this loop will execute in just 2 CPU cycles, confirmed by measurement:
> - 7 macro-ops (max 4 / cycle => 2 cycles)
> - 1 load (max 1 / cycle)
> - 1 jumps (compare + conditional jump == 1 macro op; max 1 / cycle)
> - 2 arithmetic ops (subtract, increment; max 3 / cycle)
> - 2 scale-and-add AGU ops (max 3 / cycle)
> - 1 compiler-generated move operation
> - dependency chain: temp = result * 4 + result; result = temp * 2 + c
> - (2 ops with latency 1 => 2 cycles)
> - */
> - while (1)
> - {
> - unsigned long c = (unsigned char)*buffer - (unsigned char)'0';
> - if (c > 9)
> - break;
> -
> - result = result * 10 + c;
> - ++buffer;
> - }
> -
> - *end = buffer;
> - return result;
> -}
> -
> -long
> -svn__strtol(const char* buffer, const char** end)
> -{
> - if (*buffer == '-')
> - return -(long)svn__strtoul(buffer+1, end);
> - else
> - return (long)svn__strtoul(buffer, end);
> -}
> -
> -
> /* "Precalculated" itoa values for 2 places (including leading zeros).
> * For maximum performance, make sure all table entries are word-aligned.
> */
> Index: subversion/libsvn_subr/time.c
> ===================================================================
> --- subversion/libsvn_subr/time.c (revision 1616191)
> +++ subversion/libsvn_subr/time.c (working copy)
> @@ -28,6 +28,7 @@
> #include <apr_pools.h>
> #include <apr_time.h>
> #include <apr_strings.h>
> +#include <apr_errno.h>
> #include "svn_io.h"
> #include "svn_time.h"
> #include "svn_utf.h"
> @@ -34,9 +35,7 @@
> #include "svn_error.h"
> #include "svn_private_config.h"
>
> -#include "private/svn_string_private.h"
>
> -
>
> /*** Code. ***/
>
> @@ -137,25 +136,32 @@ svn_time_from_cstring(apr_time_t *when, const char
> apr_time_exp_t exploded_time;
> apr_status_t apr_err;
> char wday[4], month[4];
> - const char *c;
> + char *c;
>
> - /* Open-code parsing of the new timestamp format, as this
> - is a hot path for reading the entries file. This format looks
> - like: "2001-08-31T04:24:14.966996Z" */
> - exploded_time.tm_year = (apr_int32_t) svn__strtoul(data, &c);
> - if (*c++ != '-') goto fail;
> - exploded_time.tm_mon = (apr_int32_t) svn__strtoul(c, &c);
> - if (*c++ != '-') goto fail;
> - exploded_time.tm_mday = (apr_int32_t) svn__strtoul(c, &c);
> - if (*c++ != 'T') goto fail;
> - exploded_time.tm_hour = (apr_int32_t) svn__strtoul(c, &c);
> - if (*c++ != ':') goto fail;
> - exploded_time.tm_min = (apr_int32_t) svn__strtoul(c, &c);
> - if (*c++ != ':') goto fail;
> - exploded_time.tm_sec = (apr_int32_t) svn__strtoul(c, &c);
> - if (*c++ != '.') goto fail;
> - exploded_time.tm_usec = (apr_int32_t) svn__strtoul(c, &c);
> - if (*c++ != 'Z') goto fail;
> + /* Open-code parsing of the new timestamp format.
> + This format looks like: "2001-08-31T04:24:14.966996Z" */
> + apr_set_os_error(0);
> + exploded_time.tm_year = (apr_int32_t) strtol(data, &c, 10);
> + if (apr_get_os_error() == ERANGE || *c++ != '-')
Same here.

> + goto fail;
> + exploded_time.tm_mon = (apr_int32_t) strtol(c, &c, 10);
> + if (apr_get_os_error() == ERANGE || *c++ != '-')
> + goto fail;
> + exploded_time.tm_mday = (apr_int32_t) strtol(c, &c, 10);
> + if (apr_get_os_error() == ERANGE || *c++ != 'T')
> + goto fail;
> + exploded_time.tm_hour = (apr_int32_t) strtol(c, &c, 10);
> + if (apr_get_os_error() == ERANGE || *c++ != ':')
> + goto fail;
> + exploded_time.tm_min = (apr_int32_t) strtol(c, &c, 10);
> + if (apr_get_os_error() == ERANGE || *c++ != ':')
> + goto fail;
> + exploded_time.tm_sec = (apr_int32_t) strtol(c, &c, 10);
> + if (apr_get_os_error() == ERANGE || *c++ != '.')
> + goto fail;
> + exploded_time.tm_usec = (apr_int32_t) strtol(c, &c, 10);
> + if (apr_get_os_error() == ERANGE || *c++ != 'Z')
> + goto fail;
>
> exploded_time.tm_year -= 1900;
> exploded_time.tm_mon -= 1;
> Index: subversion/libsvn_subr/types.c
> ===================================================================
> --- subversion/libsvn_subr/types.c (revision 1616191)
> +++ subversion/libsvn_subr/types.c (working copy)
> @@ -21,6 +21,7 @@
> * ====================================================================
> */
>
> +#include <apr_errno.h>
> #include <apr_pools.h>
> #include <apr_uuid.h>
>
> @@ -39,19 +40,34 @@ svn_revnum_parse(svn_revnum_t *rev,
> const char *str,
> const char **endptr)
> {
> - const char *end;
> + char *end;
> + apr_int64_t result;
>
> - svn_revnum_t result = (svn_revnum_t)svn__strtoul(str, &end);
> -
> if (endptr)
> *endptr = str;
>
> + /* Use the same error message string as svn_cstring_strtoui64()
> + to limit the burden on translators. */
> + if (str[0] == '\0')
> + return svn_error_createf(SVN_ERR_REVNUM_PARSE_FAILURE, NULL,
> + _("Could not convert '%s' into a number"), "");
> +
> + apr_set_os_error(0);
> +
> + result = (apr_int64_t)strtol(str, &end, 10);
> +
> + if (apr_get_os_error() == ERANGE)
And here.

> + return svn_error_createf(SVN_ERR_REVNUM_PARSE_FAILURE, NULL,
> + _("Number '%s' is out of range"), str);
> +
> + if (result < 0)
> + return svn_error_createf(SVN_ERR_REVNUM_PARSE_FAILURE, NULL,
> + _("Negative revision number found parsing '%s'"),
> + str);
> if (str == end)
> - return svn_error_createf
> - (SVN_ERR_REVNUM_PARSE_FAILURE, NULL,
> - *str == '-' ? _("Negative revision number found parsing '%s'")
> - : _("Invalid revision number found parsing '%s'"),
> - str);
> + return svn_error_createf(SVN_ERR_REVNUM_PARSE_FAILURE, NULL,
> + _("Invalid revision number found parsing '%s'"),
> + str);
>
> /* a revision number with more than 9 digits is suspicious.
> Have a closer look at those. */
> @@ -59,14 +75,12 @@ svn_revnum_parse(svn_revnum_t *rev,
> {
> /* we support 32 bit revision numbers only. check for overflows */
> if (str + 10 < end)
> - return svn_error_createf
> - (SVN_ERR_REVNUM_PARSE_FAILURE, NULL,
> + return svn_error_createf(SVN_ERR_REVNUM_PARSE_FAILURE, NULL,
> _("Revision number longer than 10 digits '%s'"), str);
>
> /* we support 32 bit revision numbers only. check for overflows */
> if (*str > '2' || (apr_uint32_t)result > APR_INT32_MAX)
> - return svn_error_createf
> - (SVN_ERR_REVNUM_PARSE_FAILURE, NULL,
> + return svn_error_createf(SVN_ERR_REVNUM_PARSE_FAILURE, NULL,
> _("Revision number too large '%s'"), str);
> }
>
> @@ -73,7 +87,7 @@ svn_revnum_parse(svn_revnum_t *rev,
> if (endptr)
> *endptr = end;
>
> - *rev = result;
> + *rev = (svn_revnum_t)result;
>
> return SVN_NO_ERROR;
> }



--
Ivan Zhakov
CTO | VisualSVN | http://www.visualsvn.com
Received on 2014-08-06 18:55:52 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.