[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] extend svn_subst_translate_string() to record whether re-encoding and/or line ending translation were performed (v. 2)

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Mon, 1 Nov 2010 15:35:07 +0200

Gavin Beau Baumanis wrote on Mon, Oct 18, 2010 at 21:06:31 +1100:
> Ping. This patch submission has received no comments.
>

Thanks, Gavin. @Daniel, sorry for the delay.

> On 04/10/2010, at 5:55 AM, Daniel Trebbien wrote:
>
> > On Fri, Oct 1, 2010 at 3:57 AM, Julian Foad <julian.foad_at_wandisco.com> wrote:
> >>> Adds a public API function, svn_subst_translate_string2(), an extension of
> >>> svn_subst_translate_string(), that has two, additional output parameters for
> >>> determining whether re-encoding and/or line ending translation were performed.
> >>
> >> If we're going to add this functionality to _translate_string(),
> >> shouldn't we also add it to the other variants - _detranslate_string,
> >> _translate_cstring2, _stream_translated, _copy_and_translate4?
> >>
> >> I see you're adding the infrastructure into the (new) bodies of
> >> _translate_cstring2 and _stream_translated, just not (yey) exposing it
> >> in their APIs.
> >>

If we can provide the information cheaply, I see no reason not to let
the APIs provide it. As some of these APIs are already revved for 1.7,
we might as well add the new information to them now.

> >> I'm not sure. The set of 'translation' functions is already messy and
> >> it's not clear to me how useful this new functionality will be.
> >
> > I originally began working on svn_subst_translate_string2() because I
> > could not find a combination of public API functions that would allow
> > me to determine whether the line endings changed when a string was
> > both re-encoded to UTF-8 and normalized to LF line endings. The most
> > applicable, svn_subst_translate_string(), performs both translations
> > without indicating whether it re-encoded the string or normalized a
> > line ending.
> >

And all of this is necessary just so svnsync can /report/ which of the
two fixes it made, right? If so, we might move on with teaching svnsync
to fix newlines too, and add the "Report back how many properties had
their newlines changed" functionality separately. (easier to review,
and I don't think "512 properties were re-newlined" is /that/ useful to
know for most people)

> > An alternative to extending svn_subst_translate_string() is to add a
> > public API function that does the re-encoding and another that
> > normalizes line endings. I think, however, that extending
> > svn_subst_translate_string() is better because though the current
> > implementation of svn_subst_translate_string() re-encodes, then
> > normalizes line endings, the single API function allows for the
> > possibility of a future improvement in efficiency as a result of
> > performing both translations simultaneously.
> >
> >
> >
> > Attached are a revised patch and log message.
> > <log message.txt><patch.txt>

> [[[
> Adds a public API function, svn_subst_translate_string2(), an extension of
> svn_subst_translate_string(), that has two, additional output parameters for
> determining whether re-encoding and/or line ending translation were performed.
>

Grammar police: No comma after "two".

>
> As discussed at:
> http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122550
> http://thread.gmane.org/gmane.comp.version-control.subversion.devel/123020
>
>
> * subversion/include/svn_subst.h
> (svn_subst_translate_string2): New function.
> (svn_subst_translate_string): Deprecated in favor of
> svn_subst_translate_string2().
>
> * subversion/libsvn_subr/subst.c

What's happening in this file is essentially "Add the parameters to
<this> public API, punch them through everything, and finally in
translate_newline() make use of them". Could you add something
(a paragraph?) to the log message that makes that clear?

Or, another option is two have two subst.c entries in the log message
--- one for the 'punching through' and one for the interesting changes
--- though we have less precedent for that.

I'll leave it to you to find a way to make the overall picture clear :)

> (translate_newline): Added the new parameter TRANSLATED_EOL, the value at
> which is set to TRUE if the newline string in EOL_STR is different than
> the newline string in NEWLINE_BUF.
> (translation_baton): Added the TRANSLATED_EOL field.
> (create_translation_baton): Added a new parameter TRANSLATED_EOL that is
> passed to the resulting translation_baton.
> (translate_chunk): Modified the three calls to translate_newline() to pass
> the TRANSLATED_EOL pointer from the translation baton.
> (stream_translated): New static function. Its implementation is the old
> implementation of svn_subst_stream_translated(), but accepting another
> parameter, TRANSLATED_EOL, that is passed to the in/out translation batons
> that it creates.
> (svn_subst_stream_translated): Now a wrapper for stream_translated().
> (translate_cstring): New static function. Its implementation is the old
> implementation of svn_subst_translate_cstring2(), but modified to accept
> another parameter, TRANSLATED_EOL, that is passed to stream_translated().
> (svn_subst_translate_cstring2): Now a wrapper for translate_cstring().
> (svn_subst_translate_string2): New function. The task of recording whether
> it translated a line ending is delegated to translate_cstring().
> ]]]

> Index: subversion/include/svn_subst.h
> ===================================================================
> --- subversion/include/svn_subst.h (revision 1004022)
> +++ subversion/include/svn_subst.h (working copy)
> @@ -588,16 +588,41 @@ svn_subst_stream_detranslated(svn_stream_t **strea
>
> /* EOL conversion and character encodings */
>
> -/** Translate the data in @a value (assumed to be in encoded in charset
> +/** @deprecated Provided for backward compatibility with the 1.6 API. Callers
> + * should use svn_subst_translate_string2().
> + *
> + * Translate the data in @a value (assumed to be encoded in charset
> * @a encoding) to UTF8 and LF line-endings. If @a encoding is @c NULL,
> * then assume that @a value is in the system-default language encoding.
> * Return the translated data in @a *new_value, allocated in @a pool.
> */

Please make the old docstring only describe the differences from the new
version of the function, and delete the rest of the docstring here.
Follow the examples in the other header files.

> +SVN_DEPRECATED
> svn_error_t *svn_subst_translate_string(svn_string_t **new_value,
> const svn_string_t *value,
> const char *encoding,
> apr_pool_t *pool);
>
> +/** Translate the data in @a value (assumed to be encoded in charset
> + * @a encoding) to UTF8 and LF line-endings. If @a encoding is @c NULL,
> + * then assume that @a value is in the system-default character encoding.

You changed 'language encoding' to 'character encoding'.

> + * If @a translated_to_utf8 is not @c NULL, then
> + * @code *translated_to_utf8 @endcode is set to @c TRUE if at least one
> + * character of @a value in the source charset was translated to UTF-8; to
> + * @c FALSE otherwise. If @a translated_line_endings is not @c NULL, then
> + * @code *translated_line_endings @endcode is set to @c TRUE if at least one
> + * line ending was changed to LF; to @c FALSE otherwise.

Use paragraphs please.

> + * Return the translated
> + * data in @a *new_value, allocated in @a pool.
> + *
> + * @since New in 1.7

Trailing period missing.

(Vim tip: ^X^L completes a line.)

> + */
> +svn_error_t *
> +svn_subst_translate_string2(svn_string_t **new_value,
> + svn_boolean_t *translated_to_utf8,
> + svn_boolean_t *translated_line_endings,
> + const svn_string_t *value,
> + const char *encoding,
> + apr_pool_t *pool);

Can you switch this function to the two-pools paradigm
(result_pool/scratch_pool) while here? There are examples in
libsvn_wc/wc_db.h (and elsewhere).

It already uses scratch_pool internally.

> +
> /** Translate the data in @a value from UTF8 and LF line-endings into
> * native locale and native line-endings, or to the output locale if
> * @a for_output is TRUE. Return the translated data in @a
> Index: subversion/libsvn_subr/subst.c
> ===================================================================
> --- subversion/libsvn_subr/subst.c (revision 1004022)
> +++ subversion/libsvn_subr/subst.c (working copy)
> @@ -620,6 +620,10 @@ translate_keyword(char *buf,
> newline in the file, and copy it to {SRC_FORMAT, *SRC_FORMAT_LEN} to
> use for later consistency checks.
>
> + Sets *TRANSLATED_EOL to TRUE if TRANSLATED_EOL is not NULL and the newline
> + string that was written (EOL_STR) is not the same as the newline string
> + that was translated (NEWLINE_BUF).
> +
> Note: all parameters are required even if REPAIR is TRUE.
> ### We could require that REPAIR must not change across a sequence of
> calls, and could then optimize by not using SRC_FORMAT at all if
> @@ -633,7 +637,8 @@ translate_newline(const char *eol_str,
> const char *newline_buf,
> apr_size_t newline_len,
> svn_stream_t *dst,
> - svn_boolean_t repair)
> + svn_boolean_t repair,
> + svn_boolean_t *translated_eol)
> {
> /* If we've seen a newline before, compare it with our cache to
> check for consistency, else cache it for future comparisons. */
> @@ -653,8 +658,17 @@ translate_newline(const char *eol_str,
> strncpy(src_format, newline_buf, newline_len);
> *src_format_len = newline_len;
> }
> - /* Write the desired newline */
> - return translate_write(dst, eol_str, eol_str_len);
> +
> + /* Translate the newline */
> + SVN_ERR(translate_write(dst, eol_str, eol_str_len));
> +
> + if (translated_eol)
> + {

Right now, this if() will be done every time translate_newline() is
called. Could you rework the logic to check it fewer times?

e.g., to only check it once per translation_baton if possible, or to
skip checking it if it had been set already, or if REPAIR is false and
a newline had been seen already, etc. (These are just ideas, you don't
have to implement all of them!)

Stefan2 indicates that the subst() code is a rather expensive part of
'checkout' and 'export', so doing the check once-per-file (instead of
once-per-line) will be nice. (However, I realize that right now you
only expose translated_eol for the svn_string_t API.)

> + if (newline_len != eol_str_len ||
> + strncmp(newline_buf, eol_str, newline_len))
> + *translated_eol = TRUE;
> + }
> + return SVN_NO_ERROR;
> }
>
>
> @@ -1683,6 +1734,19 @@ svn_subst_translate_string(svn_string_t **new_valu
> const char *encoding,
> apr_pool_t *pool)
> {
> + return svn_subst_translate_string2(new_value, NULL, NULL, value, encoding,
> + pool);
> +}
> +

Please move the above wrapper to deprecated.c.

> +
> +svn_error_t *
> +svn_subst_translate_string2(svn_string_t **new_value,
> + svn_boolean_t *translated_to_utf8,
> + svn_boolean_t *translated_line_endings,
> + const svn_string_t *value,
> + const char *encoding,
> + apr_pool_t *pool)
> +{
> const char *val_utf8;
> const char *val_utf8_lf;
> apr_pool_t *scratch_pool = svn_pool_create(pool);
> @@ -1703,14 +1767,18 @@ svn_subst_translate_string(svn_string_t **new_valu
> SVN_ERR(svn_utf_cstring_to_utf8(&val_utf8, value->data, scratch_pool));
> }
>

Not related to your patch, but the above if/else block calls the UTF-8
translation routines on value->data. Since this is specifically the
translate_string() API (and there is a separate translate_cstring()
API), I think either we should fix this (the whole reason svn_string_t
exists is to allow embedded NULs) or at least document this limitation
in the API docs.

> - SVN_ERR(svn_subst_translate_cstring2(val_utf8,
> - &val_utf8_lf,
> - "\n", /* translate to LF */
> - FALSE, /* no repair */
> - NULL, /* no keywords */
> - FALSE, /* no expansion */
> - scratch_pool));
> + if (translated_to_utf8)
> + *translated_to_utf8 = (strcmp(value->data, val_utf8) != 0);
>
> + SVN_ERR(translate_cstring(&val_utf8_lf,
> + translated_line_endings,
> + val_utf8,
> + "\n", /* translate to LF */
> + FALSE, /* no repair */
> + NULL, /* no keywords */
> + FALSE, /* no expansion */
> + scratch_pool));
> +
> *new_value = svn_string_create(val_utf8_lf, pool);
> svn_pool_destroy(scratch_pool);
> return SVN_NO_ERROR;

Lots of plumbing, but looks good :-)

Thanks,

Daniel
Received on 2010-11-01 14:37:50 CET

This is an archived mail posted to the Subversion Dev mailing list.