[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] extend svn_subst_translate_string() to record whether re-encoding and/or line ending translation were performed (v. 2)

From: Gavin Beau Baumanis <gavinb_at_thespidernet.com>
Date: Wed, 24 Nov 2010 22:43:34 +1100

Hi Daniel (T),

Since in your earlier post you mentioned that you didn't mind a friendly reminder...

I thought I would return this thread to the top of the list - Just i case you had missed Daniel (Shahaf) 's comments.

Gavin "Beau" Baumanis

On 02/11/2010, at 12:35 AM, Daniel Shahaf wrote:

> Gavin Beau Baumanis wrote on Mon, Oct 18, 2010 at 21:06:31 +1100:
>> Ping. This patch submission has received no comments.
>>
>
> Thanks, Gavin. @Daniel, sorry for the delay.
>
>> On 04/10/2010, at 5:55 AM, Daniel Trebbien wrote:
>>
>>> On Fri, Oct 1, 2010 at 3:57 AM, Julian Foad <julian.foad_at_wandisco.com> wrote:
>>>>> Adds a public API function, svn_subst_translate_string2(), an extension of
>>>>> svn_subst_translate_string(), that has two, additional output parameters for
>>>>> determining whether re-encoding and/or line ending translation were performed.
>>>>
>>>> If we're going to add this functionality to _translate_string(),
>>>> shouldn't we also add it to the other variants - _detranslate_string,
>>>> _translate_cstring2, _stream_translated, _copy_and_translate4?
>>>>
>>>> I see you're adding the infrastructure into the (new) bodies of
>>>> _translate_cstring2 and _stream_translated, just not (yey) exposing it
>>>> in their APIs.
>>>>
>
> If we can provide the information cheaply, I see no reason not to let
> the APIs provide it. As some of these APIs are already revved for 1.7,
> we might as well add the new information to them now.
>
>>>> I'm not sure. The set of 'translation' functions is already messy and
>>>> it's not clear to me how useful this new functionality will be.
>>>
>>> I originally began working on svn_subst_translate_string2() because I
>>> could not find a combination of public API functions that would allow
>>> me to determine whether the line endings changed when a string was
>>> both re-encoded to UTF-8 and normalized to LF line endings. The most
>>> applicable, svn_subst_translate_string(), performs both translations
>>> without indicating whether it re-encoded the string or normalized a
>>> line ending.
>>>
>
> And all of this is necessary just so svnsync can /report/ which of the
> two fixes it made, right? If so, we might move on with teaching svnsync
> to fix newlines too, and add the "Report back how many properties had
> their newlines changed" functionality separately. (easier to review,
> and I don't think "512 properties were re-newlined" is /that/ useful to
> know for most people)
>
>>> An alternative to extending svn_subst_translate_string() is to add a
>>> public API function that does the re-encoding and another that
>>> normalizes line endings. I think, however, that extending
>>> svn_subst_translate_string() is better because though the current
>>> implementation of svn_subst_translate_string() re-encodes, then
>>> normalizes line endings, the single API function allows for the
>>> possibility of a future improvement in efficiency as a result of
>>> performing both translations simultaneously.
>>>
>>>
>>>
>>> Attached are a revised patch and log message.
>>> <log message.txt><patch.txt>
>
>> [[[
>> Adds a public API function, svn_subst_translate_string2(), an extension of
>> svn_subst_translate_string(), that has two, additional output parameters for
>> determining whether re-encoding and/or line ending translation were performed.
>>
>
> Grammar police: No comma after "two".
>
>>
>> As discussed at:
>> http://thread.gmane.org/gmane.comp.version-control.subversion.devel/122550
>> http://thread.gmane.org/gmane.comp.version-control.subversion.devel/123020
>>
>>
>> * subversion/include/svn_subst.h
>> (svn_subst_translate_string2): New function.
>> (svn_subst_translate_string): Deprecated in favor of
>> svn_subst_translate_string2().
>>
>> * subversion/libsvn_subr/subst.c
>
> What's happening in this file is essentially "Add the parameters to
> <this> public API, punch them through everything, and finally in
> translate_newline() make use of them". Could you add something
> (a paragraph?) to the log message that makes that clear?
>
> Or, another option is two have two subst.c entries in the log message
> --- one for the 'punching through' and one for the interesting changes
> --- though we have less precedent for that.
>
> I'll leave it to you to find a way to make the overall picture clear :)
>
>> (translate_newline): Added the new parameter TRANSLATED_EOL, the value at
>> which is set to TRUE if the newline string in EOL_STR is different than
>> the newline string in NEWLINE_BUF.
>> (translation_baton): Added the TRANSLATED_EOL field.
>> (create_translation_baton): Added a new parameter TRANSLATED_EOL that is
>> passed to the resulting translation_baton.
>> (translate_chunk): Modified the three calls to translate_newline() to pass
>> the TRANSLATED_EOL pointer from the translation baton.
>> (stream_translated): New static function. Its implementation is the old
>> implementation of svn_subst_stream_translated(), but accepting another
>> parameter, TRANSLATED_EOL, that is passed to the in/out translation batons
>> that it creates.
>> (svn_subst_stream_translated): Now a wrapper for stream_translated().
>> (translate_cstring): New static function. Its implementation is the old
>> implementation of svn_subst_translate_cstring2(), but modified to accept
>> another parameter, TRANSLATED_EOL, that is passed to stream_translated().
>> (svn_subst_translate_cstring2): Now a wrapper for translate_cstring().
>> (svn_subst_translate_string2): New function. The task of recording whether
>> it translated a line ending is delegated to translate_cstring().
>> ]]]
>
>> Index: subversion/include/svn_subst.h
>> ===================================================================
>> --- subversion/include/svn_subst.h (revision 1004022)
>> +++ subversion/include/svn_subst.h (working copy)
>> @@ -588,16 +588,41 @@ svn_subst_stream_detranslated(svn_stream_t **strea
>>
>> /* EOL conversion and character encodings */
>>
>> -/** Translate the data in @a value (assumed to be in encoded in charset
>> +/** @deprecated Provided for backward compatibility with the 1.6 API. Callers
>> + * should use svn_subst_translate_string2().
>> + *
>> + * Translate the data in @a value (assumed to be encoded in charset
>> * @a encoding) to UTF8 and LF line-endings. If @a encoding is @c NULL,
>> * then assume that @a value is in the system-default language encoding.
>> * Return the translated data in @a *new_value, allocated in @a pool.
>> */
>
> Please make the old docstring only describe the differences from the new
> version of the function, and delete the rest of the docstring here.
> Follow the examples in the other header files.
>
>> +SVN_DEPRECATED
>> svn_error_t *svn_subst_translate_string(svn_string_t **new_value,
>> const svn_string_t *value,
>> const char *encoding,
>> apr_pool_t *pool);
>>
>> +/** Translate the data in @a value (assumed to be encoded in charset
>> + * @a encoding) to UTF8 and LF line-endings. If @a encoding is @c NULL,
>> + * then assume that @a value is in the system-default character encoding.
>
> You changed 'language encoding' to 'character encoding'.
>
>> + * If @a translated_to_utf8 is not @c NULL, then
>> + * @code *translated_to_utf8 @endcode is set to @c TRUE if at least one
>> + * character of @a value in the source charset was translated to UTF-8; to
>> + * @c FALSE otherwise. If @a translated_line_endings is not @c NULL, then
>> + * @code *translated_line_endings @endcode is set to @c TRUE if at least one
>> + * line ending was changed to LF; to @c FALSE otherwise.
>
> Use paragraphs please.
>
>> + * Return the translated
>> + * data in @a *new_value, allocated in @a pool.
>> + *
>> + * @since New in 1.7
>
> Trailing period missing.
>
> (Vim tip: ^X^L completes a line.)
>
>> + */
>> +svn_error_t *
>> +svn_subst_translate_string2(svn_string_t **new_value,
>> + svn_boolean_t *translated_to_utf8,
>> + svn_boolean_t *translated_line_endings,
>> + const svn_string_t *value,
>> + const char *encoding,
>> + apr_pool_t *pool);
>
> Can you switch this function to the two-pools paradigm
> (result_pool/scratch_pool) while here? There are examples in
> libsvn_wc/wc_db.h (and elsewhere).
>
> It already uses scratch_pool internally.
>
>> +
>> /** Translate the data in @a value from UTF8 and LF line-endings into
>> * native locale and native line-endings, or to the output locale if
>> * @a for_output is TRUE. Return the translated data in @a
>> Index: subversion/libsvn_subr/subst.c
>> ===================================================================
>> --- subversion/libsvn_subr/subst.c (revision 1004022)
>> +++ subversion/libsvn_subr/subst.c (working copy)
>> @@ -620,6 +620,10 @@ translate_keyword(char *buf,
>> newline in the file, and copy it to {SRC_FORMAT, *SRC_FORMAT_LEN} to
>> use for later consistency checks.
>>
>> + Sets *TRANSLATED_EOL to TRUE if TRANSLATED_EOL is not NULL and the newline
>> + string that was written (EOL_STR) is not the same as the newline string
>> + that was translated (NEWLINE_BUF).
>> +
>> Note: all parameters are required even if REPAIR is TRUE.
>> ### We could require that REPAIR must not change across a sequence of
>> calls, and could then optimize by not using SRC_FORMAT at all if
>> @@ -633,7 +637,8 @@ translate_newline(const char *eol_str,
>> const char *newline_buf,
>> apr_size_t newline_len,
>> svn_stream_t *dst,
>> - svn_boolean_t repair)
>> + svn_boolean_t repair,
>> + svn_boolean_t *translated_eol)
>> {
>> /* If we've seen a newline before, compare it with our cache to
>> check for consistency, else cache it for future comparisons. */
>> @@ -653,8 +658,17 @@ translate_newline(const char *eol_str,
>> strncpy(src_format, newline_buf, newline_len);
>> *src_format_len = newline_len;
>> }
>> - /* Write the desired newline */
>> - return translate_write(dst, eol_str, eol_str_len);
>> +
>> + /* Translate the newline */
>> + SVN_ERR(translate_write(dst, eol_str, eol_str_len));
>> +
>> + if (translated_eol)
>> + {
>
> Right now, this if() will be done every time translate_newline() is
> called. Could you rework the logic to check it fewer times?
>
> e.g., to only check it once per translation_baton if possible, or to
> skip checking it if it had been set already, or if REPAIR is false and
> a newline had been seen already, etc. (These are just ideas, you don't
> have to implement all of them!)
>
> Stefan2 indicates that the subst() code is a rather expensive part of
> 'checkout' and 'export', so doing the check once-per-file (instead of
> once-per-line) will be nice. (However, I realize that right now you
> only expose translated_eol for the svn_string_t API.)
>
>> + if (newline_len != eol_str_len ||
>> + strncmp(newline_buf, eol_str, newline_len))
>> + *translated_eol = TRUE;
>> + }
>> + return SVN_NO_ERROR;
>> }
>>
>>
>> @@ -1683,6 +1734,19 @@ svn_subst_translate_string(svn_string_t **new_valu
>> const char *encoding,
>> apr_pool_t *pool)
>> {
>> + return svn_subst_translate_string2(new_value, NULL, NULL, value, encoding,
>> + pool);
>> +}
>> +
>
> Please move the above wrapper to deprecated.c.
>
>> +
>> +svn_error_t *
>> +svn_subst_translate_string2(svn_string_t **new_value,
>> + svn_boolean_t *translated_to_utf8,
>> + svn_boolean_t *translated_line_endings,
>> + const svn_string_t *value,
>> + const char *encoding,
>> + apr_pool_t *pool)
>> +{
>> const char *val_utf8;
>> const char *val_utf8_lf;
>> apr_pool_t *scratch_pool = svn_pool_create(pool);
>> @@ -1703,14 +1767,18 @@ svn_subst_translate_string(svn_string_t **new_valu
>> SVN_ERR(svn_utf_cstring_to_utf8(&val_utf8, value->data, scratch_pool));
>> }
>>
>
> Not related to your patch, but the above if/else block calls the UTF-8
> translation routines on value->data. Since this is specifically the
> translate_string() API (and there is a separate translate_cstring()
> API), I think either we should fix this (the whole reason svn_string_t
> exists is to allow embedded NULs) or at least document this limitation
> in the API docs.
>
>> - SVN_ERR(svn_subst_translate_cstring2(val_utf8,
>> - &val_utf8_lf,
>> - "\n", /* translate to LF */
>> - FALSE, /* no repair */
>> - NULL, /* no keywords */
>> - FALSE, /* no expansion */
>> - scratch_pool));
>> + if (translated_to_utf8)
>> + *translated_to_utf8 = (strcmp(value->data, val_utf8) != 0);
>>
>> + SVN_ERR(translate_cstring(&val_utf8_lf,
>> + translated_line_endings,
>> + val_utf8,
>> + "\n", /* translate to LF */
>> + FALSE, /* no repair */
>> + NULL, /* no keywords */
>> + FALSE, /* no expansion */
>> + scratch_pool));
>> +
>> *new_value = svn_string_create(val_utf8_lf, pool);
>> svn_pool_destroy(scratch_pool);
>> return SVN_NO_ERROR;
>
> Lots of plumbing, but looks good :-)
>
> Thanks,
>
> Daniel
>
Received on 2010-11-24 12:44:19 CET

This is an archived mail posted to the Subversion Dev mailing list.