Re: [PATCH] extend svn_subst_translate_string() to record whether re-encoding and/or line ending translation were performed (v. 3)

From: Daniel Trebbien <dtrebbien_at_gmail.com>
Date: Sun, 7 Nov 2010 17:36:13 -0800

Hi Daniel,

Thank you for your feedback. I have attached a new patch and log
message that addresses the points that you have mentioned.

> If we can provide the information cheaply, I see no reason not to let
> the APIs provide it. Â As some of these APIs are already revved for 1.7,
> we might as well add the new information to them now.

Maybe this should be done with other patches? I am concerned that
this patch is getting long again :)

> And all of this is necessary just so svnsync can /report/ which of the
> two fixes it made, right? Â If so, we might move on with teaching svnsync
> to fix newlines too, and add the "Report back how many properties had
> their newlines changed" functionality separately. Â (easier to review,
> and I don't think "512 properties were re-newlined" is /that/ useful to
> know for most people)

Yes.

Having svnsync do the newline normalization would be okay if there
were a function that only performed the re-encoding, which
unfortunately I do not see.

Someone (I forget who) suggested that the final summary message "512
properties were re-newlined" be replaced with a revision-by-revision
listing of the exact properties that were changed in importing each
revision. I like this idea, and was thinking about making patches to
implement it.

>> [[[
>> Adds a public API function, svn_subst_translate_string2(), an extension of
>> svn_subst_translate_string(), that has two, additional output parameters for
>> determining whether re-encoding and/or line ending translation were performed.
>>
>
> Grammar police: No comma after "two".

Fixed.

> What's happening in this file is essentially "Add the parameters to
> <this> public API, punch them through everything, and finally in
> translate_newline() make use of them". Â Could you add something
> (a paragraph?) to the log message that makes that clear?
>
> Or, another option is two have two subst.c entries in the log message
> --- one for the 'punching through' and one for the interesting changes
> --- though we have less precedent for that.
>
> I'll leave it to you to find a way to make the overall picture clear :)

I added a couple of paragraphs. One explains that most of the changes
are to send the TRANSLATED_EOL parameter all the way through to
translate_newline(). Another paragraph explains the idea behind my
solution for the efficiency concern that you had about the new
translate_newline() implementation.

>> -/** Translate the data in @a value (assumed to be in encoded in charset
>> +/** @deprecated Provided for backward compatibility with the 1.6 API. Callers
>> + * should use svn_subst_translate_string2().
>> + *
>> + * Translate the data in @a value (assumed to be encoded in charset
>> Â * @a encoding) to UTF8 and LF line-endings. Â If @a encoding is @c NULL,
>> Â * then assume that @a value is in the system-default language encoding.
>> Â * Return the translated data in @a *new_value, allocated in @a pool.
>> Â */
>
> Please make the old docstring only describe the differences from the new
> version of the function, and delete the rest of the docstring here.
> Follow the examples in the other header files.

Fixed.

>> +SVN_DEPRECATED
>> Â svn_error_t *svn_subst_translate_string(svn_string_t **new_value,
>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â const svn_string_t *value,
>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â const char *encoding,
>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â apr_pool_t *pool);
>>
>> +/** Translate the data in @a value (assumed to be encoded in charset
>> + * @a encoding) to UTF8 and LF line-endings. Â If @a encoding is @c NULL,
>> + * then assume that @a value is in the system-default character encoding.
>
> You changed 'language encoding' to 'character encoding'.

Changed back. Note that the docstring has been reworded since version
2 of the patch. I am now following the new wording and modified it as
appropriate.

>> + * If @a translated_to_utf8 is not @c NULL, then
>> + * @code *translated_to_utf8 @endcode is set to @c TRUE if at least one
>> + * character of @a value in the source charset was translated to UTF-8; Â to
>> + * @c FALSE otherwise. If @a translated_line_endings is not @c NULL, then
>> + * @code *translated_line_endings @endcode is set to @c TRUE if at least one
>> + * line ending was changed to LF; Â to @c FALSE otherwise.
>
> Use paragraphs please.

Made into a separate paragraph.

>> + * Return the translated
>> + * data in @a *new_value, allocated in @a pool.
>> + *
>> + * @since New in 1.7
>
> Trailing period missing.

Fixed.

>> +svn_error_t *
>> +svn_subst_translate_string2(svn_string_t **new_value,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â svn_boolean_t *translated_to_utf8,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â svn_boolean_t *translated_line_endings,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â const svn_string_t *value,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â const char *encoding,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â apr_pool_t *pool);
>
> Can you switch this function to the two-pools paradigm
> (result_pool/scratch_pool) while here? Â There are examples in
> libsvn_wc/wc_db.h (and elsewhere).
>
> It already uses scratch_pool internally.

I added another apr_pool_t* parameter. The original one is considered
to be the result pool.

>> @@ -633,7 +637,8 @@ translate_newline(const char *eol_str,
>> Â Â Â Â Â Â Â Â Â Â const char *newline_buf,
>> Â Â Â Â Â Â Â Â Â Â apr_size_t newline_len,
>> Â Â Â Â Â Â Â Â Â Â svn_stream_t *dst,
>> - Â Â Â Â Â Â Â Â Â svn_boolean_t repair)
>> + Â Â Â Â Â Â Â Â Â svn_boolean_t repair,
>> + Â Â Â Â Â Â Â Â Â svn_boolean_t *translated_eol)
>> Â {
>> Â Â /* If we've seen a newline before, compare it with our cache to
>> Â Â Â check for consistency, else cache it for future comparisons. */
>> @@ -653,8 +658,17 @@ translate_newline(const char *eol_str,
>> Â Â Â Â strncpy(src_format, newline_buf, newline_len);
>> Â Â Â Â *src_format_len = newline_len;
>> Â Â Â }
>> - Â /* Write the desired newline */
>> - Â return translate_write(dst, eol_str, eol_str_len);
>> +
>> + Â /* Translate the newline */
>> + Â SVN_ERR(translate_write(dst, eol_str, eol_str_len));
>> +
>> + Â if (translated_eol)
>> + Â Â {
>
> Right now, this if() will be done every time translate_newline() is
> called. Â Could you rework the logic to check it fewer times?
>
> e.g., to only check it once per translation_baton if possible, or to
> skip checking it if it had been set already, or if REPAIR is false and
> a newline had been seen already, etc. Â (These are just ideas, you don't
> have to implement all of them!)
>
> Stefan2 indicates that the subst() code is a rather expensive part of
> 'checkout' and 'export', so doing the check once-per-file (instead of
> once-per-line) will be nice. Â (However, I realize that right now you
> only expose translated_eol for the svn_string_t API.)

Okay. I re-worked the translate_newline() function a bit so that
TRANSLATED_EOL will never be set to TRUE twice per stream and that the
check whether TRANSLATED_EOL is NULL is performed exactly once.

My solution was to make a slight variant of translate_newline(),
called translate_newline_alt(), that has the same signature as
translate_newline() and the same effect except that
translate_newline_alt() does not care about TRANSLATED_EOL. I added a
function pointer field to the translation baton that is either the
address of translate_newline() or translate_newline_alt(). This
allows the code to vary the implementation that is used.

>> @@ -1683,6 +1734,19 @@ svn_subst_translate_string(svn_string_t **new_valu
>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â const char *encoding,
>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â apr_pool_t *pool)
>> Â {
>> + Â return svn_subst_translate_string2(new_value, NULL, NULL, value, encoding,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â pool);
>> +}
>> +
>
> Please move the above wrapper to deprecated.c.

Done.

>> +
>> +svn_error_t *
>> +svn_subst_translate_string2(svn_string_t **new_value,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â svn_boolean_t *translated_to_utf8,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â svn_boolean_t *translated_line_endings,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â const svn_string_t *value,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â const char *encoding,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â apr_pool_t *pool)
>> +{
>> Â Â const char *val_utf8;
>> Â Â const char *val_utf8_lf;
>> Â Â apr_pool_t *scratch_pool = svn_pool_create(pool);
>> @@ -1703,14 +1767,18 @@ svn_subst_translate_string(svn_string_t **new_valu
>> Â Â Â Â SVN_ERR(svn_utf_cstring_to_utf8(&val_utf8, value->data, scratch_pool));
>> Â Â Â }
>>
>
> Not related to your patch, but the above if/else block calls the UTF-8
> translation routines on value->data. Â Since this is specifically the
> translate_string() API (and there is a separate translate_cstring()
> API), I think either we should fix this (the whole reason svn_string_t
> exists is to allow embedded NULs) or at least document this limitation
> in the API docs.

value->data can be NULL?

Note: I made a change that was not requested. In
translate_newline(), I replaced the condition that checked for
different newline strings with one that I believe to be more
efficient. It used to be:

> if (newline_len != eol_str_len || strncmp(newline_buf, eol_str, newline_len))

Now it is:

> if ((newline_len != eol_str_len) || (*newline_buf != *eol_str))

This exploits the knowledge of the entire set of newline strings that
are recognized (LF, CR, and CRLF).

text/plain attachment: log_message.txt

text/plain attachment: patch3.txt

Received on 2010-11-08 02:36:55 CET

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]