[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] extend svn_subst_translate_string() to record whether re-encoding and/or line ending translation were performed (v. 3)

From: Daniel Trebbien <dtrebbien_at_gmail.com>
Date: Sun, 7 Nov 2010 17:36:13 -0800

Hi Daniel,

Thank you for your feedback. I have attached a new patch and log
message that addresses the points that you have mentioned.

> If we can provide the information cheaply, I see no reason not to let
> the APIs provide it.  As some of these APIs are already revved for 1.7,
> we might as well add the new information to them now.

Maybe this should be done with other patches? I am concerned that
this patch is getting long again :)

> And all of this is necessary just so svnsync can /report/ which of the
> two fixes it made, right?  If so, we might move on with teaching svnsync
> to fix newlines too, and add the "Report back how many properties had
> their newlines changed" functionality separately.  (easier to review,
> and I don't think "512 properties were re-newlined" is /that/ useful to
> know for most people)


Having svnsync do the newline normalization would be okay if there
were a function that only performed the re-encoding, which
unfortunately I do not see.

Someone (I forget who) suggested that the final summary message "512
properties were re-newlined" be replaced with a revision-by-revision
listing of the exact properties that were changed in importing each
revision. I like this idea, and was thinking about making patches to
implement it.

>> [[[
>> Adds a public API function, svn_subst_translate_string2(), an extension of
>> svn_subst_translate_string(), that has two, additional output parameters for
>> determining whether re-encoding and/or line ending translation were performed.
> Grammar police: No comma after "two".


> What's happening in this file is essentially "Add the parameters to
> <this> public API, punch them through everything, and finally in
> translate_newline() make use of them".  Could you add something
> (a paragraph?) to the log message that makes that clear?
> Or, another option is two have two subst.c entries in the log message
> --- one for the 'punching through' and one for the interesting changes
> --- though we have less precedent for that.
> I'll leave it to you to find a way to make the overall picture clear :)

I added a couple of paragraphs. One explains that most of the changes
are to send the TRANSLATED_EOL parameter all the way through to
translate_newline(). Another paragraph explains the idea behind my
solution for the efficiency concern that you had about the new
translate_newline() implementation.

>> -/** Translate the data in @a value (assumed to be in encoded in charset
>> +/** @deprecated Provided for backward compatibility with the 1.6 API. Callers
>> + * should use svn_subst_translate_string2().
>> + *
>> + * Translate the data in @a value (assumed to be encoded in charset
>>   * @a encoding) to UTF8 and LF line-endings.  If @a encoding is @c NULL,
>>   * then assume that @a value is in the system-default language encoding.
>>   * Return the translated data in @a *new_value, allocated in @a pool.
>>   */
> Please make the old docstring only describe the differences from the new
> version of the function, and delete the rest of the docstring here.
> Follow the examples in the other header files.


>>  svn_error_t *svn_subst_translate_string(svn_string_t **new_value,
>>                                          const svn_string_t *value,
>>                                          const char *encoding,
>>                                          apr_pool_t *pool);
>> +/** Translate the data in @a value (assumed to be encoded in charset
>> + * @a encoding) to UTF8 and LF line-endings.  If @a encoding is @c NULL,
>> + * then assume that @a value is in the system-default character encoding.
> You changed 'language encoding' to 'character encoding'.

Changed back. Note that the docstring has been reworded since version
2 of the patch. I am now following the new wording and modified it as

>> + * If @a translated_to_utf8 is not @c NULL, then
>> + * @code *translated_to_utf8 @endcode is set to @c TRUE if at least one
>> + * character of @a value in the source charset was translated to UTF-8;  to
>> + * @c FALSE otherwise. If @a translated_line_endings is not @c NULL, then
>> + * @code *translated_line_endings @endcode is set to @c TRUE if at least one
>> + * line ending was changed to LF;  to @c FALSE otherwise.
> Use paragraphs please.

Made into a separate paragraph.

>> + * Return the translated
>> + * data in @a *new_value, allocated in @a pool.
>> + *
>> + * @since New in 1.7
> Trailing period missing.


>> +svn_error_t *
>> +svn_subst_translate_string2(svn_string_t **new_value,
>> +                            svn_boolean_t *translated_to_utf8,
>> +                            svn_boolean_t *translated_line_endings,
>> +                            const svn_string_t *value,
>> +                            const char *encoding,
>> +                            apr_pool_t *pool);
> Can you switch this function to the two-pools paradigm
> (result_pool/scratch_pool) while here?  There are examples in
> libsvn_wc/wc_db.h (and elsewhere).
> It already uses scratch_pool internally.

I added another apr_pool_t* parameter. The original one is considered
to be the result pool.

>> @@ -633,7 +637,8 @@ translate_newline(const char *eol_str,
>>                    const char *newline_buf,
>>                    apr_size_t newline_len,
>>                    svn_stream_t *dst,
>> -                  svn_boolean_t repair)
>> +                  svn_boolean_t repair,
>> +                  svn_boolean_t *translated_eol)
>>  {
>>    /* If we've seen a newline before, compare it with our cache to
>>       check for consistency, else cache it for future comparisons. */
>> @@ -653,8 +658,17 @@ translate_newline(const char *eol_str,
>>        strncpy(src_format, newline_buf, newline_len);
>>        *src_format_len = newline_len;
>>      }
>> -  /* Write the desired newline */
>> -  return translate_write(dst, eol_str, eol_str_len);
>> +
>> +  /* Translate the newline */
>> +  SVN_ERR(translate_write(dst, eol_str, eol_str_len));
>> +
>> +  if (translated_eol)
>> +    {
> Right now, this if() will be done every time translate_newline() is
> called.  Could you rework the logic to check it fewer times?
> e.g., to only check it once per translation_baton if possible, or to
> skip checking it if it had been set already, or if REPAIR is false and
> a newline had been seen already, etc.  (These are just ideas, you don't
> have to implement all of them!)
> Stefan2 indicates that the subst() code is a rather expensive part of
> 'checkout' and 'export', so doing the check once-per-file (instead of
> once-per-line) will be nice.  (However, I realize that right now you
> only expose translated_eol for the svn_string_t API.)

Okay. I re-worked the translate_newline() function a bit so that
TRANSLATED_EOL will never be set to TRUE twice per stream and that the
check whether TRANSLATED_EOL is NULL is performed exactly once.

My solution was to make a slight variant of translate_newline(),
called translate_newline_alt(), that has the same signature as
translate_newline() and the same effect except that
translate_newline_alt() does not care about TRANSLATED_EOL. I added a
function pointer field to the translation baton that is either the
address of translate_newline() or translate_newline_alt(). This
allows the code to vary the implementation that is used.

>> @@ -1683,6 +1734,19 @@ svn_subst_translate_string(svn_string_t **new_valu
>>                             const char *encoding,
>>                             apr_pool_t *pool)
>>  {
>> +  return svn_subst_translate_string2(new_value, NULL, NULL, value, encoding,
>> +                                     pool);
>> +}
>> +
> Please move the above wrapper to deprecated.c.


>> +
>> +svn_error_t *
>> +svn_subst_translate_string2(svn_string_t **new_value,
>> +                            svn_boolean_t *translated_to_utf8,
>> +                            svn_boolean_t *translated_line_endings,
>> +                            const svn_string_t *value,
>> +                            const char *encoding,
>> +                            apr_pool_t *pool)
>> +{
>>    const char *val_utf8;
>>    const char *val_utf8_lf;
>>    apr_pool_t *scratch_pool = svn_pool_create(pool);
>> @@ -1703,14 +1767,18 @@ svn_subst_translate_string(svn_string_t **new_valu
>>        SVN_ERR(svn_utf_cstring_to_utf8(&val_utf8, value->data, scratch_pool));
>>      }
> Not related to your patch, but the above if/else block calls the UTF-8
> translation routines on value->data.  Since this is specifically the
> translate_string() API (and there is a separate translate_cstring()
> API), I think either we should fix this (the whole reason svn_string_t
> exists is to allow embedded NULs) or at least document this limitation
> in the API docs.

value->data can be NULL?

Note: I made a change that was not requested. In
translate_newline(), I replaced the condition that checked for
different newline strings with one that I believe to be more
efficient. It used to be:

> if (newline_len != eol_str_len || strncmp(newline_buf, eol_str, newline_len))

Now it is:

> if ((newline_len != eol_str_len) || (*newline_buf != *eol_str))

This exploits the knowledge of the entire set of newline strings that
are recognized (LF, CR, and CRLF).

Received on 2010-11-08 02:36:55 CET

This is an archived mail posted to the Subversion Dev mailing list.