Re: [PATCH] extend svn_subst_translate_string() to record whether re-encoding and/or line ending translation were performed (v. 4)

From: Danny Trebbien <dtrebbien_at_gmail.com>
Date: Tue, 30 Nov 2010 16:33:17 -0800

Attached is version 4 of the patch and corresponding log message that
address Daniel Shahaf's feedback from November 9.

Per a message from Julian
(http://article.gmane.org/gmane.comp.version-control.subversion.devel/124073)
and Daniel Shahaf
(http://article.gmane.org/gmane.comp.version-control.subversion.devel/124082),
I have removed the changes to optimize translate_newline() for now.

> There is value for a summary line --- just like the "Conflicts summary"
> in 'svn up' and friends (svn/notify.c): if someone runs 'svnsync sync'
> manually and doesn't pipe the output into a pager, they still know at
> the end something was wrong.

Good point. I could do a summary at the end as well as more detailed
messages for each revision that is affected.

>> >> +/** Translate the data in @a value (assumed to be encoded in charset
>> >> + * @a encoding) to UTF8 and LF line-endings. If @a encoding is @c NULL,
>> >> + * then assume that @a value is in the system-default character encoding.
>> >
>> > You changed 'language encoding' to 'character encoding'.
>>
>> Changed back. Note that the docstring has been reworded since version
>> 2 of the patch. I am now following the new wording and modified it as
>> appropriate.
>>
>
> I'm sorry: I wasn't clear. I wasn't actually asking you to revert that,
> but just tried to ask why you made that change (because it is departs
> from the existing wording, but I didn't recall seeing the change
> mentioned anywhere).

Okay.

>> > Not related to your patch, but the above if/else block calls the UTF-8
>> > translation routines on value->data. Since this is specifically the
>> > translate_string() API (and there is a separate translate_cstring()
>> > API), I think either we should fix this (the whole reason svn_string_t
>> > exists is to allow embedded NULs) or at least document this limitation
>> > in the API docs.
>>
>> value->data can be NULL?
>
> NUL != NULL. In general, the 'data' member of an svn_string_t is never
> a NULL pointer. However, the memory it points to --- value->data[0]
> through value->data[value->len-1] --- may contain NUL (aka ASCII 0,
> aka '\0') bytes.

I understand now.

> Also: I forgot to say this earlier, but parts of this subst.c code have
> been refactored on the 'performance' branch. Some of those refactorings
> have been merged, but I'm not sure if all of them were. Could you have
> a look there and see if there are any unmerged changes there? And if
> so, how they relate to this patch?

Attached is a diff of subversion/libsvn_subr/subst.c from
trunk_at_1040648 to branches/performance_at_1040648. The unmerged changes
were to add static functions translated_stream_skip() and
translated_stream_buffered() and use them in
svn_subst_read_specialfile() when configuring the svn_stream_t S. My
changes affect different parts of the file.

>> (translation_baton): Added the TRANSLATED_EOL and TRANSLATE_NEWLINE_FN fields.
>> (create_translation_baton): Added a new parameter TRANSLATED_EOL that is
>> passed to the resulting translation_baton. Sets TRANSLATE_NEWLINE_FN to
>> either &translate_newline or &translate_newline_alt as appropriate.
>> (translate_chunk): Replaced the three calls to translate_newline() with
>> calls to TRANSLATE_NEWLINE_FN.
>
> Present tense, so s/Added/Add/, etc.

Fixed.

>> * subversion/libsvn_subr/deprecated.c
>> Received the implementation of the deprecated wrapper function
>> svn_subst_translate_string().
>
> Write in the standard file-symbol syntax:
>
> * subversion/libsvn_subr/deprecated.c
> (svn_subst_translate_string): ...

Fixed.

>> Index: subversion/include/svn_subst.h
>> ===================================================================
>> --- subversion/include/svn_subst.h (revision 1032431)
>> +++ subversion/include/svn_subst.h (working copy)
>> @@ -592,19 +592,46 @@ svn_subst_stream_detranslated(svn_stream_t **strea
>>
>> /* EOL conversion and character encodings */
>>
>> +/** @deprecated Provided for backward compatibility with the 1.6 API. Callers
>> + * should use svn_subst_translate_string2().
>> + *
>> + * Similar to svn_subst_translate_string2(), except that the information about
>> + * whether re-encoding or line ending translation were performed is discarded.
>> + */
>
> Okay, except that the two paragraphs are in the wrong order, and the
> "should use" comment isn't needed in this case.

Fixed.

>> +SVN_DEPRECATED
>> +svn_error_t *svn_subst_translate_string(svn_string_t **new_value,
>> + const svn_string_t *value,
>> + const char *encoding,
>> + apr_pool_t *pool);
>> +
>> /** Translate the string @a value from character encoding @a encoding to
>> * UTF8, and also from its current line-ending style to LF line-endings. If
>> * @a encoding is @c NULL, translate from the system-default encoding.
>> *
>> + * If @a translated_to_utf8 is not @c NULL, then
>> + * <code>*translated_to_utf8</code> is set to @c TRUE if at least one
>> + * character of @a value in the source character encoding was translated to
>> + * UTF-8; to @c FALSE otherwise. If @a translated_line_endings is not @c NULL,
>> + * then <code>*translated_line_endings</code> is set to @c TRUE if at least one
>> + * line ending was changed to LF; to @c FALSE otherwise.
>> + *
>
> Thanks for paragraphing this. I'd add another paragraph break after the
> first "otherwise", so that each parameter is described in its own paragraph.

Fixed.

>> + This function assumes that NEWLINE_BUF and EOL_STR are either "\n", "\r", or
>> + "\r\n".
>
> I suggest
> s/assumes that .* are/assumes that each of .* is/
> for clarity.

Changed.

>> + assert((eol_str_len == 2 && eol_str[0] == '\r' && eol_str[1] == '\n') ||
>> + (eol_str_len == 1 && (eol_str[0] == '\n' || eol_str[0] == '\r')));
>> + assert((newline_len == 2 && newline_buf[0] == '\r' &&
>> + newline_buf[1] == '\n') ||
>> + (newline_len == 1 && (newline_buf[0] == '\n' ||
>> + newline_buf[0] == '\r')));
>> +
>
> s/assert/SVN_ERR_ASSERT/, because that's more friendly to library users.
>
> I'd like to see a comment on this assert, the condition is too long for me.
>
> Both here and in the "unsolicited" change above, perhaps introduce named
> macros would make the code more readable?
>
> #define SAME_EOL(eol_str1, eol_str2) /* ... */
> #defien STRING_IS_EOL(maybe_eol_str) /* ... */

Yes. This is a good idea. I added STRING_IS_EOL and
DIFFERENT_EOL_STRINGS with a doc string for each.

>> + b->translate_newline_fn = &translate_newline_alt; // Now that
>> + // TRANSLATED_EOL has
>> + // been set to TRUE,
>> + // switch the
>> + // translate_newline
>> + // function that is used
>
> No C++ comments; we follow the C89 standard, which doesn't allow them.

Okay.

>> + apr_pool_t *scratch_pool = svn_pool_create(result_pool);
>> + svn_error_t *res = svn_subst_translate_string2(new_value, NULL, NULL, value,
>> + encoding, result_pool, scratch_pool);
>> + svn_pool_destroy(scratch_pool);
>
> Is it worth the overhead to create a subpool here?
>
> On the one hand, that's what the current code does.
>
> On the other hand, creating a subpool allocates 8KB right away (right?
> #subpool_question), and an svn_string_t is probably smaller than that.
> (Something larger would be transferred in a stream, I hope!) So it
> seems a subpool is not worth the overhead, and it'll be better to just
> pass result_pool as both pool arguments.

Changed.

text/plain attachment: subversion-libsvn_subr-subst.c_trunk_1040648_to_performance_1040648.diff.txt

text/plain attachment: patch4.txt

text/plain attachment: log_message4.txt

Received on 2010-12-01 01:38:41 CET

This message: [ Message body ]
Next message: Blair Zajac: "Deprecate svn_fs_revision_prop() and svn_fs_txn_prop()?"
Previous message: Johan Corveleyn: "diff-optimizations-tokens branch: I think I'm going to abandon it"
Next in thread: Julian Foad: "Re: [PATCH] extend svn_subst_translate_string() to record whether re-encoding and/or line ending translation were performed (v. 4)"
Reply: Julian Foad: "Re: [PATCH] extend svn_subst_translate_string() to record whether re-encoding and/or line ending translation were performed (v. 4)"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]