Attached is version 4 of the patch and corresponding log message that
address Daniel Shahaf's feedback from November 9.
Per a message from Julian
and Daniel Shahaf
I have removed the changes to optimize translate_newline() for now.
> There is value for a summary line --- just like the "Conflicts summary"
> in 'svn up' and friends (svn/notify.c): if someone runs 'svnsync sync'
> manually and doesn't pipe the output into a pager, they still know at
> the end something was wrong.
Good point. I could do a summary at the end as well as more detailed
messages for each revision that is affected.
>> >> +/** Translate the data in @a value (assumed to be encoded in charset
>> >> + * @a encoding) to UTF8 and LF line-endings. If @a encoding is @c NULL,
>> >> + * then assume that @a value is in the system-default character encoding.
>> > You changed 'language encoding' to 'character encoding'.
>> Changed back. Note that the docstring has been reworded since version
>> 2 of the patch. I am now following the new wording and modified it as
> I'm sorry: I wasn't clear. I wasn't actually asking you to revert that,
> but just tried to ask why you made that change (because it is departs
> from the existing wording, but I didn't recall seeing the change
> mentioned anywhere).
>> > Not related to your patch, but the above if/else block calls the UTF-8
>> > translation routines on value->data. Since this is specifically the
>> > translate_string() API (and there is a separate translate_cstring()
>> > API), I think either we should fix this (the whole reason svn_string_t
>> > exists is to allow embedded NULs) or at least document this limitation
>> > in the API docs.
>> value->data can be NULL?
> NUL != NULL. In general, the 'data' member of an svn_string_t is never
> a NULL pointer. However, the memory it points to --- value->data
> through value->data[value->len-1] --- may contain NUL (aka ASCII 0,
> aka '\0') bytes.
I understand now.
> Also: I forgot to say this earlier, but parts of this subst.c code have
> been refactored on the 'performance' branch. Some of those refactorings
> have been merged, but I'm not sure if all of them were. Could you have
> a look there and see if there are any unmerged changes there? And if
> so, how they relate to this patch?
Attached is a diff of subversion/libsvn_subr/subst.c from
trunk_at_1040648 to branches/performance_at_1040648. The unmerged changes
were to add static functions translated_stream_skip() and
translated_stream_buffered() and use them in
svn_subst_read_specialfile() when configuring the svn_stream_t S. My
changes affect different parts of the file.
>> (translation_baton): Added the TRANSLATED_EOL and TRANSLATE_NEWLINE_FN fields.
>> (create_translation_baton): Added a new parameter TRANSLATED_EOL that is
>> passed to the resulting translation_baton. Sets TRANSLATE_NEWLINE_FN to
>> either &translate_newline or &translate_newline_alt as appropriate.
>> (translate_chunk): Replaced the three calls to translate_newline() with
>> calls to TRANSLATE_NEWLINE_FN.
> Present tense, so s/Added/Add/, etc.
>> * subversion/libsvn_subr/deprecated.c
>> Received the implementation of the deprecated wrapper function
> Write in the standard file-symbol syntax:
> * subversion/libsvn_subr/deprecated.c
> (svn_subst_translate_string): ...
>> Index: subversion/include/svn_subst.h
>> --- subversion/include/svn_subst.h (revision 1032431)
>> +++ subversion/include/svn_subst.h (working copy)
>> @@ -592,19 +592,46 @@ svn_subst_stream_detranslated(svn_stream_t **strea
>> /* EOL conversion and character encodings */
>> +/** @deprecated Provided for backward compatibility with the 1.6 API. Callers
>> + * should use svn_subst_translate_string2().
>> + *
>> + * Similar to svn_subst_translate_string2(), except that the information about
>> + * whether re-encoding or line ending translation were performed is discarded.
>> + */
> Okay, except that the two paragraphs are in the wrong order, and the
> "should use" comment isn't needed in this case.
>> +svn_error_t *svn_subst_translate_string(svn_string_t **new_value,
>> + const svn_string_t *value,
>> + const char *encoding,
>> + apr_pool_t *pool);
>> /** Translate the string @a value from character encoding @a encoding to
>> * UTF8, and also from its current line-ending style to LF line-endings. If
>> * @a encoding is @c NULL, translate from the system-default encoding.
>> + * If @a translated_to_utf8 is not @c NULL, then
>> + * <code>*translated_to_utf8</code> is set to @c TRUE if at least one
>> + * character of @a value in the source character encoding was translated to
>> + * UTF-8; to @c FALSE otherwise. If @a translated_line_endings is not @c NULL,
>> + * then <code>*translated_line_endings</code> is set to @c TRUE if at least one
>> + * line ending was changed to LF; to @c FALSE otherwise.
>> + *
> Thanks for paragraphing this. I'd add another paragraph break after the
> first "otherwise", so that each parameter is described in its own paragraph.
>> + This function assumes that NEWLINE_BUF and EOL_STR are either "\n", "\r", or
>> + "\r\n".
> I suggest
> s/assumes that .* are/assumes that each of .* is/
> for clarity.
>> + assert((eol_str_len == 2 && eol_str == '\r' && eol_str == '\n') ||
>> + (eol_str_len == 1 && (eol_str == '\n' || eol_str == '\r')));
>> + assert((newline_len == 2 && newline_buf == '\r' &&
>> + newline_buf == '\n') ||
>> + (newline_len == 1 && (newline_buf == '\n' ||
>> + newline_buf == '\r')));
> s/assert/SVN_ERR_ASSERT/, because that's more friendly to library users.
> I'd like to see a comment on this assert, the condition is too long for me.
> Both here and in the "unsolicited" change above, perhaps introduce named
> macros would make the code more readable?
> #define SAME_EOL(eol_str1, eol_str2) /* ... */
> #defien STRING_IS_EOL(maybe_eol_str) /* ... */
Yes. This is a good idea. I added STRING_IS_EOL and
DIFFERENT_EOL_STRINGS with a doc string for each.
>> + b->translate_newline_fn = &translate_newline_alt; // Now that
>> + // TRANSLATED_EOL has
>> + // been set to TRUE,
>> + // switch the
>> + // translate_newline
>> + // function that is used
> No C++ comments; we follow the C89 standard, which doesn't allow them.
>> + apr_pool_t *scratch_pool = svn_pool_create(result_pool);
>> + svn_error_t *res = svn_subst_translate_string2(new_value, NULL, NULL, value,
>> + encoding, result_pool, scratch_pool);
>> + svn_pool_destroy(scratch_pool);
> Is it worth the overhead to create a subpool here?
> On the one hand, that's what the current code does.
> On the other hand, creating a subpool allocates 8KB right away (right?
> #subpool_question), and an svn_string_t is probably smaller than that.
> (Something larger would be transferred in a stream, I hope!) So it
> seems a subpool is not worth the overhead, and it'll be better to just
> pass result_pool as both pool arguments.
Received on 2010-12-01 01:38:41 CET