[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] svnsync 1.6 works even when source has eol-style impurities

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Sat, 11 Apr 2009 15:21:19 +0300 (Jerusalem Daylight Time)

B Smith-Mannschott wrote on Wed, 1 Apr 2009 at 19:39 +0200:
> Here's my attempt at a patch for the svnsync issues raised by this
> thread. Please let me know what I can do to improve it.
>
> [[[
> Make is possible to use svnsync 1.6 even when source has eol-style impurities.
>

Are there any relevant issue numbers to be mentioned here? (I don't think
there is an issue specifically for these new problems, but asking.)

> * subversion/include/svn_string.h
> (svn_string_with_normalized_eol_style): new function.
> This provides eol normalization for us.
>
> * subversion/libsvn_subr/svn_string.c
> (svn_string_with_normalized_eol_style): new
> (svn_stringbuf_normalize_eol_style): new

The second function uses the public naming convention, but isn't public.
(see below)

> This does the actual work of the eol normalization by mutating
> a stringbuf provided by the caller.
>
> * a/subversion/svnsync/main.c
> (edit_baton_t): added field normalize_prop_eol_style
> (change_file_prop, change_dir_prop): normalize eol-style for svn:*
> properties, conditional on normalize_prop_eol_style.
> (get_sync_editor): initialize normalize_prop_eol_style from
> the environment variable SVNSYNC_UNSUPPORTED_NORMALIZE_PROP_EOL_STYLE.
> If this is defined, we will do normalization, otherwise not.

> (replay_rev_started): normalize eol-style of svn:log comments
> before they are written through to the destination repository.

Not just svn:log, other revprops are affected too (e.g., svn:author; this
was reported in the tracker). Should use svn_prop_needs_translation().

> ]]]
>
> Notes
> =====
>
> 1. I've placed svn_string_with_normalized_eol_style() into libsvn_subr
> because it seems like generally useful functionality. If this is not
> the case, it can be moved to subversion/svnsync/main.c
>

I'm okay leaving the logic public; clients also need it.

> 2. I'm not very confident that the logic in replay_rev_started is in
> the correct place. I tried a number of approaches before finding
> this one, which seems to work. My difficulty in following the

You may also need the logic in replay_rev_finished() (because it does
write revprops), and in do_synchronize() in the "if (currently_copying)"
case.

> control flow (delgation through function poniters ahoy!) means I
> couldn't make a convincing argument as to why this is the right
> place for this logic.
>

The documentation of svn_ra_replay_range() and of the callback *types*
(e.g., svn_ra_replay_revstart_callback_t) should have told you when/how
they are called; didn't it?

(I agree that the documentation comments of replay_rev_started() are not
very helpful.)

> 3. I've not tackled the question of automated testing at all. I'll
> attach a dump file which contains a directory property and an
> svn:log with containing \r (i.e. ^M) so that an interested party
> can at least play with the code on an example.
>
> I'll invest the time necessary to provide a decent automated test
> should there be interest in actually incorporating this patch. (I
> wouldn't want it included without such tests.)
>
> // Ben
>

> diff --git a/subversion/include/svn_string.h b/subversion/include/svn_string.h
> index 2154ef7..c71fc58 100644
> --- a/subversion/include/svn_string.h
> +++ b/subversion/include/svn_string.h
> @@ -290,6 +290,23 @@ svn_stringbuf_first_non_whitespace(const svn_stringbuf_t *str);
> void
> svn_stringbuf_strip_whitespace(svn_stringbuf_t *str);
>
> +/** Return a svn_string_t with contents of @a str with linebreaks normalized.
> + *
> + * @a str contains zero or more linebreaks of the form \n or \r\n.
> + * The returned svn_string_t will have the same number of linebreaks
> + * as @a str but will contain no \r characters.
> + *
> + * The returned svn_string_t may be identical (the same memory) to @a
> + * str, when @a str is already fully eol-normalized.
> + *
> + * We make no attempt to handle lone \r characters (as used in Classic
> + * Mac OS) correctly as this is not a supported platform. (This
> + * simplifies the implementation.)

Supporting \r's is easy (s/0x0D/0x0A/g), why not do it?

> + */
> +svn_string_t *
> +svn_string_with_normalized_eol_style(const svn_string_t *str, apr_pool_t *pool);
> +
> +
> /** Return position of last occurrence of @a ch in @a str, or return
> * @a str->len if no occurrence.
> */
> diff --git a/subversion/libsvn_subr/svn_string.c b/subversion/libsvn_subr/svn_string.c
> index c64ebdc..feb78e7 100644
> --- a/subversion/libsvn_subr/svn_string.c
> +++ b/subversion/libsvn_subr/svn_string.c
> @@ -460,6 +460,46 @@ svn_stringbuf_strip_whitespace(svn_stringbuf_t *str)
> }
>
>
> +static void
> +svn_stringbuf_normalize_eol_style(svn_stringbuf_t *str)
> +{
> + char *r, *w; /* read and write pointer, w <= r */
> + char *z; /* end of buf pointer, r < z */
> +
> + /* We walk r and w along the buffer, copying from r to w. Initially
> + they are equal, though w will lag behind r as we encounter '\r'
> + chars. */
> +
> + z = &(str->data[str->len]);
> + w = r = &(str->data[0]);
> +
> + while (r < z)
> + {
> + if (*r != '\r') /* Copy char and advance the writer when not \r */
> + { /* Copying from r to w is a no-op when w and r are */
> + *w++ = *r; /* equal. Advancing w, however always takes place */
> + }
> + ++r; /* always advance the reader */
> + }
> +

*headache*

I'm not sure what you need the stringbuf for. Personally, I'd write it
along the lines of

    if '\r' in src->data:
        char *new_data = ...;
        while (i < src->len)
            new_data[i] = (src[i] == '\r' ? '\n' : src[i]);

        new_string->data = new_data;

> + /* We must now re-establish the invariants of str.
> + w points to the correct place for the null terminator:
> + one beyond the last copied character. */
> + *w = '\0';
> + str->len = w - &(str->data[0]);
> +}
> +
> +/* Current implementation always returns a new svn_string_t, though
> + API does not promise this. */
> +svn_string_t *
> +svn_string_with_normalized_eol_style(const svn_string_t *str, apr_pool_t *pool)
> +{
> + svn_stringbuf_t *buf;
> + buf = svn_stringbuf_create_from_string(str, pool);
> + svn_stringbuf_normalize_eol_style(buf);
> + return svn_string_create_from_buf(buf, pool);
> +}
> +
> apr_size_t
> svn_stringbuf_find_char_backward(const svn_stringbuf_t *str, char ch)
> {
> diff --git a/subversion/svnsync/main.c b/subversion/svnsync/main.c
> index 3772041..987397a 100644
> --- a/subversion/svnsync/main.c
> +++ b/subversion/svnsync/main.c
> @@ -222,7 +222,6 @@ check_lib_versions(void)
> return svn_ver_check_list(&my_version, checklist);
> }
>
> -

Unrelated whitespace change. Remove it.

> /* Acquire a lock (of sorts) on the repository associated with the
> * given RA SESSION.
> */
> @@ -754,6 +753,7 @@ typedef struct {
> svn_boolean_t mergeinfo_stripped; /* Did we strip svn:mergeinfo? */
> svn_boolean_t svnmerge_migrated; /* Did we convert svnmerge.py data? */
> svn_boolean_t svnmerge_blocked; /* Was there any blocked svnmerge data? */
> + svn_boolean_t normalize_prop_eol_style; /* strip \r from svn: props? */
> } edit_baton_t;
>
>
> @@ -1003,6 +1003,14 @@ change_file_prop(void *file_baton,
> eb->svnmerge_blocked = TRUE;
> }
>
> + if (eb->normalize_prop_eol_style && value && svn_prop_is_svn_prop(name))

s/svn_prop_is_svn_prop/svn_prop_needs_translation/

> + {
> + svn_string_t *real_value;
> + real_value = svn_string_with_normalized_eol_style(value, pool);
> + return eb->wrapped_editor->change_file_prop(fb->wrapped_node_baton,
> + name, real_value, pool);
> + }
> +
> return eb->wrapped_editor->change_file_prop(fb->wrapped_node_baton,

So now there are two nearly-identical 'return change_file_prop()' calls.
The code would be better (easier to extend) if there was just one, IMO.

> name, value, pool);
> }
> @@ -1084,6 +1092,10 @@ change_dir_prop(void *dir_baton,
> name = SVN_PROP_MERGEINFO;
> eb->svnmerge_migrated = TRUE;
> }
> + else if (eb->normalize_prop_eol_style && value && svn_prop_is_svn_prop(name))

s/svn_prop_is_svn_prop/svn_prop_needs_translation/

> + {
> + real_value = svn_string_with_normalized_eol_style(value, pool);
> + }
>
> /* Remember if we see any svnmerge-blocked properties. */
> if (eb->migrate_svnmerge && (strcmp(name, "svnmerge-blocked") == 0))
> @@ -1198,6 +1210,10 @@ get_sync_editor(const svn_delta_editor_t *wrapped_editor,
> eb->migrate_svnmerge = TRUE;
> eb->strip_mergeinfo = TRUE;
> }
> + if (getenv("SVNSYNC_UNSUPPORTED_NORMALIZE_PROP_EOL_STYLE"))
> + {
> + eb->normalize_prop_eol_style = TRUE;
> + }
>
> *editor = tree_editor;
> *edit_baton = eb;
> @@ -1389,6 +1405,18 @@ replay_rev_started(svn_revnum_t revision,
> apr_hash_set(filtered, SVN_PROP_REVISION_LOG, APR_HASH_KEY_STRING,
> svn_string_create("", pool));
>
> + /* Now that we have assured that there will be an svn:log message,
> + normalize line breaks in log message.*/
> + if (getenv("SVNSYNC_UNSUPPORTED_NORMALIZE_PROP_EOL_STYLE"))
> + {
> + svn_string_t *svn_log;
> + svn_log = apr_hash_get(filtered, SVN_PROP_REVISION_LOG,
> + APR_HASH_KEY_STRING);
> + svn_log = svn_string_with_normalized_eol_style(svn_log, pool);
> + apr_hash_set(filtered, SVN_PROP_REVISION_LOG, APR_HASH_KEY_STRING,
> + svn_log);
> + }
> +

As I said: you want this logic not just for svn:log, but maybe also for
other props. You can iterate over all props you're about to commit
(compare filter_props()) and see if any of them needs_translation().

Also (as I wrote above), this renormalization logic is (likely; haven't
tested it) needed elsewhere in svnsync/main.c as well.

Daniel

> SVN_ERR(svn_ra_get_commit_editor3(rb->to_session, &commit_editor,
> &commit_baton,
> filtered,

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1648359
Received on 2009-04-11 14:21:47 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.