[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] "svnlook diff" option to filter files revisited

From: Karl Fogel <kfogel_at_red-bean.com>
Date: 2007-04-07 21:22:58 CEST

Pete Gonzalez <pgonzalez@bluel.com> writes:
> Below are some relevant excerpts from the previous discussion. The
> redux seemed to be that people wanted me to either (1) use something
> other than svnlook to create diffs or (2) invent lots of other similar
> switches to provide a generalized context for --no-diff. After having
> had a year to reflect on this, I still feel that the original proposal
> is reasonable and worth revisiting.

Are there any arguments for 'svnlook diff --no-diff "..."' that
wouldn't also apply to 'svn diff --no-diff "..."' ? (In other words,
wouldn't we patch both?)

Further thoughts: would an 'svn:no-diff' property do this better, so
people wouldn't have to construct the option's parameter all the time?
A --force flag could override it. Perhaps it could co-exist with the
--no-diff option.

Yes, I'm engaging in rampant speculation here. But I'd like us to
identify the problem and think about all ways to solve it. Often a
first patch turns out to be a subset of what's really called for.

-Karl

> _______________________________________________________________________________
> From: Pete Gonzalez <pgonzalez_at_bluel.com>
> Date: 2005-11-18 04:25:02 CET
>
> The attached patch is for this file:
>
>
> subversion/subversion/svnlook/main.c
>
>
> It adds a new command-line option "--no-diff", which is used to prevent
> svnlook from displaying differences for certain files. (The effect is
> similar the way binary files are excluded.) For example:
>
>
> svnlook diff my_repository -r 1234 --no-diff "*.resx *Designer.cs"
>
>
> In this example, if revision #1234 affected any filenames such as
> "MyForm.resx" or "MyForm.Designer.cs", they will be excluded from the diff
> (although they will still be listed as changed).
>
>
> I created this option because our Subversion log e-mails were becoming
> cluttered by a lot of verbose diffs for text files that are normally edited
> using automated tools. For example, in C# a .resx file might contain
> thousands of lines of BASE64-encoded image data.
>
>
> Cheers,
> -Pete
>
>
> Index: main.c
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org
>
>
> _______________________________________________________________________________
> From: Julian Foad <julianfoad_at_btopenworld.com>
> Date: 2005-11-18 14:30:00 CET
>
> Pete Gonzalez wrote:
>> Ben Collins-Sussman wrote:
>>
>>> On the other hand, there are much, much simpler ways to accomplish
>>> this. Just attach an svn:mime-type property to all your .resx files
>>> which is something non-texty (i.e. not text/*). The diff engine will
>>> automatically skip over any file that the client thinks isn't text.
>>
>> The actual subset is more complicated than just "*.resx", and since this
>> repository contains thousands of files, I think it would be impractical
>> to expect people to manually set these properties for individual files.
>
>
> You (one user or administrator) can set all the existing ones at once, and then
> use "auto-props" to make sure that new files get the property as well. (Read
> about "auto-props" in the book or ask on the "users@" email list.)
>
>
>> It makes more sense to have a global wildcard that can be updated from
>> time to time when new problems appear.
>
>
> I can certainly see that that would do what you want today, nicely.
>
>
> The thing is, it doesn't seem like a general enough solution to include in the
> product. Tomorrow you will find (or the next person who tries to use it will
> find) that a filename wildcard isn't enough, and it needs to distinguish
> between files in different directories, or between short (hand-written) files
> and long (machine-generated) ones of the same type, or ...
>
>
> Hmm... Talking of short and long, have you looked at the commit mailer hook
> scripts available? At least one of them has support for suppressing diffs that
> are longer than a certain length. This was discussed recently on the "dev@"
> list, as somebody was wanting to put a similar feature into one of the other
> mailer scripts. I'm not sure whether that was per diffed file or for the whole
> diff output, but such a script might be the best place in which to implement
> such a feature, not needing to be compiled in to Subversion itself.
>
>
> We are grateful to you for wanting to contribute your patch to the software.
> To explain why the emphasis of these replies is on using the existing features:
> before we can incorporate any new feature we need to discuss the design
> rationale for the feature: why it does what it does, why it doesn't do other
> things, etc., to be sure that it is a logically complete feature that fits in
> well with the rest of the system.
>
>
> If, after trying the properties approach and/or the mailer script approach, you
> feel it doesn't solve your problem well, then please discuss it here so we can
> determine whether some enhancement or bug fix to the existing features would
> help or whether you still need something more like this patch provides. Of
> course if there is a clear need for something like this then we (including you)
> can go ahead with designing and implementing it.
>
>
>
>> Also, IIRC the "svn:mime-type"
>> property affects all diffs, but in most interactive contexts people
>> would want to see differences.
>
>
> Do you really believe that people would normally want to see the differences in
> those "*.resx" files that you mentioned when they use "svn diff", but would not
> want to see them in the commit email? That sounds implausible to me. Maybe
> there are a few files for which they might occasionally want to do so, but
> surely not most files most often.
>
>
> (I will admit that there is a design flaw in our "external diff" design whereby
> it is currenly impossible to get any diff of a file that has a MIME type
> property that Subversion thinks is "binary". Fixing that would overcome this
> particular point.)
>
>
>
>> The idea of using properties makes sense, but maybe if a directory
>> property recursively applied to its contents. Can you think of any
>> other simpler solutions?
>
>
> That might be handy, depending on what it did exactly, but the "auto-props"
> feature should work well enough for the time being. Let us know right away if
> it doesn't.
>
>
> - Julian
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org
>
>
>
> _______________________________________________________________________________
> From: Pete Gonzalez <pgonzalez_at_bluel.com>
> Date: 2005-11-18 20:39:48 CET
>
> Julian Foad wrote:
>>> It makes more sense to have a global wildcard that can be updated from
>>> time to time when new problems appear.
>>
>> I can certainly see that that would do what you want today, nicely.
>>
>> The thing is, it doesn't seem like a general enough solution to
>> include in the product. Tomorrow you will find (or the next
>> person who tries to use it will find) that a filename wildcard
>> isn't enough, and it needs to distinguish between files in
>> different directories, or between short (hand-written)
>> files and long (machine-generated) ones of the same type, or ...
>
>
> I agree that there are probably more flexible interfaces than
> my special-purpose filter. However, couldn't a similar argument
> be made regarding "--no-diff-added" and "--no-diff-deleted"?
> So far I haven't seen any clear design goals for svnlook other
> than "almost no options at all". For example, try typing
> "man diff" some time. :-)
>
>
> Internally, svnlook's "diff" subcommand performs a number of
> steps such as:
>
>
> - Build a list of all changed paths, classified as "added",
> "modified", or "deleted"
> - Determine which ones are binary files
> - Discard some of them according to "--no-diff-deleted" etc
> - Detect property changes
> - Collect everything into a formatted report, executing the Unix
> diff command on each changed file
>
>
> Trying to duplicate that functionality in a bash script (e.g. if
> the "diff" subcommand didn't exist) would be quite complex. That's
> why I think the svnlook algorithm should be more configurable.
>
>
> Regarding the particular interface, I don't have strong opinions as
> long as it's simple to use.
>
>
>> Hmm... Talking of short and long, have you looked at the commit
>> mailer hook scripts available? At least one of them has support
>> for suppressing diffs that are longer than a certain length. This
>> was discussed recently on the "dev@" list, as somebody was wanting
>> to put a similar feature into one of the other mailer scripts.
>
>
> I would like to see such a script if it really exists. I originally
> tried to solve this problem in the hook script (which is arguably
> where hacky heuristics belong). As far as I can tell, the only
> way would be to generate the full diff output and then parse
> it (e.g. with a Perl script). I chose to modify svnlook because I
> thought it would be less typing and more clean.
>
>
>> Do you really believe that people would normally want to see the
>> differences in those "*.resx" files that you mentioned when
>> they use "svn diff", but would not want to see them in the
>> commit email? That sounds implausible to me.
>
>
> During development, it's very important to be able to compare exactly
> what has changed in every single text file. Before committing my
> work, I always review every single diff of every single text file.
> You can catch a lot of errors this way, as well as a lot of spurious
> edits that can be safely removed. With .resx files it's particularly
> important because spurious edits are responsible for a lot of
> spurious Subversion conflicts. (Also, I must confess that I
> use TortoiseSVN, so the extra clutter is trivial to navigate.)
>
>
> By contrast, in the hook e-mail you just want to see what people
> are really working on. Huge diff blocks for machine generated
> files make these e-mails nearly useless. It's a big enough
> nuisance that I invested time trying to make the e-mails as
> concise as possible. I say all this as an argument that
> svn:mime-type affects too many other systems, when all I wanted
> to do was customize a simple e-mail.
>
>
> More to the point: Subversion is a tool for managing file diffs.
> So why is it strange to expect lots of options for customizing
> the report of what files were changed?
>
>
> Cheers,
> -Pete
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org
>
>
>
> _______________________________________________________________________________
> From: Julian Foad <julianfoad_at_btopenworld.com>
> Date: 2005-11-19 01:31:13 CET
>
> Pete Gonzalez wrote:
>> Julian Foad wrote:
>>
>>>>It makes more sense to have a global wildcard that can be updated from
>>>>time to time when new problems appear.
>>>
>>>I can certainly see that that would do what you want today, nicely.
>>>
>>>The thing is, it doesn't seem like a general enough solution to
>>>include in the product. Tomorrow you will find (or the next
>>>person who tries to use it will find) that a filename wildcard
>>>isn't enough, and it needs to distinguish between files in
>>>different directories, or between short (hand-written)
>>>files and long (machine-generated) ones of the same type, or ...
>>
>>
>> I agree that there are probably more flexible interfaces than
>> my special-purpose filter. However, couldn't a similar argument
>> be made regarding "--no-diff-added" and "--no-diff-deleted"?
>
>
> Yes. There's a lot of truth in what you say in this email.
>
>
>> So far I haven't seen any clear design goals for svnlook other
>> than "almost no options at all". For example, try typing
>> "man diff" some time. :-)
>>
>> Internally, svnlook's "diff" subcommand performs a number of
>> steps such as:
>>
>> - Build a list of all changed paths, classified as "added",
>> "modified", or "deleted"
>> - Determine which ones are binary files
>> - Discard some of them according to "--no-diff-deleted" etc
>> - Detect property changes
>> - Collect everything into a formatted report, executing the Unix
>> diff command on each changed file
>>
>> Trying to duplicate that functionality in a bash script (e.g. if
>> the "diff" subcommand didn't exist) would be quite complex. That's
>> why I think the svnlook algorithm should be more configurable.
>>
>> Regarding the particular interface, I don't have strong opinions as
>> long as it's simple to use.
>>
>>
>>>Hmm... Talking of short and long, have you looked at the commit
>>>mailer hook scripts available? At least one of them has support
>>>for suppressing diffs that are longer than a certain length. This
>>>was discussed recently on the "dev@" list, as somebody was wanting
>>>to put a similar feature into one of the other mailer scripts.
>>
>> I would like to see such a script if it really exists. I originally
>> tried to solve this problem in the hook script (which is arguably
>> where hacky heuristics belong). As far as I can tell, the only
>> way would be to generate the full diff output and then parse
>> it (e.g. with a Perl script). I chose to modify svnlook because I
>> thought it would be less typing and more clean.
>
>
> One of the mailers that doesn't yet do it is
> tools/hook-scripts/mailer/mailer.py. On about 3rd November, Alan Barrett
> posted a patch to add a "truncate_diff_lines" option to it, and I believe other
> mailers that support a similar thing already were mentioned by respondents to
> the thread.
>
>
>
>>>Do you really believe that people would normally want to see the
>>>differences in those "*.resx" files that you mentioned when
>>>they use "svn diff", but would not want to see them in the
>>>commit email? That sounds implausible to me.
>>
>> During development, it's very important to be able to compare exactly
>> what has changed in every single text file. Before committing my
>> work, I always review every single diff of every single text file.
>
>
> Well, all right. (I suppose I don't know what it's like working with such files.)
>
>
> [...]
>> By contrast, in the hook e-mail you just want to see what people
>> are really working on. Huge diff blocks for machine generated
>> files make these e-mails nearly useless. It's a big enough
>> nuisance that I invested time trying to make the e-mails as
>> concise as possible. I say all this as an argument that
>> svn:mime-type affects too many other systems, when all I wanted
>> to do was customize a simple e-mail.
>>
>> More to the point: Subversion is a tool for managing file diffs.
>> So why is it strange to expect lots of options for customizing
>> the report of what files were changed?
>
>
> Again, you're on to a reasonable argument. Perhaps you would like to make a
> more general post with a proposal to get the community to recognise this, and
> come up with some sort of plan for the features and options that should be
> desired? People don't like adding another little option that doesn't fit into
> a plan, but if there's a plan then it's a completely different story.
>
>
> - Julian
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org
>
>
> _______________________________________________________________________________
> From: Pete Gonzalez <pgonzalez_at_bluel.com>
> Date: 2005-11-23 19:45:12 CET
>
> Julian Foad wrote:
>>> More to the point: Subversion is a tool for managing file diffs.
>>> So why is it strange to expect lots of options for customizing
>>> the report of what files were changed?
>>
>> Again, you're on to a reasonable argument. Perhaps you would like to
>> make a more general post with a proposal to get the community to
>> recognise this, and come up with some sort of plan for the features and
>> options that should be desired? People don't like adding another little
>> option that doesn't fit into a plan, but if there's a plan then it's a
>> completely different story.
>
>
> Fair enough. I will think about this and see if I have any ideas for a
> more generalized proposal. The diff subcommand's basic steps are (1)
> filtering the file list, (2) fetching files and passing them to GNU diff,
> and (3) concatenating the results into a report. Various customizations
> could occur at each stage:
>
>
> step 1: exclude certain files (e.g. with a specific suffix, property, or
> revision action) before they are fetched
>
>
> step 2: configure the diff command (e.g. to ignore whitespace or to execute
> a different utility)
>
>
> step 3: prepend a report header with the file list and log entry (as
> commit-email.pl does) or apply an output template for e.g. HTML or PDF.
>
>
> Maybe these changes would enable commit-email.pl to be simplified as well.
> If anyone else has further ideas for interesting customizations, please
> post them here -- I think the interface should be optimized around actual
> customization scenarios.
>
>
> Cheers,
> -Pete
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org
>
> --- subversion/svnlook/main.c.orig 2007-04-07 13:22:32.589579944 -0400
> +++ subversion/svnlook/main.c 2007-04-07 13:41:04.750505784 -0400
> @@ -75,6 +75,7 @@
> svnlook__show_ids,
> svnlook__no_diff_deleted,
> svnlook__no_diff_added,
> + svnlook__no_diff,
> svnlook__diff_copy_from,
> svnlook__revprop_opt,
> svnlook__full_paths,
> @@ -113,6 +114,9 @@
> {"no-diff-added", svnlook__no_diff_added, 0,
> N_("do not print differences for added files")},
>
> + {"no-diff", svnlook__no_diff, 1,
> + N_("do not print differences for files matching ARG")},
> +
> {"diff-copy-from", svnlook__diff_copy_from, 0,
> N_("print differences against the copy source")},
>
> @@ -158,7 +162,7 @@
> N_("usage: svnlook diff REPOS_PATH\n\n"
> "Print GNU-style diffs of changed files and properties.\n"),
> {'r', 't', svnlook__no_diff_deleted, svnlook__no_diff_added,
> - svnlook__diff_copy_from} },
> + svnlook__no_diff, svnlook__diff_copy_from} },
>
> {"dirs-changed", subcommand_dirschanged, {0},
> N_("usage: svnlook dirs-changed REPOS_PATH\n\n"
> @@ -238,6 +242,7 @@
> svn_boolean_t help; /* --help */
> svn_boolean_t no_diff_deleted; /* --no-diff-deleted */
> svn_boolean_t no_diff_added; /* --no-diff-added */
> + const char *no_diff; /* --no-diff */
> svn_boolean_t diff_copy_from; /* --diff-copy-from */
> svn_boolean_t verbose; /* --verbose */
> svn_boolean_t revprop; /* --revprop */
> @@ -254,6 +259,7 @@
> svn_boolean_t show_ids;
> svn_boolean_t no_diff_deleted;
> svn_boolean_t no_diff_added;
> + const char *no_diff;
> svn_boolean_t diff_copy_from;
> svn_boolean_t full_paths;
> svn_boolean_t copy_info;
> @@ -780,6 +786,7 @@
> const char *path /* UTF-8! */,
> const char *base_path /* UTF-8! */,
> const svnlook_ctxt_t *c,
> + apr_array_header_t *no_diff_patterns,
> const char *tmpdir,
> apr_pool_t *pool)
> {
> @@ -892,10 +899,17 @@
>
> if (do_diff)
> {
> + int ignore_me = FALSE;
> +
> + if (no_diff_patterns != NULL)
> + ignore_me = svn_cstring_match_glob_list (path, no_diff_patterns);
> +
> SVN_ERR(svn_cmdline_printf(pool, "%s\n", equal_string));
> SVN_ERR(svn_cmdline_fflush(stdout));
>
> - if (binary)
> + if (ignore_me)
> + SVN_ERR (svn_cmdline_printf (pool, _("(Skipping this file type)\n")));
> + else if (binary)
> SVN_ERR(svn_cmdline_printf(pool, _("(Binary files differ)\n")));
> else
> {
> @@ -964,7 +978,7 @@
> SVN_ERR(print_diff_tree(root, base_root, node,
> svn_path_join(path, node->name, subpool),
> svn_path_join(base_path, node->name, subpool),
> - c, tmpdir, subpool));
> + c, no_diff_patterns, tmpdir, subpool));
> while (node->sibling)
> {
> svn_pool_clear(subpool);
> @@ -972,7 +986,7 @@
> SVN_ERR(print_diff_tree(root, base_root, node,
> svn_path_join(path, node->name, subpool),
> svn_path_join(base_path, node->name, subpool),
> - c, tmpdir, subpool));
> + c, no_diff_patterns, tmpdir, subpool));
> }
> apr_pool_destroy(subpool);
>
> @@ -1300,12 +1314,17 @@
> if (tree)
> {
> const char *tmpdir;
> + apr_array_header_t *no_diff_patterns = NULL;
> +
> + /* parse the --no-diff patterns */
> + if (c->no_diff != NULL)
> + no_diff_patterns = svn_cstring_split ( c->no_diff, " ", TRUE, pool);
>
> SVN_ERR(svn_fs_revision_root(&base_root, c->fs, base_rev_id, pool));
> SVN_ERR(svn_io_temp_dir(&tmpdir, pool));
>
> SVN_ERR(print_diff_tree(root, base_root, tree, "", "",
> - c, tmpdir, pool));
> + c, no_diff_patterns, tmpdir, pool));
> }
> return SVN_NO_ERROR;
> }
> @@ -1547,6 +1566,7 @@
> baton->show_ids = opt_state->show_ids;
> baton->no_diff_deleted = opt_state->no_diff_deleted;
> baton->no_diff_added = opt_state->no_diff_added;
> + baton->no_diff = opt_state->no_diff;
> baton->diff_copy_from = opt_state->diff_copy_from;
> baton->full_paths = opt_state->full_paths;
> baton->copy_info = opt_state->copy_info;
> @@ -1980,6 +2000,10 @@
> opt_state.no_diff_added = TRUE;
> break;
>
> + case svnlook__no_diff:
> + opt_state.no_diff = opt_arg;
> + break;
> +
> case svnlook__diff_copy_from:
> opt_state.diff_copy_from = TRUE;
> break;
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Apr 7 21:23:14 2007

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.