[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

[PATCH] "svnlook diff" option to filter files revisited

From: Pete Gonzalez <pgonzalez_at_bluel.com>
Date: 2007-04-07 20:14:12 CEST

It's been a year and svnlook still has "almost no options at all", so I am
posting an update of my --no-diff patch for release 1.4.3.

Below are some relevant excerpts from the previous discussion. The redux seemed
to be that people wanted me to either (1) use something other than svnlook to
create diffs or (2) invent lots of other similar switches to provide a
generalized context for --no-diff. After having had a year to reflect on this,
I still feel that the original proposal is reasonable and worth revisiting.

Cheers,
-Pete

_______________________________________________________________________________
From: Pete Gonzalez <pgonzalez_at_bluel.com>
Date: 2005-11-18 04:25:02 CET

The attached patch is for this file:

      subversion/subversion/svnlook/main.c

It adds a new command-line option "--no-diff", which is used to prevent
svnlook from displaying differences for certain files. (The effect is
similar the way binary files are excluded.) For example:

      svnlook diff my_repository -r 1234 --no-diff "*.resx *Designer.cs"

In this example, if revision #1234 affected any filenames such as
"MyForm.resx" or "MyForm.Designer.cs", they will be excluded from the diff
(although they will still be listed as changed).

I created this option because our Subversion log e-mails were becoming
cluttered by a lot of verbose diffs for text files that are normally edited
using automated tools. For example, in C# a .resx file might contain
thousands of lines of BASE64-encoded image data.

Cheers,
-Pete

Index: main.c

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

_______________________________________________________________________________
From: Julian Foad <julianfoad_at_btopenworld.com>
Date: 2005-11-18 14:30:00 CET

Pete Gonzalez wrote:
> Ben Collins-Sussman wrote:
>
>> On the other hand, there are much, much simpler ways to accomplish
>> this. Just attach an svn:mime-type property to all your .resx files
>> which is something non-texty (i.e. not text/*). The diff engine will
>> automatically skip over any file that the client thinks isn't text.
>
> The actual subset is more complicated than just "*.resx", and since this
> repository contains thousands of files, I think it would be impractical
> to expect people to manually set these properties for individual files.

You (one user or administrator) can set all the existing ones at once, and then
use "auto-props" to make sure that new files get the property as well. (Read
about "auto-props" in the book or ask on the "users@" email list.)

> It makes more sense to have a global wildcard that can be updated from
> time to time when new problems appear.

I can certainly see that that would do what you want today, nicely.

The thing is, it doesn't seem like a general enough solution to include in the
product. Tomorrow you will find (or the next person who tries to use it will
find) that a filename wildcard isn't enough, and it needs to distinguish
between files in different directories, or between short (hand-written) files
and long (machine-generated) ones of the same type, or ...

Hmm... Talking of short and long, have you looked at the commit mailer hook
scripts available? At least one of them has support for suppressing diffs that
are longer than a certain length. This was discussed recently on the "dev@"
list, as somebody was wanting to put a similar feature into one of the other
mailer scripts. I'm not sure whether that was per diffed file or for the whole
diff output, but such a script might be the best place in which to implement
such a feature, not needing to be compiled in to Subversion itself.

We are grateful to you for wanting to contribute your patch to the software.
To explain why the emphasis of these replies is on using the existing features:
before we can incorporate any new feature we need to discuss the design
rationale for the feature: why it does what it does, why it doesn't do other
things, etc., to be sure that it is a logically complete feature that fits in
well with the rest of the system.

If, after trying the properties approach and/or the mailer script approach, you
feel it doesn't solve your problem well, then please discuss it here so we can
determine whether some enhancement or bug fix to the existing features would
help or whether you still need something more like this patch provides. Of
course if there is a clear need for something like this then we (including you)
can go ahead with designing and implementing it.

> Also, IIRC the "svn:mime-type"
> property affects all diffs, but in most interactive contexts people
> would want to see differences.

Do you really believe that people would normally want to see the differences in
those "*.resx" files that you mentioned when they use "svn diff", but would not
want to see them in the commit email? That sounds implausible to me. Maybe
there are a few files for which they might occasionally want to do so, but
surely not most files most often.

(I will admit that there is a design flaw in our "external diff" design whereby
it is currenly impossible to get any diff of a file that has a MIME type
property that Subversion thinks is "binary". Fixing that would overcome this
particular point.)

> The idea of using properties makes sense, but maybe if a directory
> property recursively applied to its contents. Can you think of any
> other simpler solutions?

That might be handy, depending on what it did exactly, but the "auto-props"
feature should work well enough for the time being. Let us know right away if
it doesn't.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

_______________________________________________________________________________
From: Pete Gonzalez <pgonzalez_at_bluel.com>
Date: 2005-11-18 20:39:48 CET

Julian Foad wrote:
>> It makes more sense to have a global wildcard that can be updated from
>> time to time when new problems appear.
>
> I can certainly see that that would do what you want today, nicely.
>
> The thing is, it doesn't seem like a general enough solution to
> include in the product. Tomorrow you will find (or the next
> person who tries to use it will find) that a filename wildcard
> isn't enough, and it needs to distinguish between files in
> different directories, or between short (hand-written)
> files and long (machine-generated) ones of the same type, or ...

I agree that there are probably more flexible interfaces than
my special-purpose filter. However, couldn't a similar argument
be made regarding "--no-diff-added" and "--no-diff-deleted"?
So far I haven't seen any clear design goals for svnlook other
than "almost no options at all". For example, try typing
"man diff" some time. :-)

Internally, svnlook's "diff" subcommand performs a number of
steps such as:

- Build a list of all changed paths, classified as "added",
   "modified", or "deleted"
- Determine which ones are binary files
- Discard some of them according to "--no-diff-deleted" etc
- Detect property changes
- Collect everything into a formatted report, executing the Unix
   diff command on each changed file

Trying to duplicate that functionality in a bash script (e.g. if
the "diff" subcommand didn't exist) would be quite complex. That's
why I think the svnlook algorithm should be more configurable.

Regarding the particular interface, I don't have strong opinions as
long as it's simple to use.

> Hmm... Talking of short and long, have you looked at the commit
> mailer hook scripts available? At least one of them has support
> for suppressing diffs that are longer than a certain length. This
> was discussed recently on the "dev@" list, as somebody was wanting
> to put a similar feature into one of the other mailer scripts.

I would like to see such a script if it really exists. I originally
tried to solve this problem in the hook script (which is arguably
where hacky heuristics belong). As far as I can tell, the only
way would be to generate the full diff output and then parse
it (e.g. with a Perl script). I chose to modify svnlook because I
thought it would be less typing and more clean.

> Do you really believe that people would normally want to see the
> differences in those "*.resx" files that you mentioned when
> they use "svn diff", but would not want to see them in the
> commit email? That sounds implausible to me.

During development, it's very important to be able to compare exactly
what has changed in every single text file. Before committing my
work, I always review every single diff of every single text file.
You can catch a lot of errors this way, as well as a lot of spurious
edits that can be safely removed. With .resx files it's particularly
important because spurious edits are responsible for a lot of
spurious Subversion conflicts. (Also, I must confess that I
use TortoiseSVN, so the extra clutter is trivial to navigate.)

By contrast, in the hook e-mail you just want to see what people
are really working on. Huge diff blocks for machine generated
files make these e-mails nearly useless. It's a big enough
nuisance that I invested time trying to make the e-mails as
concise as possible. I say all this as an argument that
svn:mime-type affects too many other systems, when all I wanted
to do was customize a simple e-mail.

More to the point: Subversion is a tool for managing file diffs.
So why is it strange to expect lots of options for customizing
the report of what files were changed?

Cheers,
-Pete

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

_______________________________________________________________________________
From: Julian Foad <julianfoad_at_btopenworld.com>
Date: 2005-11-19 01:31:13 CET

Pete Gonzalez wrote:
> Julian Foad wrote:
>
>>>It makes more sense to have a global wildcard that can be updated from
>>>time to time when new problems appear.
>>
>>I can certainly see that that would do what you want today, nicely.
>>
>>The thing is, it doesn't seem like a general enough solution to
>>include in the product. Tomorrow you will find (or the next
>>person who tries to use it will find) that a filename wildcard
>>isn't enough, and it needs to distinguish between files in
>>different directories, or between short (hand-written)
>>files and long (machine-generated) ones of the same type, or ...
>
>
> I agree that there are probably more flexible interfaces than
> my special-purpose filter. However, couldn't a similar argument
> be made regarding "--no-diff-added" and "--no-diff-deleted"?

Yes. There's a lot of truth in what you say in this email.

> So far I haven't seen any clear design goals for svnlook other
> than "almost no options at all". For example, try typing
> "man diff" some time. :-)
>
> Internally, svnlook's "diff" subcommand performs a number of
> steps such as:
>
> - Build a list of all changed paths, classified as "added",
> "modified", or "deleted"
> - Determine which ones are binary files
> - Discard some of them according to "--no-diff-deleted" etc
> - Detect property changes
> - Collect everything into a formatted report, executing the Unix
> diff command on each changed file
>
> Trying to duplicate that functionality in a bash script (e.g. if
> the "diff" subcommand didn't exist) would be quite complex. That's
> why I think the svnlook algorithm should be more configurable.
>
> Regarding the particular interface, I don't have strong opinions as
> long as it's simple to use.
>
>
>>Hmm... Talking of short and long, have you looked at the commit
>>mailer hook scripts available? At least one of them has support
>>for suppressing diffs that are longer than a certain length. This
>>was discussed recently on the "dev@" list, as somebody was wanting
>>to put a similar feature into one of the other mailer scripts.
>
> I would like to see such a script if it really exists. I originally
> tried to solve this problem in the hook script (which is arguably
> where hacky heuristics belong). As far as I can tell, the only
> way would be to generate the full diff output and then parse
> it (e.g. with a Perl script). I chose to modify svnlook because I
> thought it would be less typing and more clean.

One of the mailers that doesn't yet do it is
tools/hook-scripts/mailer/mailer.py. On about 3rd November, Alan Barrett
posted a patch to add a "truncate_diff_lines" option to it, and I believe other
mailers that support a similar thing already were mentioned by respondents to
the thread.

>>Do you really believe that people would normally want to see the
>>differences in those "*.resx" files that you mentioned when
>>they use "svn diff", but would not want to see them in the
>>commit email? That sounds implausible to me.
>
> During development, it's very important to be able to compare exactly
> what has changed in every single text file. Before committing my
> work, I always review every single diff of every single text file.

Well, all right. (I suppose I don't know what it's like working with such files.)

[...]
> By contrast, in the hook e-mail you just want to see what people
> are really working on. Huge diff blocks for machine generated
> files make these e-mails nearly useless. It's a big enough
> nuisance that I invested time trying to make the e-mails as
> concise as possible. I say all this as an argument that
> svn:mime-type affects too many other systems, when all I wanted
> to do was customize a simple e-mail.
>
> More to the point: Subversion is a tool for managing file diffs.
> So why is it strange to expect lots of options for customizing
> the report of what files were changed?

Again, you're on to a reasonable argument. Perhaps you would like to make a
more general post with a proposal to get the community to recognise this, and
come up with some sort of plan for the features and options that should be
desired? People don't like adding another little option that doesn't fit into
a plan, but if there's a plan then it's a completely different story.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

_______________________________________________________________________________
From: Pete Gonzalez <pgonzalez_at_bluel.com>
Date: 2005-11-23 19:45:12 CET

Julian Foad wrote:
>> More to the point: Subversion is a tool for managing file diffs.
>> So why is it strange to expect lots of options for customizing
>> the report of what files were changed?
>
> Again, you're on to a reasonable argument. Perhaps you would like to
> make a more general post with a proposal to get the community to
> recognise this, and come up with some sort of plan for the features and
> options that should be desired? People don't like adding another little
> option that doesn't fit into a plan, but if there's a plan then it's a
> completely different story.

Fair enough. I will think about this and see if I have any ideas for a
more generalized proposal. The diff subcommand's basic steps are (1)
filtering the file list, (2) fetching files and passing them to GNU diff,
and (3) concatenating the results into a report. Various customizations
could occur at each stage:

step 1: exclude certain files (e.g. with a specific suffix, property, or
revision action) before they are fetched

step 2: configure the diff command (e.g. to ignore whitespace or to execute
a different utility)

step 3: prepend a report header with the file list and log entry (as
commit-email.pl does) or apply an output template for e.g. HTML or PDF.

Maybe these changes would enable commit-email.pl to be simplified as well.
   If anyone else has further ideas for interesting customizations, please
post them here -- I think the interface should be optimized around actual
customization scenarios.

Cheers,
-Pete

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

--- subversion/svnlook/main.c.orig 2007-04-07 13:22:32.589579944 -0400
+++ subversion/svnlook/main.c 2007-04-07 13:41:04.750505784 -0400
@@ -75,6 +75,7 @@
     svnlook__show_ids,
     svnlook__no_diff_deleted,
     svnlook__no_diff_added,
+ svnlook__no_diff,
     svnlook__diff_copy_from,
     svnlook__revprop_opt,
     svnlook__full_paths,
@@ -113,6 +114,9 @@
   {"no-diff-added", svnlook__no_diff_added, 0,
    N_("do not print differences for added files")},
 
+ {"no-diff", svnlook__no_diff, 1,
+ N_("do not print differences for files matching ARG")},
+
   {"diff-copy-from", svnlook__diff_copy_from, 0,
    N_("print differences against the copy source")},
 
@@ -158,7 +162,7 @@
    N_("usage: svnlook diff REPOS_PATH\n\n"
       "Print GNU-style diffs of changed files and properties.\n"),
    {'r', 't', svnlook__no_diff_deleted, svnlook__no_diff_added,
- svnlook__diff_copy_from} },
+ svnlook__no_diff, svnlook__diff_copy_from} },
 
   {"dirs-changed", subcommand_dirschanged, {0},
    N_("usage: svnlook dirs-changed REPOS_PATH\n\n"
@@ -238,6 +242,7 @@
   svn_boolean_t help; /* --help */
   svn_boolean_t no_diff_deleted; /* --no-diff-deleted */
   svn_boolean_t no_diff_added; /* --no-diff-added */
+ const char *no_diff; /* --no-diff */
   svn_boolean_t diff_copy_from; /* --diff-copy-from */
   svn_boolean_t verbose; /* --verbose */
   svn_boolean_t revprop; /* --revprop */
@@ -254,6 +259,7 @@
   svn_boolean_t show_ids;
   svn_boolean_t no_diff_deleted;
   svn_boolean_t no_diff_added;
+ const char *no_diff;
   svn_boolean_t diff_copy_from;
   svn_boolean_t full_paths;
   svn_boolean_t copy_info;
@@ -780,6 +786,7 @@
                 const char *path /* UTF-8! */,
                 const char *base_path /* UTF-8! */,
                 const svnlook_ctxt_t *c,
+ apr_array_header_t *no_diff_patterns,
                 const char *tmpdir,
                 apr_pool_t *pool)
 {
@@ -892,10 +899,17 @@
 
   if (do_diff)
     {
+ int ignore_me = FALSE;
+
+ if (no_diff_patterns != NULL)
+ ignore_me = svn_cstring_match_glob_list (path, no_diff_patterns);
+
       SVN_ERR(svn_cmdline_printf(pool, "%s\n", equal_string));
       SVN_ERR(svn_cmdline_fflush(stdout));
 
- if (binary)
+ if (ignore_me)
+ SVN_ERR (svn_cmdline_printf (pool, _("(Skipping this file type)\n")));
+ else if (binary)
         SVN_ERR(svn_cmdline_printf(pool, _("(Binary files differ)\n")));
       else
         {
@@ -964,7 +978,7 @@
   SVN_ERR(print_diff_tree(root, base_root, node,
                           svn_path_join(path, node->name, subpool),
                           svn_path_join(base_path, node->name, subpool),
- c, tmpdir, subpool));
+ c, no_diff_patterns, tmpdir, subpool));
   while (node->sibling)
     {
       svn_pool_clear(subpool);
@@ -972,7 +986,7 @@
       SVN_ERR(print_diff_tree(root, base_root, node,
                               svn_path_join(path, node->name, subpool),
                               svn_path_join(base_path, node->name, subpool),
- c, tmpdir, subpool));
+ c, no_diff_patterns, tmpdir, subpool));
     }
   apr_pool_destroy(subpool);
 
@@ -1300,12 +1314,17 @@
   if (tree)
     {
       const char *tmpdir;
+ apr_array_header_t *no_diff_patterns = NULL;
+
+ /* parse the --no-diff patterns */
+ if (c->no_diff != NULL)
+ no_diff_patterns = svn_cstring_split ( c->no_diff, " ", TRUE, pool);
 
       SVN_ERR(svn_fs_revision_root(&base_root, c->fs, base_rev_id, pool));
       SVN_ERR(svn_io_temp_dir(&tmpdir, pool));
 
       SVN_ERR(print_diff_tree(root, base_root, tree, "", "",
- c, tmpdir, pool));
+ c, no_diff_patterns, tmpdir, pool));
     }
   return SVN_NO_ERROR;
 }
@@ -1547,6 +1566,7 @@
   baton->show_ids = opt_state->show_ids;
   baton->no_diff_deleted = opt_state->no_diff_deleted;
   baton->no_diff_added = opt_state->no_diff_added;
+ baton->no_diff = opt_state->no_diff;
   baton->diff_copy_from = opt_state->diff_copy_from;
   baton->full_paths = opt_state->full_paths;
   baton->copy_info = opt_state->copy_info;
@@ -1980,6 +2000,10 @@
           opt_state.no_diff_added = TRUE;
           break;
 
+ case svnlook__no_diff:
+ opt_state.no_diff = opt_arg;
+ break;
+
         case svnlook__diff_copy_from:
           opt_state.diff_copy_from = TRUE;
           break;

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Apr 7 20:17:10 2007

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.