[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] size and force text files to be considered binary in mailer.py

From: C. Michael Pilato <cmpilato_at_collab.net>
Date: 2005-08-23 16:55:08 CEST

Jon Bendtsen <jon.bendtsen@laerdal.dk> writes:

> The mailsize is calculated by having a hardcoded max size limit in
> the mailer.py script.

This should be a runtime configuration parameter, not hardcoded.

> For every diff
> for every line
> i take the length of the line and subtract it from the max size.
> If the max size reaches 0 or below, one more line is added
> saying that the limit has been reached and no more text is
> included in the email.

Why not go ahead and subtract the cost in size of your "This mail is
too big" message from the limit? That way a mail not supposed to
exceed some value doesn't exceed that value.

> Further more the number of diffs are counted and if there is more
> than 128 diffs/files, it aborts futher generation as well, but does
> send the mail that was genereated so far.

That kind of arbitrary action is nonsense. Sorry, but who cares how
many diff sections are in a file? I guarantee that having more than
128 has never caused anybody a lick of technical trouble.

> To force files to be considered binary, the name of the file is
> checked against a list of regular expressions. The regular
> expressions is taken from the file filediff.conf that usually
> resides in the conf directory right next to the mailer.conf. Every
> line except those beginning with # are used in a regular expression
> match. If any match is found we stop checking for other regular
> expressions, and stops generating diffs for this file. A message is
> included that it is forced binary because of $patern

So, why is a file that lists regular expressions of binary filenames
called "filediff.conf"? That's not intuitive at all. And might not
different repositories have different ideas about binariness? In
fact, as you'll read below, I have doubts about the need for this
feature at all.

> I'm looking for input, suggestions, ... and eventually maybe someone
> will include it into the mailer.py which exists now.

As it stands, I'm unfortunately very much agaist rolling this code (in
particular) and several of this concepts (in general) into mailer.py.
Some code review follows.

> --- mailer-120.py 2005-08-18 12:32:15.000000000 +0200
> +++ mailer-size_and_filediff-120.py 2005-08-19 11:47:50.000000000 +0200
> @@ -667,6 +667,28 @@
> if self.paths.has_key(path) != self.in_paths:
> continue
> + #Added ny JonB 20050818

We don't do in-code attribution.

> + patern_matching = None
> + reason = ''
> +
> + filediff_conf = open(diff_config_fname, 'r')
> + diffpatern = filediff_conf.readline()
> + while diffpatern != '':
> + fixed_diff_patern = diffpatern.strip()

In 'patern' and 'diffpatern' and 'fixed_diff_patern', the common
English spelling is 'pattern' (with two T's).

> + comment_matching = re.search('^#',fixed_diff_patern)
> + if comment_matching != None:

Using re.search here is quite unnecessary.

   if fixed_diff_patern[0] == '#':

Or even:

   if fixed_diff_patern.find('#') == 0:

> + if patern_matching != None:
> + reason = 'FORCED Binary file because of patern: %s\n' %
> fixed_diff_patern

Users aren't going to care which pattern caused their file not to show
up, so I think alot of this code could be greatly simplified by losing
the "reason" thing altogether. So all this could would need to do is
override the answer returned by diff.either_binary() farther down.
Basically, you give Subversion an chance to say the file is binary.
If Subversion says, "Nope, not binary" then you see if the filename
matches your pattern, and if so, override that binariness answer.
From there the code flows the same as it does today, and you get the
universal "Binary file. No diff available." notification in the mail.

But here's the deal -- why isn't Subversion's is-binary question not
enough for you? You are aware of svn:mime-type and how it's used to
determine binariness on a per-file basis, right?

> @@ -1163,6 +1243,7 @@
> }
> config_fname = None
> + diff_config_fname = None
> argc = len(sys.argv)
> if argc < 3:
> usage()
> @@ -1192,6 +1273,16 @@
> raise MissingConfig(config_fname)
> config_fp = open(config_fname)
> + # Settle on a filediff file location, and open it.
> + if diff_config_fname is None:
> + # Default to REPOS-DIR/conf/filediff.conf.
> + diff_config_fname = os.path.join(repos_dir, 'conf', 'filediff.conf')
> + if not os.path.exists(diff_config_fname):
> + # Okay. Look for 'filediff.conf' as a sibling of this script.
> + diff_config_fname = os.path.join(os.path.dirname(sys.argv[0]), 'filediff.conf')
> + if not os.path.exists(diff_config_fname):
> + raise MissingConfig(diff_config_fname)
> +
> svn.core.run_app(main, cmd, config_fname, repos_dir,
> sys.argv[3:3+expected_args])

You don't seem to actually pass diff_config_fname off anywhere, and it
isn't declared in a global scope. Also, raising an exception when you
don't find it means that current users of mailer.py can't upgrade to a
new version without creating this new config file, even if that file
would be completely empty.

www.collab.net  <>  CollabNet  |  Distributed Development On Demand
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org
Received on Tue Aug 23 17:01:01 2005

This is an archived mail posted to the Subversion Users mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.