[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] size and force text files to be considered binary in mailer.py

From: Jon Bendtsen <jon.bendtsen_at_laerdal.dk>
Date: 2005-08-24 14:56:13 CEST

Thank you for your input

Den 23. aug 2005 kl. 16:55 skrev C. Michael Pilato:

> Jon Bendtsen <jon.bendtsen@laerdal.dk> writes:
>
>
>> The mailsize is calculated by having a hardcoded max size limit in
>> the mailer.py script.
>>
>
> This should be a runtime configuration parameter, not hardcoded.

I know that, but it was easier just to hardcode it at first, and
leave it
at that if noone wanted my patch.

>> For every diff
>> for every line
>> i take the length of the line and subtract it from the
>> max size.
>> If the max size reaches 0 or below, one more line is added
>> saying that the limit has been reached and no more text is
>> included in the email.
>>
>
> Why not go ahead and subtract the cost in size of your "This mail is
> too big" message from the limit? That way a mail not supposed to
> exceed some value doesn't exceed that value.

Because there are some text above the diff messages.
There are headers
There are the list of changed files
There are the log message.

This takes space too, and it was easier just to count the diff lines.
I guess i could try to subtract the above datasizes, but i think it
is going
to be HARD to ensure that the mail does not exceed the limit.

>> Further more the number of diffs are counted and if there is more
>> than 128 diffs/files, it aborts futher generation as well, but does
>> send the mail that was genereated so far.
>>
>
> That kind of arbitrary action is nonsense. Sorry, but who cares how
> many diff sections are in a file? I guarantee that having more than
> 128 has never caused anybody a lick of technical trouble.

Well, my customer asked me to make this limit, so i made it.

>> To force files to be considered binary, the name of the file is
>> checked against a list of regular expressions. The regular
>> expressions is taken from the file filediff.conf that usually
>> resides in the conf directory right next to the mailer.conf. Every
>> line except those beginning with # are used in a regular expression
>> match. If any match is found we stop checking for other regular
>> expressions, and stops generating diffs for this file. A message is
>> included that it is forced binary because of $patern
>>
>
> So, why is a file that lists regular expressions of binary filenames
> called "filediff.conf"? That's not intuitive at all. And might not
> different repositories have different ideas about binariness? In
> fact, as you'll read below, I have doubts about the need for this
> feature at all.

Well again my customer asked for it. Currently they exclude 4 kinds of
files, .po is one of them.
I guess i could change the name to exclude_files_from_diffmails.conf
or something like that, unless you have a better name.

>> I'm looking for input, suggestions, ... and eventually maybe someone
>> will include it into the mailer.py which exists now.
>>
>
> As it stands, I'm unfortunately very much agaist rolling this code (in
> particular) and several of this concepts (in general) into mailer.py.
> Some code review follows.
>
>
>> --- mailer-120.py 2005-08-18 12:32:15.000000000 +0200
>> +++ mailer-size_and_filediff-120.py 2005-08-19
>> 11:47:50.000000000 +0200
>> @@ -667,6 +667,28 @@
>> if self.paths.has_key(path) != self.in_paths:
>> continue
>>
>> + #Added ny JonB 20050818
>>
>
> We don't do in-code attribution.

Allright, i'll remove it

>> + patern_matching = None
>> + reason = ''
>> +
>> + filediff_conf = open(diff_config_fname, 'r')
>> + diffpatern = filediff_conf.readline()
>> + while diffpatern != '':
>> + fixed_diff_patern = diffpatern.strip()
>>
>
> In 'patern' and 'diffpatern' and 'fixed_diff_patern', the common
> English spelling is 'pattern' (with two T's).

i'll correct that then

>> + comment_matching = re.search('^#',fixed_diff_patern)
>> + if comment_matching != None:
>>
>
> Using re.search here is quite unnecessary.
>
> if fixed_diff_patern[0] == '#':
> ...
>
> Or even:
>
> if fixed_diff_patern.find('#') == 0:
> ...

heh, thats certainly smarter than my waste of resources.

>> + if patern_matching != None:
>> + reason = 'FORCED Binary file because of patern: %s\n' %
>> fixed_diff_patern
>>
>
> Users aren't going to care which pattern caused their file not to show
> up, so I think alot of this code could be greatly simplified by losing
> the "reason" thing altogether. So all this could would need to do is
> override the answer returned by diff.either_binary() farther down.
> Basically, you give Subversion an chance to say the file is binary.
> If Subversion says, "Nope, not binary" then you see if the filename
> matches your pattern, and if so, override that binariness answer.
> From there the code flows the same as it does today, and you get the
> universal "Binary file. No diff available." notification in the mail.

I wrote a reason to make it easier for my customer to figure out why
the file was excluded.
I check the filename before letting subversion generate a diff, because
i believe that subversion uses more resources to create those diff's
than would be spent checking the filenames.

> But here's the deal -- why isn't Subversion's is-binary question not
> enough for you? You are aware of svn:mime-type and how it's used to
> determine binariness on a per-file basis, right?

no, i am not. And neither was my customer i guess.

>> @@ -1163,6 +1243,7 @@
>> }
>>
>> config_fname = None
>> + diff_config_fname = None
>> argc = len(sys.argv)
>> if argc < 3:
>> usage()
>> @@ -1192,6 +1273,16 @@
>> raise MissingConfig(config_fname)
>> config_fp = open(config_fname)
>>
>> + # Settle on a filediff file location, and open it.
>> + if diff_config_fname is None:
>> + # Default to REPOS-DIR/conf/filediff.conf.
>> + diff_config_fname = os.path.join(repos_dir, 'conf',
>> 'filediff.conf')
>> + if not os.path.exists(diff_config_fname):
>> + # Okay. Look for 'filediff.conf' as a sibling of this script.
>> + diff_config_fname = os.path.join(os.path.dirname(sys.argv
>> [0]), 'filediff.conf')
>> + if not os.path.exists(diff_config_fname):
>> + raise MissingConfig(diff_config_fname)
>> +
>> svn.core.run_app(main, cmd, config_fname, repos_dir,
>> sys.argv[3:3+expected_args])
>>
>
> You don't seem to actually pass diff_config_fname off anywhere, and it
> isn't declared in a global scope. Also, raising an exception when you
> don't find it means that current users of mailer.py can't upgrade to a
> new version without creating this new config file, even if that file
> would be completely empty.

i guess i have to correct that as well.

JonB

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org
Received on Wed Aug 24 14:59:58 2005

This is an archived mail posted to the Subversion Users mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.