[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Atom 1.0 feeds for subversion

From: Max Bowsher <maxb1_at_ukf.net>
Date: 2006-07-18 12:34:39 CEST

Bhuvaneswaran Arumugam wrote:
> Based on the discussion [1] i have written a new python script to
> generate Atom 1.0 feeds for commit information. It uses xml dom library
> to generate the feeds.
>
> [1] http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=116571
>
> Please find the attached patch.

Thanks! Here's a skim-through review.

> TODO: Implement a scheme to insert the items to the file if it already
> exists (using pickle?) and make use of --max-items option.

If the file *only* contains Subversion commit entries, then all the data
required to reproduce old items is in the Subversion repository, so
there should be no need to pickle, or re-read the file.

> [[
> * contrib/hook-scripts/svn2atom.py
> New script to generate Atom 1.0 feeds for commit information. It uses
> xml dom library to generate the feeds

'Python xml.dom module' explains what you mean more clearly.

Also, missing '.' at end of sentences.

> * www/tools_contrib.html
> Include the documentation for contrib/hook-scripts/svn2atom.py script
> ]]
>
> Index: contrib/hook-scripts/svn2atom.py
> ===================================================================
> --- contrib/hook-scripts/svn2atom.py (revision 0)
> +++ contrib/hook-scripts/svn2atom.py (revision 0)
> @@ -0,0 +1,284 @@
> +#!/usr/bin/env python
> +
> +"""Usage: svn2atom.py [OPTION...] REPOS-PATH
> +
> +Generate an ATOM 1.0 feed file containing commit information for the

ATOM or Atom? Find out which is officially correct, and use it exclusively.

> +Subversion repository located at REPOS-PATH. The item title is the
> +revision number, and the item description contains the author, date,
> +log messages and changed paths.
> +
> +Options:
> +
> + -h, --help Show this help message.
> +
> + -f, --atom-file=PATH Store the ATOM feed in the file located at PATH,
> + which will be created if it doesn't already exist.
> + If not provided, the script will store the feed in
> + the current working directory, in a file named
> + REPOS_NAME.atom (where REPOS_NAME is the basename
> + of the REPOS_PATH command-line argument).

Explain what happens if the file already exists. It will be overwritten?

> + -r, --revision=X[:Y] Subversion revision (or revision range) to generate
> + ATOM info for. If not provided, info for the single
> + youngest revision in the repository will be generated.
> +
> + -m, --max-items=N Keep only N items in the ATOM feed file. By default,
> + 20 items are kept.

If keeping of old items isn't implemented yet, then the above comment is
 wrong.

If you don't want to implement keeping of old items in the initial
version, that is fine, but the documentation must reflect that. To put
it another way: missing features are acceptable. Bugs are not.

> + -u, --item-url=URL Use URL as the basis for generating ATOM item links.

It's unclear what you are supposed to use this for without reading the code.

> + -U, --feed-url=URL Use URL as the global ATOM feed link.

Ditto.

> + -P, --svn-path=DIR Look in DIR for the svnlook binary. If not provided,
> + the script will run "svnlook" via a typical $PATH hunt.

Say: "svnlook" must be on the PATH.
Not: the script will run "svnlook" via a typical $PATH hunt.

because the old version implies that the script is hunting though PATH
itself.

> +"""
> +
> +import sys
> +
> +# All clear on the custom module checks. Import some standard stuff.

Wrong comment. There were no custom module checks. Also "Import some
standard stuff." is entirely self evident. Delete the comment.

> +import getopt, os, popen2, datetime

Please use PEP 8 style <http://www.python.org/dev/peps/pep-0008/>. In
this case, that means each of the above imports gets a separate line.

> +from StringIO import StringIO
> +from xml.dom.DOMImplementation import implementation
> +import xml.utils
> +
> +def usage_and_exit(errmsg=None):
> + """Print a usage message, plus an ERRMSG (if provided), then exit.
> + If ERRMSG is provided, the usage message is printed to stderr and
> + the script exits with a non-zero error code. Otherwise, the usage
> + message goes to stdout, and the script exits with a zero
> + errorcode."""

> + stream = errmsg is not None and sys.stderr or sys.stdout

This feels too unreadable for me, I'd prefer an explicit "if" statement.

> + print >> stream, __doc__
> + if errmsg:
> + print >> stream, "\nError: %s" % (errmsg)
> + sys.exit(2)
> + sys.exit(0)
> +
> +def check_url(url, opt):
> + """Verify that URL looks like a valid URL or option OPT."""
> + if not (url.startswith('https://') \
> + or url.startswith('http://') \
> + or url.startswith('file://')):
> + usage_and_exit("svn2atom.py: Invalid url '%s' is specified for " \
> + "'%s' option" % (url, opt))

No. What if someone wants to put a svn:// URL in there? Or an ftp://
one? Or even a relative URL?

I think this check should be removed.

> +class SVN2ATOM:

PEP 8 ==> Svn2Atom

> + def __init__(self, svn_path, repos_path, item_url, atom_file, max_items,
> + feed_url):
> + self.repos_path = repos_path
> + self.item_url = item_url
> + self.atom_file = atom_file
> + self.max_items = max_items
> + self.feed_url = feed_url
> + self.svnlook_cmd = 'svnlook'
> + if svn_path is not None:
> + self.svnlook_cmd = os.path.join(svn_path, 'svnlook')
> +
> + self.init_output()
> + self.add_header()

Why are these not one combined function?

> + def init_output(self):
> + """ initialize atom output """
> + self.document = implementation.createDocument(None,None,None)
> + self.feed = self.document.createElement("feed")
> + self.document.appendChild(self.feed)
> +
> + def add_header(self):
> + """ add atom xml header """
> +
> + doc = self.document
> + self.feed.appendChild(doc.createTextNode("\n "))
> + self.feed.setAttribute("xml:lang", "en")

How can you be certain of this? Better to leave it blank than set it to
a possibly incorrect value.

> + self.feed.setAttribute("xmlns", "http://www.w3.org/2005/Atom")
> + self.feed.setAttribute("xmlns:xh", "http://www.w3.org/1999/xhtml")
> +
> + self.title = self.document.createElement("title")
> + self.feed.appendChild(self.title)
> + self.title.appendChild(doc.createTextNode("%s SVN Commit's Feed" % \

I would prefer 'Subversion' in place of 'SVN'.
Incorrect use of apostrophe. Remove.
Do not use redundant backslashes.

> + (os.path.basename(self.repos_path))))
> +
> + self.id = self.document.createElement("id")
> + self.feed.appendChild(self.id)
> + self.id.appendChild(doc.createTextNode("%s" % self.feed_url or ""))

What's with the 'or ""' ?

> + self.updated = self.document.createElement("updated")
> + self.feed.appendChild(self.updated)
> + self.updated.appendChild(\
> + doc.createTextNode("%s" % self.format_date(datetime.datetime.now())))

If you are going to require Python 2.3, then you must state that at the
top of the script.

> + self.link = self.document.createElement("link")
> + self.feed.appendChild(self.link)
> + self.link.setAttribute("href", self.feed_url or "")
> +
> + self.author = self.document.createElement("author")
> + self.feed.appendChild(self.author)
> + self.aname = self.document.createElement("name")
> + self.author.appendChild(self.aname)
> + self.aname.appendChild(doc.createTextNode("subversion"))

Is it really necessary for all these nodes to be member variables?

> +
> + def add_entry(self, revision):
> + """ add new atom entry for a revision """
> + self.revision = revision
> + url = self.item_url and \
> + self.item_url + "?rev=%d" % self.revision or ""

Hardcoding of ?rev= is not reasonable. Arrange for there to be a marker
in item_url which you replace.

> + doc = self.document
> + self.entry = self.document.createElement("entry")
> + self.feed.appendChild(self.entry)
> + self.entry.appendChild(doc.createTextNode("\n "))
> +
> + self.id = self.document.createElement("id")
> + self.entry.appendChild(self.id)
> + self.id.appendChild(doc.createTextNode("%s" % url))
> +
> + self.title = self.document.createElement("title")
> + self.entry.appendChild(self.title)
> + self.title.appendChild(doc.createTextNode(\
> + "Revision %s" % self.revision))
> +
> + self.updated = self.document.createElement("updated")
> + self.entry.appendChild(self.updated)
> + self.updated.appendChild(\
> + doc.createTextNode("%s" % self.format_date(datetime.datetime.now())))

Now? Shouldn't the time be taken from the revision's svn:date?

> + self.link = self.document.createElement("link")
> + self.entry.appendChild(self.link)
> + self.link.setAttribute("href", url)
> +
> + self.summary = self.document.createElement("summary")
> + self.entry.appendChild(self.summary)
> +
> + description = self.make_atom_item_desc()
> + self.summary.appendChild(doc.createTextNode(description))
> +
> + def finish_output(self):
> + """ write the atom feed to the file """
> + t = self.document.createTextNode("\n")
> + self.feed.appendChild(t)
> + fp = open(self.atom_file, "w")
> + xml.dom.ext.PrettyPrint(self.document, fp)
> + fp.write("\n")
> + fp.close()
> +
> + def make_atom_item_desc(self):
> + cmd = "%s info -r %s %s" % (self.svnlook_cmd, self.revision, self.repos_path)

Use list style commands, as you do later in main(), to avoid shell
metacharacter issues.

> + out, x, y = popen2.popen3(cmd)

x? y? Yuck, use clear variable names.

> + cmd_out = out.readlines()
> + cmd_err = y.readlines()

Hmm... I _think_ this is relying on OS buffering to avoid deadlocking.

> + Author = "Author: " + cmd_out[0] + "\n"

But don't the lines already have a newline attached?

> + Date = "Date: " + cmd_out[1] + "\n"
> + New_Revision = "Revision: %s\n" % self.revision
> + Log = "Log: " + cmd_out[3] + "\n"

Weird capitalization. Local variables should be lower-case.

> + out.close()
> + x.close()
> + y.close()
> +
> + cmd = "%s changed -r %s %s" % (self.svnlook_cmd, self.revision, self.repos_path)
> + out, x, y = popen2.popen3(cmd)
> + cmd_out = out.readlines()
> + changed_files = "Modified: "
> + for item in cmd_out:
> + changed_files = changed_files + item

+=

> + item_desc = Author + Date + New_Revision + Log + changed_files

Clearer would be:
desc_lines = []
desc_lines.append(...)
desc_lines.append(...)
desc_lines.append(...)
desc_lines.append(...)
desc_lines.append('')
item_desc = "\n".join(desc_lines)

> + out.close()
> + x.close()
> + y.close()
> +
> + return item_desc
> +
> + def format_date(self, dt):
> + """ input date must be in GMT """
> + return "%04d-%02d-%02dT%02d:%02d:%02dZ" % \
> + (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second)
> +
> +def main():
> + # Parse the command-line options and arguments.
> + try:
> + opts, args = getopt.gnu_getopt(sys.argv[1:], "hP:r:u:f:m:U:",
> + ["help",
> + "svn-path=",
> + "revision=",
> + "item-url=",
> + "atom-file=",
> + "max-items=",
> + "feed-url=",
> + ])
> + except getopt.GetoptError, msg:
> + usage_and_exit(msg)
> +
> + # Make sure required arguments are present.
> + if len(args) != 1:
> + usage_and_exit("You must specify a repository path.")
> + repos_path = args[0]
> +
> + # Now deal with the options.
> + max_items = 20
> + commit_rev = None
> + svn_path = item_url = feed_url = None
> + atom_file = os.path.basename(repos_path) + ".atom"

Is this a normal file extension convention for atom files?

> + for opt, arg in opts:
> + if opt in ("-h", "--help"):
> + usage_and_exit()
> + elif opt in ("-P", "--svn-path"):
> + svn_path = arg
> + elif opt in ("-r", "--revision"):
> + commit_rev = arg
> + elif opt in ("-u", "--item-url"):
> + item_url = arg
> + check_url(item_url, opt)
> + elif opt in ("-f", "--atom-file"):
> + atom_file = arg
> + elif opt in ("-m", "--max-items"):
> + try:
> + max_items = int(arg)
> + except ValueError, msg:
> + usage_and_exit("Invalid value '%s' for --max-items." % (arg))
> + if max_items < 1:
> + usage_and_exit("Value for --max-items must be a positive integer.")
> + elif opt in ("-U", "--feed-url"):
> + feed_url = arg
> + check_url(feed_url, opt)
> +
> + if (commit_rev == None):

Un-Pythonic parentheses.

It is conventional to test "is None", not "== None".

> + svnlook_cmd = 'svnlook'
> + if svn_path is not None:
> + svnlook_cmd = os.path.join(svn_path, 'svnlook')
> + out, x, y = popen2.popen3([svnlook_cmd, 'youngest', repos_path])
> + cmd_out = out.readlines()
> + revisions = [int(cmd_out[0])]
> + out.close()
> + x.close()
> + y.close()
> + else:
> + try:
> + rev_range = commit_rev.split(':')
> + except ValueError, msg:
> + usage_and_exit("svn2atom.py: Invalid value '%s' for --revision." \
> + % (commit_rev))

What's this supposed to catch, exactly? I cannot think of any
circumstances in which a ValueError would be raised here.

> + len_rev_range = len(rev_range)
> + if len_rev_range == 1:
> + revisions = [int(commit_rev)]
> + elif len_rev_range == 2:
> + start, end = rev_range
> + start = int(start)
> + end = int(end)
> + if (start > end):
> + tmp = start
> + start = end
> + end = tmp
> + revisions = range(start, end + 1)[-max_items:]

range() creates an explicit list. Nasty things will happen if the user
specifies a large revision range here. Rewrite to not use range(), by
simply using the start and end variables to control a loop variable
directly.

> + else:
> + usage_and_exit("svn2atom.py: Invalid value '%s' for --revision." \
> + % (commit_rev))
> +
> + svn2atom = SVN2ATOM(svn_path, repos_path, item_url, atom_file,
> + max_items, feed_url)
> + for revision in revisions:
> + svn2atom.add_entry(revision)
> + svn2atom.finish_output()
> +
> +if __name__ == "__main__":
> + main()

Received on Tue Jul 18 12:35:31 2006

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.