This summer, as part of the Google Summer of Code initiative, I'm
planning to create a port of the Subversion Command-Line client that
uses the Python/SWIG bindings. Once this new software is complete,
we'll be able to use the standard tests for the regular Subversion
Command-Line Client to identify bugs in the Python bindings.
The current state of the Python tests is glibly described in
tools/test-scripts/svntest-bindings.sh:
"Hey! My friends have got beautiful and shining unit tests,
but I have been left out in the cold. This is soooo unfair!"
Here's what I'm planning to do in the next few weeks:
- Refactor the C command-line client:
* Create a libsvn_cmdline library by extracting functions from
the main command-line client and moving them to the
subversion/libsvn_cmdline directory.
* Create new API header file for the libsvn_cmdline library
based on cl.h. I'm thinking of calling the new header file
svn_cmdline_cl.h.
* Simplify the svn_cmdline_cl.h file so that SWIG can have
an easier time parsing it. This means we'll have to change
functions declared as callbacks into vanilla function
definitions. The standard syntax isn't as compact,
but it's SWIG-compatible.
* Simplify the main function in the command-line client by
splitting it into several smaller functions. This refactoring
will make it easier for me to slowly convert the main function
from C into Python, piece by piece.
- Expand the functionality of the Python/SWIG bindings:
* Create new svn_cmdline module for the new libsvn_cmdline library
* Create new svn_diff module for the libsvn_diff library. This is
mostly a stub for now.
* Create new svn_apr module for the APR library. Create mappings
for any functionality that is used in the command-line client.
Two candidates right now: the apr_getopt.h and apr_allocator.h
header files.
* Create bindings for the libsvn_cmdline library
* Create new typemaps for types such as char **
- Refactor the Python/SWIG bindings:
* Move the FILE* typemap from core.i to a new file called 'file.i'
so that it can be reused in the new svn_cmdline bindings.
* Remove the svn_cmdline_init function from the svn_delta.i module.
This function has been added to the svn_cmdline module instead.
- Convert the C command-line client to Python, piece by piece.
I'll start with the main function. Then I'll move on to the
implementations of each basic Subversion command:
* add, blame, cat, checkout, cleanup, commit, copy, delete,
diff, export, help, import, info, list, lock, log, merge,
mkdir, move, propdel, propedit, propget, proplist, propset,
resolved, revert, status, switch, unlock, update
To help keep the implementation and maintenance of the new
functions simple, the C code for each function will be
directly transliterated into Python. I'll try to use the
exact same API calls and the exact same logic but in Python
syntax.
Along the way, I'll be sure to run into a few issues with
the C code and the SWIG bindings. Some of these issues have
already been identified above.
Questions:
- Is it OK to create the new libsvn_cmdline library? It will make my
command-line SWIG work this summer much easier.
- Is it possible to mark the svn_cmdline_cl.h API as experimental
and subject to change? I think that the API for the command-line
client could go through quite a few changes and improvements this
summer and I don't want to have to worry about revving the API yet.
- Is it OK to use malloc / free in the SWIG bindings? In some cases,
this seems to be the only solution because APR pools are not always
available in a typemap.
For those who haven't read my Python proposal, I've attached it below.
Feedback and advice are always welcome.
Project Title: Command-line Bindings for Python
----------------------------------------------------------------------
Synopsis
----------------------------------------------------------------------
Subversion is not just a version-control system. It is also a library.
The Subversion library officially supports five programming languages:
C, Perl, Python, Ruby, and Java. As Subversion is updated to fix bugs
and support new features, it is often difficult to know whether these
changes will cause problems in the various programming language
bindings. This problem of hidden bugs is particularly acute in the
Python/SWIG bindings because they do not have an automated test suite.
To help Subversion developers more quickly identify bugs in the Python
bindings, I will implement a clone of the standard command-line
client using the Python/SWIG bindings. This clone will allow the
existing test suite for the Subversion command-line client to also test
the Python command-line client. If this testing reveals bugs or missing
features in the underlying Python/SWIG bindings, the Subversion
developers will automatically be notified via the svn-breakage mailing
list.
----------------------------------------------------------------------
Benefits
----------------------------------------------------------------------
Benefits for Python Developers:
- Simple and consistent usage: Type the same commands into Python as
you would on the command-line.
- Because the new interface is implemented based on the existing
low-level library, high-level command-line calls and low-level
library calls can be intermixed during a single session.
- The existing high-performance Python/SWIG bindings now have proven
reliability, thanks to the extensive automated test suite for the
command-line client.
Benefits for Subversion Developers:
- Increased adoption of Subversion in the Python community.
- Automatic nightly test suite will notify developers if changes to the
Subversion code break the Python bindings.
- Upgraded Python bindings will be easy to maintain because they build
upon the framework established in the existing SWIG bindings (e.g.
Ruby, Perl, Python).
----------------------------------------------------------------------
Deliverables
----------------------------------------------------------------------
Code:
- 30 functions. Each implements a basic command:
* add, blame, cat, checkout, cleanup, commit, copy, delete, diff,
export, help, import, info, list, lock, log, merge, mkdir, move,
propdel, propedit, propget, proplist, propset, resolved, revert,
status, switch, unlock, update
- 40 command-line options
* auto-props, config-dir, diff-cmd, diff3-cmd, dry-run, editor-cmd,
encoding, extensions, file, force, force-log, ignore-ancestry,
ignore-externals, incremental, limit, message, native-eol, new,
no-auth-cache, no-auto-props, no-diff-deleted, no-ignore,
no-unlock, non-interactive, non-recursive, notice-ancestry,
old, password, quiet, recursive, relocate, revision, revprop,
show-updates, stop-on-copy, strict, targets, username, verbose,
version
Testing:
- The test suite for the standard client will be adapted so that it can
test the Python client
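One way the existing tests could be pointed at either client is to make the
client executable a parameter of the test harness. The sketch below is only
an illustration under that assumption: client_command and run_client are
hypothetical names, not functions from the existing test suite.

```python
import subprocess
import sys


def client_command(client):
    """Return the argv prefix used to launch a client.

    A client path ending in '.py' is run under the current Python
    interpreter; anything else is treated as a native executable.
    """
    if client.endswith(".py"):
        return [sys.executable, client]
    return [client]


def run_client(client, *args):
    """Run the client with the given arguments.

    Returns (exit code, stdout lines, stderr lines) so that a test can
    compare the output of the C client and the Python client directly.
    """
    proc = subprocess.run(
        client_command(client) + list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout.splitlines(), proc.stderr.splitlines()
```

With this shape, the same test could call run_client once with the path to
the C client and once with the path to the Python script, and then compare
the two results.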
----------------------------------------------------------------------
Implementation Plan
----------------------------------------------------------------------
1. Write a simple script which creates a Python command-line parser
based on the svn_cl__options and svn_cl__cmd_table structures from
main.c in subversion/clients/cmdline. The Python standard optparse
module will do most of the work, but we will also need to write
custom code to parse Subversion revision numbers and ranges.
2. Upgrade our script to dispatch each command to the appropriate
command-line client function. This initial prototype will provide
the full functionality of the Subversion client, but will only test
the surface functionality of SWIG.
3. Upgrade the Subversion automated test suite to test our new script
using the command-line test suite. Fix any errors that are found.
4. Replace each command-line C function in the Python command-line
client with an appropriate Python implementation. Each function
should be implemented, tested, and committed as a separate patch. If
adding a new function reveals a bug in the underlying SWIG/Python
library, these bugs should be reported to the Subversion development
list.
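Steps 1 and 2 might look roughly like the sketch below. It is a minimal
illustration, not the final design: the command table, the stub handlers,
and the helper names (parse_revision, parse_revision_range, build_parser,
main) are all hypothetical, and only three options are wired up. The
revision parser handles plain numbers and the HEAD, BASE, COMMITTED and
PREV keywords; date syntax is left out.

```python
import optparse

REVISION_KEYWORDS = {"HEAD", "BASE", "COMMITTED", "PREV"}


def parse_revision(text):
    """Parse one revision: a non-negative integer or a keyword."""
    if text.isdigit():
        return ("number", int(text))
    if text.upper() in REVISION_KEYWORDS:
        return ("keyword", text.upper())
    raise ValueError("invalid revision: %r" % text)


def parse_revision_range(text):
    """Parse 'N' or 'N:M'; the end is None for a single revision."""
    start, sep, end = text.partition(":")
    return (parse_revision(start), parse_revision(end) if sep else None)


def _revision_callback(option, opt_str, value, parser):
    # optparse has no built-in revision type, so convert the value here.
    try:
        parser.values.revision = parse_revision_range(value)
    except ValueError as err:
        raise optparse.OptionValueError("%s: %s" % (opt_str, err))


def build_parser():
    parser = optparse.OptionParser(usage="%prog <subcommand> [options] [args]")
    parser.add_option("-r", "--revision", type="string", action="callback",
                      callback=_revision_callback, dest="revision",
                      default=None)
    parser.add_option("-q", "--quiet", action="store_true", default=False)
    parser.add_option("-v", "--verbose", action="store_true", default=False)
    return parser


# Stub handlers standing in for the real client functions (assumption).
def cmd_log(opts, args):
    return ("log", opts.revision, args)


def cmd_update(opts, args):
    return ("update", opts.quiet, args)


# Hypothetical dispatch table; aliases map to the same handler.
COMMANDS = {"log": cmd_log, "update": cmd_update, "up": cmd_update}


def main(argv):
    parser = build_parser()
    opts, args = parser.parse_args(argv)
    if not args:
        parser.error("subcommand required")
    name, rest = args[0], args[1:]
    if name not in COMMANDS:
        parser.error("unknown command: %r" % name)
    return COMMANDS[name](opts, rest)
```

For example, main(["log", "-r", "5:HEAD", "foo.c"]) dispatches to cmd_log
with the revision range already parsed. In step 4, each stub would be
replaced by a Python implementation of the corresponding command.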
----------------------------------------------------------------------
Project Schedule
----------------------------------------------------------------------
This 2 month plan assumes that I will start work on this project
on June 25. The project will be complete by September 1, 2005.
1. Planning and Approval (4 days)
- Send the technical details of my plan to Subversion developers
and solicit feedback
- Revise my plan as necessary to meet the needs of the Subversion
developers
- Apply to be a partial committer for the Python bindings
2. Initial Prototype (2 weeks)
- Write a simple script which creates a Python command-line parser
- Upgrade our script to dispatch each command to the appropriate C
function
- Upgrade the Subversion automated test suite to test our new script
3. Implementation, Documentation & Testing (7 weeks)
- Replace each command-line C function in the Python command-line
client with an appropriate Python implementation. Implement 5 or 6
API functions per week.
- Monitor Subversion developer list and fix issues as required
4. Project is complete
----------------------------------------------------------------------
Appendix A: Why stick with the existing Python/SWIG bindings?
----------------------------------------------------------------------
Subversion developers love SWIG because it saves them time. "Sharing
the core of the bindings implementation across languages", writes
Daniel Rall, "is powerful reuse." [1]
Nevertheless, the Python/SWIG bindings are not the only libraries which
offer access to the Subversion library. The PySVN and SvnCpp projects
are both written in C++, and they both offer documented and tested
bindings for the Subversion library. However, both sets of bindings
suffer from the same key problem: they only wrap a small subset of
Subversion's functionality, and they reimplement functionality which is
already available in the existing SWIG bindings for Subversion.
Maintaining the two sets of bindings in parallel would be too large a
task for the Subversion development team.
As a result of this situation, Python developers are forced to make a
difficult choice between ease of use and complete functionality.
According to Max Bowsher, "PySVN wraps much less of the API than the
SWIG-Python bindings do, but does it in a higher level (and documented)
way -- it's all about tradeoffs, really." [2] The situation with SvnCpp
is much the same as with PySVN, except that SvnCpp does not
directly support Python.
In 2004, Ben Reser announced that, for Python, the "SWIG stuff is
pretty much done. You could write the OO layer entirely in Python." [3]
To a developer who considered reimplementing the bindings in Pyrex, Ben
Reser advised: "I think your time would be better spent working on
writing the OO layer on top of SWIG." [3]
Greg Stein also attests to the quality of the SWIG/Python bindings:
"I've been using the Python Bindings for years. Literally." [4]
In building a command-line client in Python, we will get an extensive
test suite and a rudimentary object-oriented interface to the
SWIG/Python bindings for free. While this interface will initially only
support the basic functionality of the Subversion client, we can in
future extend this interface to support additional functionality.
[1]: http://svn.haxx.se/dev/archive-2004-04/1044.shtml
[2]: http://svn.haxx.se/dev/archive-2005-02/0748.shtml
[3]: http://svn.haxx.se/dev/archive-2004-04/1395.shtml
[4]: http://svn.haxx.se/dev/archive-2004-05/0407.shtml
----------------------------------------------------------------------
Appendix B: Functionality Listing
----------------------------------------------------------------------
All documented commands of the command-line client will be supported:
* add: Add files, directories, or symbolic links to your working
copy and schedule them for addition to the repository.
* blame: Show author and revision information in-line for the
specified files or URLs
* cat: Output the contents of the specified files or URLs
* checkout: Check out a working copy from a repository
* cleanup: Recursively clean up the working copy
* commit: Send changes from your working copy to the repository
* copy: Copy a file or directory in a working copy or in the
repository
* delete: Delete an item from a working copy or the repository
* diff: Display the differences between two paths
* export: Export a clean directory tree
* help: Describe the usage of this program or its subcommands
* import: Recursively commit a copy of PATH to URL
* info: Print information about PATHs
* list: List directory entries in the repository
* lock: Lock working copy paths or URLs in the repository, so
that no other user can commit changes to them.
* log: Display commit log messages
* merge: Apply the differences between two sources to a working
copy path
* mkdir: Create a new directory under version control
* move: Move a file or directory
* propdel: Remove a property from an item
* propedit: Edit the property of one or more items under version
control
* propget: Print the value of a property
* proplist: List all properties
* propset: Set PROPNAME to PROPVAL on files, directories, or
revisions
* resolved: Remove 'conflicted' state on working copy files or
directories
* revert: Undo all local edits
* status: Print the status of working copy files and directories
* switch: Update working copy to a different URL
* unlock: Unlock working copy paths or URLs
* update: Update your working copy
The following command-line options will be supported:
* auto-props: enable automatic properties
* config-dir: read user configuration files from directory ARG
* diff-cmd: use ARG as diff command
* diff3-cmd: use ARG as merge command
* dry-run: try operation but make no changes
* editor-cmd: use ARG as external editor
* encoding: treat value as being in specified charset
encoding
* extensions: pass ARG to --diff-cmd as options
* file: read data from specified file
* force-log: force validity of log message source
* force: force operation to run
* help: show help on a subcommand
* ignore-ancestry: ignore ancestry when calculating merges
* ignore-externals: ignore externals definitions
* incremental: give output suitable for concatenation
* limit: maximum number of log entries
* message: specify commit message ARG
* native-eol: use a different EOL marker than the standard
system marker for files with a native
svn:eol-style property. ARG may be one of 'LF', 'CR',
'CRLF'
* new: use ARG as the newer target
* no-auth-cache: do not cache authentication tokens
* no-auto-props: disable automatic properties
* no-diff-deleted: do not print differences for deleted files
* no-ignore: disregard default and svn:ignore property ignores
* no-unlock: don't unlock the targets
* non-interactive: do no interactive prompting
* non-recursive: operate on single directory only
* notice-ancestry: notice ancestry when calculating differences
* old: use ARG as the older target
* password: specify a password
* quiet: print as little as possible
* recursive: descend recursively
* relocate: relocate via URL-rewriting
* revision: a revision or a range of revisions
* revprop: operate on a revision property (use with -r)
* show-updates: display update information
* stop-on-copy: do not cross copies while traversing history
* strict: use strict semantics
* targets: pass contents of file ARG as additional args
* username: specify a username
* verbose: print extra information
* version: print client version info
* xml: output in XML
----------------------------------------------------------------------
Appendix C: Biography
----------------------------------------------------------------------
David James is an undergraduate Computer Science student at the
University of Toronto. In Fall 2004, David helped write Subversion
bindings for a Java-based academic groupware system. Since then, David
has been a regular contributor to the Subversion project, submitting
17 patches which were reviewed and accepted by Subversion developers.
David's contributions have made the Java and Ruby bindings easier to
compile, test, and install. Recently, David added Ruby support to the
automated nightly test-suite, so that Subversion developers can be
notified by email whenever a Ruby test fails. David is a partial
committer for the Ruby bindings.
At the University of Toronto, David has researched improved statistical
models for understanding natural language, earning three research
awards and a teaching assistantship in the process. As part of his
teaching assistantship, David taught Python to Computational
Linguistics graduate students.
For more information on David James, please see his resume:
http://www.cs.toronto.edu/~james/David_James_Resume_2005.html
--
David James -- http://www.cs.toronto.edu/~james
Received on Wed Jun 29 04:34:27 2005