On Wed, Apr 25, 2001 at 10:15:36AM -0500, Karl Fogel wrote:
> Ben Collins-Sussman <firstname.lastname@example.org> writes:
> > I think the answer is: for the *immediate* future, we're just going
> > to load file contents into a python string and use "==" to test.
> > (We're only testing on the "greek tree", where file contents are
> > single strings anyway.)
> > In the medium-term, we can switch to md5. This will let us test
> > binary files and overly large files.
> I understand the need to use md5 when we test large files. Let's not
> rush it right now though. Remember, there will be a convenience
> trade-off, because adding tests that test content would then require
> the test author to generate an md5 sum instead of just writing (or
> pasting) the expected content from somewhere.
Python includes the SHA-1 hashing algorithm (my fault :-) in all
distributions since 1.5.2. That ups the bits from 128 to 160, reducing the
(super minuscule) chance of a false positive.
Note that generating hashes isn't that difficult. We do have an interactive
prompt, remember :-)
>>> import sha
>>> h = sha.new(open('expected-file').read())
>>> print h.hexdigest()
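For anyone trying this on a current Python: the `sha` module used above was
later removed, and `hashlib` is the standard-library replacement. A minimal
sketch of the same idea, where `sha1_hex` and `digest_bits` are hypothetical
helper names for illustration, not anything from the test suite:

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of `data` as a 40-character hex string."""
    return hashlib.sha1(data).hexdigest()


def digest_bits(algorithm: str) -> int:
    """Return the digest width in bits for a hashlib algorithm name."""
    return hashlib.new(algorithm).digest_size * 8


if __name__ == "__main__":
    # The content would normally come from the expected file; a literal
    # byte string is used here so the sketch is self-contained.
    print(sha1_hex(b"abc"))
    print(digest_bits("sha1"))
```

A test author could paste the printed hex string into a test as the expected
value, which is about as much work as pasting the expected content itself.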
Not too hard... although... writing that made me think of something. An
md5/sha hash of a *text* file can differ from one platform to the next,
because the line endings differ. As long as we take that into account, we
should be okay. In the above example, the file is opened in the default mode,
which is text. Thus, the platform's native line endings will be translated to
'\n' upon reading, which makes the hash stuff work right. Of course, if you
feed it a binary file, it will do Bad Things with it, but that actually
doesn't matter for our purposes -- we're just hashing it, so we don't need to
avoid newline translation.
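One way to make the newline handling explicit, rather than relying on what
text mode does on a given platform, is to read the raw bytes and normalize
line endings before hashing. This is only a sketch under that assumption;
`text_sha1` and `file_text_sha1` are hypothetical names, and it uses the
modern `hashlib` module rather than `sha`:

```python
import hashlib


def text_sha1(raw: bytes) -> str:
    """Hash text content with CRLF and lone CR normalized to LF."""
    # Replace CRLF first so a CRLF pair does not become two newlines.
    normalized = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return hashlib.sha1(normalized).hexdigest()


def file_text_sha1(path: str) -> str:
    """Read a file as raw bytes and hash its newline-normalized content."""
    with open(path, "rb") as f:
        return text_sha1(f.read())


if __name__ == "__main__":
    unix_style = b"alpha\nbeta\n"
    dos_style = b"alpha\r\nbeta\r\n"
    # Same text, different line endings: the normalized hashes agree,
    # while hashing the raw bytes directly would not.
    print(text_sha1(unix_style) == text_sha1(dos_style))
    print(hashlib.sha1(unix_style).hexdigest()
          == hashlib.sha1(dos_style).hexdigest())
```

The catch is the same one as with text mode: applied to a binary file, two
files that differ only in CR/LF bytes would hash identically, so binary
content is better hashed from the raw bytes without normalization.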
Ah well. Just my two bits.
Greg Stein, http://www.lyra.org/
Received on Sat Oct 21 14:36:29 2006