On 13.06.2018 16:31, Julian Foad wrote:
> Branko Čibej wrote:
>> On 12.06.2018 13:00, Julian Foad wrote:
>>> Branko Čibej wrote:
>>>> https://ci.apache.org/builders/svn-x64-macosx-apr1.3-nothread/builds/3416/steps/Test%20ra_local%2Bfsfs/logs/faillog
>>>>
>>> I should have a chance to debug this on a Mac today or tomorrow.
>> The good news is that I can't reproduce this test failure locally. There
>> are two possible reasons I can think of: [...]
> I have been trying to reproduce and debug this with my colleague Konrad and we have a theory. The failure mode seems consistent with Python writing a change to disk and then running 'svn', and 'svn' not seeing that change on disk.
>
> The test uses this code in svntest.main.file_write():
>
> open(path, mode).write(contents)
>
> When opening and using a file object "in line" in this way, is seems that the timing of the implicit "close" is not specified, according to sources such as:
>
> https://stackoverflow.com/questions/5362901/in-the-inline-open-and-write-file-is-the-close-implicit#5362917
>
> """
> The close() here happens when the file object is deallocated from memory, as part of its deletion logic. Because modern Pythons on other virtual machines — like Java and .NET — cannot control when an object is deallocated from memory, it is no longer considered good Python to open() like this without a close(). The recommendation today is to use a with statement, which explicitly requests a close() when the block is exited ...
> """
The object lifetime and implicit close are very well defined indeed in
CPython, which we use. We don't use "modern Pythons on other virtual
machines." If this were indeed the source of the problem, a number of
other tests would be failing randomly on that bot, and elsewhere too,
I'm sure. But ...
> It seems that we need to change the code to something like:
>
> def file_write(path, contents, mode='w'):
> [...]
> with open(path, mode) as f:
> f.write(contents)
>
> to guarantee the file is closed before the test proceeds further. It may be that most of our testing in the past has used Python implementations where the write is never delayed far enough to affect what an invoked 'svn' subprocess sees, but on the Mac it is a little different. We are using a Mac with Python 2. (The 'file_write' method has separate code paths for Python 2 and Python 3 but both use the same open().write() construct.)
... on the other hand, making this suggested change won't hurt, as long
as our required Python version for tests supports the 'with' statement,
which I believe it does. In fact we already use the 'with' statement in
our test suite.
> The same construct appears in many other places in the test suite too.
>
> We are trying to test this, and it is taking a long time because we can only currently reproduce the failure after a fresh checkout/build/test cycle. Maybe because the bug is sensitive to the differences in timing of a warm cache vs. a cold cache.
It's quite likely a timing problem between cache flush and file open.
For a while I was looking at the shelve code to see if there was a
missing sync(), but that actually doesn't make sense in the context of
the test suite, since 'svn' is invoked as an external process which ends
before the tests continue — only a bug in the OS filesystem cache could
cause the observed behaviour, and that sort of bug would be ... let's
just say rather extremely visible with other applications, too.
So, long story short ... changing all the open().write() statements to
the 'with' form should at least remove one possible source of bugs.
-- Brane
Received on 2018-06-14 04:04:30 CEST