
Re: 1.1.4 Win32 test results

From: Kevin Puetz <puetzk_at_puetzk.org>
Date: 2005-04-03 20:09:37 CEST

Erik Huelsmann wrote:

>
> The following are my results building and testing the Subversion zip for
> 1.1.4 on Win32.
>
> With BDB:
>
> - utf8_tests.py FAILs
> This error is not new; it has been failing for a long time. It's
> currently disabled (SKIP) on trunk/ .
>
>
> With FSFS:
>
> - same utf8_tests.py FAILure
> - repos_tests.exe FAILs: it's not the tests themselves, but the cleanup
> afterwards which removes the repository directories. Nearly the only
> thing which happens in that cleanup is a call to svn_io_remove_dir().
> So, I inserted a sleep() call before that: Now the tests (and their
> cleanup) succeed.
> I just checked: 1.1.3 had the same failure.
>
>
> This is not a problem of the tests themselves, I would like to know if we
> should add a retry loop the same as with deleting files. That could affect
> FSFS repositories on Windows a lot, since directories are used as
> transactions...

FWIW, I have seen this problem in real use of 1.1.3, with the txn dirs. For
me at least, the cause was the Veritas DLOClientU backup agent. When the
files inside the directory get removed, it opens the directory handle.
Presumably the 'dir changed' event didn't tell it whether things were added
or removed, so it's doing a readdir to see what's in there. But this
means that it's got the directory open when svn tries to remove it.

I have some traces from the sysinternals filemon that show this collision
if anyone is interested. I didn't post anything here, since I considered it
a DLOClient bug and hassled IS for about a week with filemon traces of each
collision until they agreed to let affected people use scheduled-backup
mode instead of realtime-backup. Not all were SVN - I hit the same problem
with explorer.exe (put a bunch of stuff in the recycle bin, then hit empty -
explorer will abort with "file in use" errors trying to remove
directories). So apparently Veritas didn't do even the most
rudimentary compatibility testing on the realtime mode :-P

> So, opinions? Is this bad, or do we decide it's not a regression and
> therefore not a 1.1.4 problem?

<politics>
The directory case is particularly nasty in that even paths which are not
marked for backup are affected, so the interaction isn't obvious
(apparently it does the readdir, then filters the list of filenames). IS
certainly tried to pin it on our use of a buggy 'freeware' program. If I
hadn't had enough knowledge to get to the bottom of it and *prove* that it
was Veritas at fault with filemon traces involving multiple programs, they
might have succeeded.

If you can't tell, the relationship between Corporate IS and actual users has
gotten rather ugly and adversarial. Oh well...
</politics>

<technical>
I'd certainly welcome it 'just working', but the problem is the combination
of exclusive opens, and all the stupid follow-along software on win32, so
it probably needs to be worked around all the way at the bottom - every
file or directory unlink call is subject to this sort of failure mode. So
the place to work around it is probably apr. Note that the delays involved
might be *very* long - DLOClient will actually try to transmit the file
contents, over the network, to the backup server, before it closes the
file. I wonder if there's some way to determine what program has the file
open when we go to retry, so that the error message could implicate the
offender if we finally do time-out.
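
To make the retry idea concrete, here is a rough sketch in Python rather
than C. It is not apr's or Subversion's actual code. The function name
remove_dir_with_retry, its parameters, and the choice of which errors count
as "transient" are all assumptions made for illustration. On Windows a
sharing violation normally surfaces in Python as PermissionError, which is
what the loop mostly keys on.

```python
import errno
import shutil
import time

# Errors that plausibly mean "someone else has it open right now".
# This set is an assumption for the sketch, not taken from apr or svn.
TRANSIENT_ERRNOS = (errno.EACCES, errno.EBUSY, errno.ENOTEMPTY)


def remove_dir_with_retry(path, retries=5, initial_delay=0.01,
                          max_delay=1.0, remove=shutil.rmtree):
    """Remove a directory tree, retrying with backoff on transient errors.

    Returns the number of retries that were needed. Re-raises the last
    error once the retries are used up, or at once if the error does not
    look transient. `remove` is injectable so the loop can be exercised
    without a real backup agent holding the directory open.
    """
    delay = initial_delay
    for attempt in range(retries + 1):
        try:
            remove(path)
            return attempt
        except OSError as exc:
            transient = (isinstance(exc, PermissionError)
                         or exc.errno in TRANSIENT_ERRNOS)
            if not transient or attempt == retries:
                raise
            # Back off exponentially, capped, before trying again.
            time.sleep(delay)
            delay = min(delay * 2, max_delay)
```

With the defaults above the total wait is well under a second, which would
not be nearly enough for an agent that uploads the file before closing it,
so a real fix would need a much larger (and probably configurable) budget.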
</technical>

> bye,
>
>
> Erik.
>

Received on Sun Apr 3 20:13:38 2005

This is an archived mail posted to the Subversion Dev mailing list.