I think the cause of this problem is understood.
Windows maintains a "Change Log" facility for the filesystem to track
what files have been changed and what sort of change is made. This
facility seems to be mimicked even when the underlying filesystem is
not a log-based filesystem such as NTFS.
Various daemons utilize this feature to track changes to files. The
best known example is virus-scanning software which can efficiently
process files as they are written and not scan files that haven't
changed. Another example is Microsoft's "indexing" service that
updates its database to allow fast searching. The third commonly-used
application of this sort that I've found so far is XP's "System
Restore" feature. Remote file replication programs probably use this
facility too.
When these "tag-along" applications see that a file has been written
they will wait for it to close and then do whatever they do. This
almost certainly means opening a file descriptor or handle. Since the
Win32 API doesn't allow a file to be renamed or deleted while another
handle to it is open (unless that handle was opened with
FILE_SHARE_DELETE), the tag-along will prevent the file from being
renamed for a brief period after it is notified that SVN has closed
the file.
The window of vulnerability seems small but non-zero. Unfortunately the
standard practice of writing to a temporary file and then renaming
that file to the final name exacerbates the problem: an open-temp/write/
close/rename-final sequence is the worst case. When the close is
performed the tag-alongs will start doing their thing, perhaps before
the rename, and the rename will fail until the tag-alongs have closed
the file.
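
For illustration only, here is the worst-case sequence written out as a
minimal Python sketch. SVN itself is C on top of APR, so this is not
the SVN code; the function name write_via_temp and the ".tmp-" prefix
are made up for the example.

```python
import os
import tempfile


def write_via_temp(final_path, data):
    """Write data to a temp file next to final_path, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # open-temp: same directory, so the rename stays on one volume
    fd, temp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)  # write
        # close: happened on leaving the "with" block; this is the point
        # at which a tag-along would be told the file was written
        os.replace(temp_path, final_path)  # rename-final: may race a tag-along
    except BaseException:
        # Don't leave the temp file behind if anything failed
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

On Windows the os.replace call is the step that would fail with
PermissionError if something else still had the temp file open.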
What to do about it is a harder question. The specific failure cases
seen in SVN can probably be solved by looping on rename with a very
large bound (several seconds) after renaming the target to something
else (which would probably be sufficient to prove that the desired
rename of temp-to-target will eventually succeed). Unfortunately the
documentation on Win32 calls is poor and it is not clear whether there are
other ways to get the ERROR_ACCESS_DENIED ("Access is denied") error. Moreover it would not
be surprising if different versions of Windows have completely
different rules for generating this error.
(A nanosleep or equivalent between loop iterations might help by
yielding the CPU to the tag-alongs so that they will finish and get out
of the way.)
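
A rough sketch of the bounded loop with a sleep between attempts, in
Python for brevity. It is a simplification: it only retries the rename
and leaves out the step of first moving the old target aside. The name
rename_with_retry, the 5 second bound, the 10 ms pause and the "rename"
parameter (there so the loop can be exercised with a fake) are all
assumptions for the example. Python reports the Windows access-denied
failure as PermissionError.

```python
import os
import time


def rename_with_retry(src, dst, timeout=5.0, pause=0.01, rename=os.replace):
    """Retry rename(src, dst) on PermissionError until timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            rename(src, dst)
            return
        except PermissionError:
            # Give up once the bound is exceeded and surface the real error
            if time.monotonic() >= deadline:
                raise
            # Sleep so a tag-along gets CPU time to finish and close the file
            time.sleep(pause)
```

Any error other than PermissionError is passed straight through, so the
loop does not hide unrelated failures; it can still mask a genuine
permissions problem for up to the full timeout.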
A more extreme solution would be to use Native NT calls. NT supports
renaming an open file and the SVN sequence could be open-temp/write/
rename-final/close, using, for example, an apr_file_close_and_rename()
function to hide the gory details. This isn't appealing because it
does nothing for the Win95 family, uses OS calls that are essentially
undocumented, and doesn't address the general problem of reliably
renaming files. Also, it might not be possible to mix Win32 API calls
and Native NT API calls on the same file handle.
I haven't seen the efforts to duplicate the problem outside of SVN.
I suspect those efforts weren't triggering the tag-alongs: it is
important to open/write/close a file before the rename since it is the
close of a file with a "file-was-written" marker that triggers the
tag-alongs. Tag-alongs may delay between the close notification and
starting their activities to try to avoid this scenario, and that delay
might be subtly increasing sightings in SVN as compared to other programs.
I plan to try a stand-alone repro myself when I get a chance.
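
Something along these lines is what I have in mind for a stand-alone
repro, sketched in Python. It is untested against real tag-alongs, and
the function name, file names, 4 KB payload and iteration count are
arbitrary choices for the example. The point is only that each
iteration does a full open/write/close before the rename.

```python
import os
import shutil
import tempfile


def count_rename_failures(iterations=1000):
    """Repeat open/write/close/rename and count renames that hit PermissionError."""
    failures = 0
    d = tempfile.mkdtemp()
    try:
        target = os.path.join(d, "target.dat")
        for i in range(iterations):
            # Fresh temp name each time, so a temp file left behind by a
            # failed rename cannot interfere with the next iteration
            temp = os.path.join(d, "temp-%d.dat" % i)
            with open(temp, "wb") as f:
                f.write(b"x" * 4096)
            # The file is closed here; a tag-along may now be opening it
            try:
                os.replace(temp, target)
            except PermissionError:
                failures += 1
    finally:
        # Best-effort cleanup; leftover files may still be held open
        shutil.rmtree(d, ignore_errors=True)
    return failures
```

Calling count_rename_failures() with the tag-alongs running and again
with them stopped would show whether they are really the trigger. A
count of zero in both runs would not prove much, since the window is
small.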
I don't think telling people to turn off tag-along apps is a viable
answer. Corporate policy forbids disabling anti-virus software at
many companies - it is sometimes a firing offense. Also, we don't
have a list of all such software: for example there is still a
tag-along running on my system I haven't figured out yet (it's some
feature SVCHOST is implementing).
I agree with brane that the explanation isn't proven yet and also that
there may be other causes exposed once this is dealt with.
> Date: Fri, 03 Oct 2003 06:02:06 +0200
> From: Branko Čibej <brane@xbc.nu>
>
> D.J. Heap wrote:
>
> > Thanks to James' time spent tracing, I think we're closer to finding
> > out what is going on here. It appears that the COM+ Event
> > Notification service (and perhaps the System Event Notification
> > service) are following applications around and opening/querying files
> > they have touched. For confirmation, could you turn off those
> > services and see if the problem goes away in your test script, James?
> >
> > Even if these services are the culprits, however, I'm not sure it's
> > reasonable to expect either MS to change them (although I don't know
> > what they are screwing with private files for anyway), or for people
> > to shut them off since apparently they are useful application services
> > and people may be using them and also want to use Subversion. Can
> > someone comment on their usefulness and in what
> > environments/applications they are used? They ship installed and
> > turned on by default in all versions of Windows 2000 and higher (2003
> > included).
> >
> > So, if they are confirmed to be the cause, what should the resolution
> > be? Bounded loop patch, tell users to shut them off, other ideas...?
>
> I say we still have to find a way to reproduce the problem without SVN.
> If we can't, then there's still a good chance that we're doing something
> wrong (either in SVN or in APR). We do know that the loop patch doesn't
> really fix the problem, it only makes it less likely to happen -- and
> harder to track down.
>
> --
> Brane Čibej <brane_at_xbc.nu> http://www.xbc.nu/brane/
>
>
Received on Fri Oct 3 15:57:21 2003