Re: svn completely looses file modifications due to FAT 2s time resolution (data loss!!?)

From: Ben Reser <ben_at_reser.org>
Date: 2004-07-12 23:14:16 CEST

On Mon, Jul 12, 2004 at 03:24:47AM +0200, C.A.T.Magic wrote:
> I already pointed out that the 'sleep_for_timestamps'
> is a really weak solution and will break for FAT disks,
> because FAT only records file modifications with
> a resolution of 2 seconds (and some very strange
> time-rounding rules... afair).
>
> Please look at the small script below,
> and its output. only the FIRST revision gets committed,
> even though there should be 5 commited revisions.
> this means that 4 file modifications are LOST.
>
> it works if i put a 'sleep' after each svn commit command.

[snip]

> The script was tested on a 256MB FAT(16) partition,
> WindowsXP svn version 1.0.4 (r9844) using file:///
> - svn 1.1/latest also has the same bug.
>
>
> please fix this, at least by increasing
> that sleep to some "2.5second" wait...
>
>
> a smart solution could be:
> write a ".svn/dummy" file to disk,
> read back its modification time and wait
> for that time to pass by.

After staying up last night fairly late messing with this I have the
following thoughts about it. But first some background for people that
don't know the details of what's going on here.

Whenever the working copy needs to know if a file has local
modifications to it we go through the following process:
a) Has the file size changed?
b) Has the timestamp changed?

If a) is true then we know the file has changed. But if the file size
is the same and the timestamp has changed we have to compare the files,
which is expensive.

But if both the file size and timestamp on the file have not changed we
presume that the file has not changed.
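To make the check above concrete, here is a rough sketch in Python
(Subversion itself is C, and this is not its code). Entry,
same_contents and is_modified are hypothetical names, and the pristine
copy is simply passed in as a path.

```python
import os
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for what the entries file records."""
    size: int      # size in bytes at the last update or commit
    mtime: float   # modification time at the last update or commit


def same_contents(a, b, chunk=65536):
    """Byte-for-byte comparison; this is the expensive step."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca, cb = fa.read(chunk), fb.read(chunk)
            if ca != cb:
                return False
            if not ca:
                return True


def is_modified(path, entry, pristine_path):
    """Return True if the working file appears to have local changes."""
    st = os.stat(path)
    if st.st_size != entry.size:
        return True    # a) size changed: definitely modified
    if st.st_mtime == entry.mtime:
        return False   # size and timestamp both match: presumed unchanged
    # b) same size, different timestamp: only a full comparison can tell
    return not same_contents(path, pristine_path)
```

Note the middle branch: a same-size edit that leaves the timestamp
equal to the recorded one is reported as unmodified, which is exactly
the case under discussion.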

Now every filesystem has limits on the resolution to which it can keep
track of timestamps. These range from the 2 second resolution of FAT to
the microsecond resolution of newer filesystems. So it is possible for
a user to do an update or commit that updates the timestamp in the
entries file and then change the file again within the resolution of
the filesystem's timestamps.
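As a small illustration of that window, the following Python sketch
models a filesystem that stores timestamps at a fixed resolution. It
assumes simple rounding down, which is a simplification; FAT's real
rounding rules may differ. stored_mtime is a hypothetical helper.

```python
import math


def stored_mtime(t, resolution=2.0):
    """Timestamp as a filesystem with the given resolution would keep it.

    Assumes rounding down to a multiple of `resolution` (in seconds).
    """
    return math.floor(t / resolution) * resolution


# A commit records the file at t=100.3 and the user edits it at t=101.6.
recorded = stored_mtime(100.3)
after_edit = stored_mtime(101.6)
print(recorded == after_edit)   # both fall in the same 2 second slot
```

With a 2 second resolution the two timestamps are indistinguishable, so
a same-size edit in that window looks like no change at all.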

To prevent this we currently sleep for 1.1 seconds at the end of such
operations. We say that a user shouldn't be modifying their working
copy while these operations are running and should wait. Plus we set a
lock that prevents another svn operation from happening at the same
time.

Basically what we need is a way for the wc to know if a file has
changed locally, without comparing every file as this would be
expensive. There are a couple of ways of doing this:

a) What we're doing now (though we're not doing it very well). Sleep
for the time of the lowest resolution filesystem we support plus another
tenth of a second. Currently that's FAT so 2.1 seconds. This isn't
particularly desirable because it penalizes people with modern
filesystems for the restrictions of really old filesystems. We really
don't want to diminish our performance this way. In this case we're
relying on the filesystem to accurately represent timestamps within 2
seconds.

b) Drop the sleep. Make the code that checks if a file is changed not
accept that a file is unchanged without comparing the files when the
timestamp of the file is within a certain range of the current time.
I.e. if the file has been changed within 5 seconds of the current clock,
force a comparison of the file. This requires the filesystem to
accurately represent timestamps within 5 seconds. This avoids the
performance issue, since we're no longer sleeping. However, we're now
relying on the system clock to maintain accurate time. If the clock is
moved forward, then we may miss a change. This is possible in a world
where machines run ntp clients to keep their clocks set accurately,
though a correctly configured machine isn't going to jump the clock
enough that it should matter if we do the above. So we're probably not
inclined to rely on this.

c) Keep track of the oldest timestamp we write into the entries file.
Touch a temporary file until the oldest entry timestamp < the temporary
file's timestamp. This will work whether the filesystem rounds time
forward or backward to make the time fit its resolution; it doesn't
matter which way it does it. We can avoid bothering to make a new file
by simply using one of the files in the admin area of the working copy
whose timestamp doesn't matter. format, README.txt, and empty-file are
all candidates. The problem with this plan is I don't think this will be
particularly easy to achieve with our code base currently.
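For illustration only, options b) and c) might look roughly like this
in Python. Neither function is Subversion code; the names, the marker
file, the polling interval and the timeout are all assumptions made for
the sketch.

```python
import os
import time

RECENT_WINDOW = 5.0   # seconds; the figure used in option b)


def timestamp_is_trustworthy(path, now=None, window=RECENT_WINDOW):
    """Option b): trust a matching timestamp only if the file is older
    than `window` seconds according to the system clock."""
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime > window


def wait_until_newer(marker_path, entry_mtime, poll=0.05, timeout=5.0):
    """Option c): touch a marker file until the filesystem reports a
    timestamp later than the one written to the entries file.

    Returns True once the marker is newer, False if `timeout` expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        os.utime(marker_path)   # set the marker's mtime to the current time
        if os.stat(marker_path).st_mtime > entry_mtime:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

The second function never needs to know the filesystem's resolution: it
only compares timestamps as the filesystem itself reports them, so the
wait is as long as that filesystem requires and no longer.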

Right now the sleeping occurs in the client library, so that it can
happen at the end of a command and therefore we don't end up sleeping
more than once. The problem is the client lib doesn't have access to
the information it needs in order to do the above.

So what should we do?

I think we should punt this issue for now until 2.0 when we redesign the
working copy. Here's why:

a) The file change has to occur within the resolution of the filesystem
but also has to result in a file of exactly the same size. This is,
IMHO, a rare occurrence in practice.

b) The problem only exists on filesystems with a timestamp resolution
coarser than 1 second. So far we know of exactly one filesystem this is
reproducible on, FAT. An aging, deprecated file system that everyone
(bar flash memory vendors) is moving away from.

c) This problem wasn't stumbled upon. The reporter went looking at the
code trying to find a way to speed things up and remove the sleep, in
the process discovered this edge case, and proceeded to produce
an example. However, we've yet to see someone running into this example
through ordinary use.

d) There are adequate workarounds for people that really want to script
subversion to do things like the above on filesystems with a resolution
coarser than 1 second. Sleep for 1 more second after a call or modify
the source code of their copy to increase the timestamp sleep time.
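The first workaround in d) could be scripted along these lines. This is
a sketch only: run_then_wait is a hypothetical wrapper, and the run and
sleep parameters exist purely so it can be exercised without svn
installed.

```python
import subprocess
import time


def run_then_wait(argv, extra_sleep=1.0, run=subprocess.run, sleep=time.sleep):
    """Run one command, then wait `extra_sleep` more seconds before
    returning, so the next file modification lands in a later
    timestamp slot on a coarse filesystem."""
    result = run(argv, check=True)   # raises CalledProcessError on failure
    sleep(extra_sleep)
    return result
```

A script would then call, for example,
run_then_wait(["svn", "commit", "-m", "message"]) instead of invoking
svn directly between edits.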

If we do anything we maybe should make a file in our distribution,
KNOWN_ISSUES, that explains this limitation and what you can do about it
in the meantime. But I don't see a way to fix it within our
compatibility guarantees until 2.0. Maybe someone else will see a way,
but I don't.

-- 
Ben Reser <ben@reser.org>
http://ben.reser.org
"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken
Received on Mon Jul 12 23:14:35 2004
