Re: bug in svn diff and related?

From: mike south <msouth_at_mcclatchyinteractive.com>
Date: 2005-03-16 07:46:12 CET

Ben Collins-Sussman wrote:
>
> On Mar 15, 2005, at 10:15 PM, Travis wrote:
> >
> > True: I am considering the cost of the extra integer comparison and
> > branch to be negligible. If you are decreasing the accuracy of the
> > heuristic because of that cost, it seems like an unworthwhile
> > micro-optimization to me.
>
> The speedup is noticeable, and affects everyone all the time. Try
> switching the algorithm and doing some timing tests to see for
> yourself.
>
> The situation you're worried about -- a process that tweaks a
> working-file's timestamp into the past -- is incredibly rare. We
> simply never hear about it.
>
> So in theory, yes, we're not as accurate as we could be. In practice,
> this just doesn't matter. The performance tradeoff is well worth it.
> This is based on 5 years of Subversion usage and (?) 15 years of
> widespread CVS usage. :-)

In the five years of Subversion usage and the 15 years of CVS usage, how
many of the commits were being done by machine, by dopey users like me
who figure that the Subversion Perl API "just works"? We're in a whole
different era now that there is a Perl API, because now machine checkins
and modifications are feasible for the much-less-intelligent. That has
quite possibly never been true in the past.

You don't have to fiddle with mtimes to cause buggy behavior--all you
have to do is have a machine making the modifications, and a high
percentage of the time it's going to get the commit and the subsequent
modification done inside of the same second. The test code I posted
gave me two errors out of four tries.
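
To make the same-second problem concrete, here is a minimal Python
sketch. It is a simplified model, not Subversion's actual code: it
assumes the check records only the whole-second mtime at commit time and
treats an unchanged mtime as "unmodified". The names recorded_mtime,
looks_unmodified and demo_same_second_miss are made up for illustration,
and os.utime is used only to simulate two writes landing in the same
second so that the result does not depend on timing.

```python
import os
import tempfile

FIXED_SECOND = 1_000_000_000  # arbitrary whole-second timestamp for the demo


def recorded_mtime(path):
    """Whole-second mtime, as a hypothetical status cache might record it."""
    return int(os.stat(path).st_mtime)


def looks_unmodified(path, recorded):
    """Assumed heuristic: an unchanged mtime means unchanged contents."""
    return recorded_mtime(path) == recorded


def demo_same_second_miss():
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "data.txt")

        # "Commit": an empty file is written and its timestamp is recorded.
        open(path, "w").close()
        os.utime(path, (FIXED_SECOND, FIXED_SECOND))
        recorded = recorded_mtime(path)

        # A machine writes real data within the same second (simulated by
        # pinning the mtime back to the same whole second).
        with open(path, "w") as f:
            f.write("the user's first real data\n")
        os.utime(path, (FIXED_SECOND, FIXED_SECOND))

        # The contents changed, yet the heuristic reports "unmodified".
        return looks_unmodified(path, recorded)
```

Under these assumptions the second write is invisible to the check,
which is the situation that leaves the user's first data uncommitted.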

In our case, a user puts data into the system and assumes that this
first version is committed to the repository and always recoverable.
Except that it never gets committed. Say they then make a change, save
it, and it breaks something. They have the option of reverting to a
previous revision, but the first revision in the list is the empty file
that was initially committed, and the next one is their broken one.
Data loss.

Now, if there were a force flag* we could pass to commit, we could avoid
data loss by passing that when we know we want to bypass the mtime
dance. That would allow people to have some control over that algorithm
from outside of the source code.

I'm about to go implement a fix for our situation which--you guessed
it--modifies the mtime of the file by one second to ensure that our
users don't lose data. So those rare programs that fiddle with
timestamps are now a bit less rare.
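
The one-second nudge described above could look roughly like this in
Python. This is a sketch of the general idea, not the fix that was
actually deployed; bump_mtime is a hypothetical helper name, and it
relies only on the standard os.stat and os.utime calls.

```python
import os


def bump_mtime(path, seconds=1):
    """Move a file's mtime forward by `seconds`, leaving its atime alone."""
    st = os.stat(path)
    new_mtime_ns = st.st_mtime_ns + seconds * 1_000_000_000
    # The ns= form avoids float rounding of the existing timestamps.
    os.utime(path, ns=(st.st_atime_ns, new_mtime_ns))
    return new_mtime_ns
```

Calling bump_mtime(path) after each machine-made modification and before
the commit would make the file's timestamp differ from any value recorded
in the same second.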

I understand that a large majority of svn usage is going to be by humans
editing source files by hand and very rarely getting anything done in
sub-second intervals. The problem is, it's a great tool, and one of the
hallmarks of a great tool is all the things people start to do with it
that the original designers never intended.

mike

*I searched the dev list and I see that there is a lot of opposition to
a --force flag on commit. The arguments there were all about the fact
that it would sacrifice accuracy. However, the above argument says that
the algorithm should stay the way it is in the interest of speed, at the
expense of accuracy. One could argue that if you want the speed that
dependence on mtime gives you, then you should give us --force so that
we can manually take care of the accuracy that has been sacrificed in
the name of speed.

One could also argue that I just need to run a (non-svn) diff against
the .svn base myself because single file operations are the exception.
But my point is that it would be better for the tool to let me do
exceptional things with it.

A happy-medium solution might be a --slow-check flag that skips the
mtime check and goes straight to a char-by-char comparison or whatever.
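
For anyone doing that comparison themselves, a plain content check might
be sketched in Python as follows. The function name contents_differ is
hypothetical, and both paths are supplied by the caller: the sketch
makes no assumption about where the pristine copy lives inside .svn.
Note also that a raw comparison like this would flag differences caused
by keyword expansion or line-ending translation, so it is only a rough
stand-in for what a real --slow-check would have to do.

```python
def contents_differ(path_a, path_b, chunk_size=64 * 1024):
    """Return True if the two files differ in content, ignoring timestamps."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                return True   # first differing chunk (or differing length)
            if not chunk_a:
                return False  # both files ended at the same point
```

Reading the files directly, rather than relying on a helper that caches
results by stat information, keeps the check independent of mtime
entirely.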

You could also make it compile-time configurable whether commit
supports --force, so people who aren't using it in exceptional ways
don't have to worry about their silly users using --force when what they
really need to do is fix their clocks. In our case we use Subversion in
both the traditional, human-edited files mode and machine mode, but we
do it in different places. We could (fairly) easily use different binaries.

And I could have hacked the source myself and set it up how I wanted it
in half the time it took me to write this, I know :).

Anyway, now that I've ranted at length, let me again thank all you folks
for the wonderful work on Subversion. We really love it. The control
over directories and symlinks alone saves us a tremendous amount of
hassle. Keep up the good work!

-- 
-----------------------------------------
       Name : Mike South
      Title : Software Developer
    Company : McClatchy Interactive
Work Phone : (919) 861-1259
   Work Fax : (919) 861-1300
Work Email : msouth@mcclatchyinteractive.com
        AIM : msouthmi
   Web Site : http://www.mcclatchyinteractive.com/
-----------------------------------------
Received on Wed Mar 16 07:49:16 2005

This is an archived mail posted to the Subversion Users mailing list.
