Re: Question regarding Microsoft patent on file synchronization

From: Karl Fogel <kfogel_at_red-bean.com>
Date: 2007-04-21 21:10:23 CEST

Sean McCarthy <smccarthy@integraas.com> writes:
> I'm not sure if it is the right list to post, but reviewing some
> patent documents our company found this patent document from Microsoft
> relating to file synchronization:
>
> http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=75&f=G&l=50&d=PTXT&s1=microsoft.ASNM.&p=2&OS=AN/microsoft&RS=AN/microsoft
>
> While regarding the binary file synchronization (possibly for file
> systems) this patent shares a lot of points in common with the way
> subversion makes files synchronized.
>
> The patent was filed on November 2003 and granted as a patent on April
> 10, 2007. It is clear that the claims were made after Subversion
> conception and that it shares the same concepts with other open source
> projects as 'rsync' that dates back to the filing (even 1999).
>
> We are just wondering if this affects in any way to Subversion and a
> possible patent infringement retaliation from Microsoft.

Thank you for posting about this, Sean. I've CC'd dev@.

I am fairly sure this patent would not withstand a challenge from an
even mildly competent garden slug. It describes techniques that have
been known and practiced in the field of version control for decades.
I am not too worried about Subversion being subject to an infringement
suit based on this patent.

For entertainment value, here's a translation of Claim #1 (of 20) into
plain English, or at least into English that's usual for this field:

   A method of maintaining an updated file, comprising:

   - storing copies A and B of a base file at client1 and at client2

   - receiving change C1 to client1:A
           and change C2 to client2:A

   - determining DIFF1 == client1:A<->client1:B
             and DIFF2 == client2:A<->client2:B

   - transmitting DIFF1 and DIFF2 to a server

   - receiving either DIFF1 or DIFF2 at the server first in time

   - iff the base file on the server is the same as the base file
     stored at the client associated with the diff received first,
     server accepts that diff; otherwise server rejects the diff

   - server rejects the diff received second in time [here, they did
     not spell out the fact that the server should reject the diff
     received second whether or not it rejected the one it received
     first, because either way the server's file is now different:
     either it applied the change from the diff it received first, or
     it did not apply that change because the client's base copy was
     out of date anyway]

   - transmitting a third diff from server to the client that sent the
     diff that was received second [this "third diff" is simply the
     diff needed to bring the other client up to date with the change
     that the server accepted]

   - applying the third diff to the second copy of the base file
     stored at the client [also known as "merging upstream changes
     into a locally modified file"]

As you can see, Claim 1 simply describes the standard commit/update
algorithm in Subversion. Note that Subversion didn't invent this; we
took it -- with only trivial optimization changes -- from CVS, which
has been using it since 1986 IIRC, and CVS didn't invent it either.
I'm not sure the patent is claiming that part is original, though.

The remaining claims go on to describe how to handle the out-of-date
case, by keeping multiple base copies (text-bases, we might call them)
at the clients, using them to reconstruct the latest server file,
reconstructing the diff that expresses the local change using the new
data, and retransmitting that to the server. Very roughly:

1. client-working-copy-1 starts out same as client-text-base-1

2. client-working-copy-1 gets a local change, now differs from
client-text-base-1

3. client sends diff to server, but server notices that
client-working-copy-1 is out-of-date.

4. server transmits a diff to bring client up-to-date

5. client creats client-text-base-2, then applies that diff to it
to create updated-client-text-base-2

   6. client can now retransmit its local change, by creating
      client-working-copy-2 as a copy of updated-client-text-base-2,
      taking the diff from client-text-base-1 to client-working-copy-1
      and applying it t client-working-copy-2, and then transmitting
      (to the server) the diff from update-client-text-base-2 to
      client-working-copy-2.

Yawn. The method described is not even as clever as what rsync does.

The distinctiveness claimed for "binary diffs" is spurious. All diffs
are binary diffs; textual diffs are just binary diffs that treat
certain character sequences (LF, CRLF, etc) specially, using them as
anchor points for finding range boundaries. But you can find range
boundaries without any anchors at all (rsync does it, so do we), and
yes, you can do fuzzy application of diffs without anchors too.

In my professional opinion, this patent should not have been granted.

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Apr 21 21:10:32 2007

This message: [ Message body ]
Next message: Hyrum K. Wright: "Re: [PATCH] JavaHL list API update"
Previous message: Karl Fogel: "Re: svn commit: r24694 - trunk/subversion/libsvn_fs_fs"
Next in thread: Mark Reibert: "Re: Question regarding Microsoft patent on file synchronization"
Reply: Mark Reibert: "Re: Question regarding Microsoft patent on file synchronization"
Reply: Tom Malia: "RE: Question regarding Microsoft patent on file synchronization"
Maybe reply: Karl Fogel: "Re: Question regarding Microsoft patent on file synchronization"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]