RE: Question regarding Microsoft patent on file synchronization

From: Tom Malia <tommalia_at_ttdsinc.com>
Date: 2007-04-22 19:58:34 CEST

This thread leads me to another question.

I was in the process of setting up a Microsoft Distributed File System with
Replication at the same time that I've been working on setting up Subversion
for Source control.

I was thinking of the two as completely different things, but now that I've
been learning about setting up Subversion with Apache for WebDAV, I'm starting
to question that logic.

I'm beginning to think that Subversion might be able to handle both
situations for me.

My primary reasons for wanting to set up MS DFS with replication were:

1) Location redundancy for critical files
2) centralization of files for backup purposes
3) better "performance" when accessing files over the WAN.

So, now I'm thinking of setting up the following in Subversion:

1) Set up a repository to use as the shared "drive"
2) Configure Apache to serve that repository as a WebDAV drive
3) Check out the repository from step 1 to a working copy (WC) on a server
at our remote office, with a script on that server to perform an update
every few minutes
4) Grant users read-only access to the WC created in step 3
5) Set up a job that dumps the repository from step 1 to our backup server
just before the scheduled tape backups of that server
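
Steps 3 and 5 could be scripted roughly as follows. This is only a minimal
sketch: the paths are hypothetical placeholders, the scheduling itself (every
few minutes, and just before the tape backup) is assumed to be handled by cron
or the Windows Task Scheduler, and it assumes the `svn` and `svnadmin`
command-line tools are on the PATH of the account that runs it.

```python
import subprocess

# Hypothetical locations; adjust for the real servers.
REMOTE_WC = "/srv/shared-wc"           # working copy at the remote office (step 3)
REPO_PATH = "/var/svn/shared"          # repository from step 1
DUMP_FILE = "/backup/shared.svndump"   # dump target on the backup server (step 5)


def update_command(wc_path):
    """Build the command that brings a working copy up to date."""
    return ["svn", "update", "--non-interactive", wc_path]


def dump_command(repo_path):
    """Build the command that writes a full repository dump to stdout."""
    return ["svnadmin", "dump", "--quiet", repo_path]


def refresh_working_copy(wc_path=REMOTE_WC):
    """Step 3: run 'svn update' on the read-only working copy."""
    subprocess.run(update_command(wc_path), check=True)


def dump_repository(repo_path=REPO_PATH, dump_file=DUMP_FILE):
    """Step 5: stream 'svnadmin dump' output into a file for the tape backup."""
    with open(dump_file, "wb") as out:
        subprocess.run(dump_command(repo_path), stdout=out, check=True)
```

The scheduler would call refresh_working_copy() every few minutes on the
remote server and dump_repository() once, shortly before the backup window.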

My thinking here is that I should be able to meet most of my objectives with
such a configuration.

Does anyone have any comments about whether such a configuration sounds like
a good idea or not, and how it might compare to implementing Microsoft DFS
with Replication?

Regards,
Tom Malia

-----Original Message-----
From: Karl Fogel [mailto:kfogel@red-bean.com]
Sent: Saturday, April 21, 2007 3:10 PM
To: Sean McCarthy
Cc: users@subversion.tigris.org; dev@subversion.tigris.org
Subject: Re: Question regarding Microsoft patent on file synchronization

Sean McCarthy <smccarthy@integraas.com> writes:
> I'm not sure if it is the right list to post, but reviewing some
> patent documents our company found this patent document from Microsoft
> relating to file synchronization:
>
>
> http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=75&f=G&l=50&d=PTXT&s1=microsoft.ASNM.&p=2&OS=AN/microsoft&RS=AN/microsoft
>
> Although it concerns binary file synchronization (possibly for file
> systems), this patent has a lot in common with the way Subversion
> keeps files synchronized.
>
> The patent was filed in November 2003 and granted on April
> 10, 2007. It is clear that the claims were made after Subversion's
> conception and that the patent shares concepts with other open source
> projects, such as 'rsync', that date back to before the filing (even to 1999).
>
> We are just wondering whether this affects Subversion in any way,
> including possible patent infringement action from Microsoft.

Thank you for posting about this, Sean. I've CC'd dev@.

I am fairly sure this patent would not withstand a challenge from an
even mildly competent garden slug. It describes techniques that have
been known and practiced in the field of version control for decades.
I am not too worried about Subversion being subject to an infringement
suit based on this patent.

For entertainment value, here's a translation of Claim #1 (of 20) into
plain English, or at least into English that's usual for this field:

   A method of maintaining an updated file, comprising:

   - storing copies A and B of a base file at client1 and at client2

   - receiving change C1 to client1:A
           and change C2 to client2:A

   - determining DIFF1 == client1:A<->client1:B
             and DIFF2 == client2:A<->client2:B

   - transmitting DIFF1 and DIFF2 to a server

   - receiving either DIFF1 or DIFF2 at the server first in time

   - iff the base file on the server is the same as the base file
     stored at the client associated with the diff received first,
     server accepts that diff; otherwise server rejects the diff

   - server rejects the diff received second in time [here, they did
     not spell out the fact that the server should reject the diff
     received second whether or not it rejected the one it received
     first, because either way the server's file is now different:
     either it applied the change from the diff it received first, or
     it did not apply that change because the client's base copy was
     out of date anyway]

   - transmitting a third diff from server to the client that sent the
     diff that was received second [this "third diff" is simply the
     diff needed to bring the other client up to date with the change
     that the server accepted]

   - applying the third diff to the second copy of the base file
     stored at the client [also known as "merging upstream changes
     into a locally modified file"]

As you can see, Claim 1 simply describes the standard commit/update
algorithm in Subversion. Note that Subversion didn't invent this; we
took it -- with only trivial optimization changes -- from CVS, which
has been using it since 1986 IIRC, and CVS didn't invent it either.
I'm not sure the patent is claiming that part is original, though.
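
The accept/reject rule in Claim 1 can be sketched in a few lines. This is
only an illustration, not Subversion's or the patent's actual implementation:
the `Server` class is hypothetical, a revision number stands in for "the base
file on the server is the same as the client's base file", and whole contents
are sent where a real system would send diffs.

```python
class Server:
    """Toy server holding one file and a revision counter (hypothetical)."""

    def __init__(self, content):
        self.revision = 0
        self.content = content

    def commit(self, base_revision, new_content):
        """Accept the change only if the client's base is still current."""
        if base_revision != self.revision:
            return False  # client is out of date: reject
        self.content = new_content
        self.revision += 1
        return True

    def update(self):
        """Return what an out-of-date client needs in order to catch up."""
        return self.revision, self.content


server = Server("base file")
rev1, _ = server.update()   # client1 takes its base copy
rev2, _ = server.update()   # client2 takes the same base copy

first = server.commit(rev1, "base file + C1")    # received first: accepted
second = server.commit(rev2, "base file + C2")   # received second: rejected
rev2, latest = server.update()                   # the "third diff" to client2
```

Whichever commit arrives second is rejected regardless of the fate of the
first, because the rejected client must update before its base matches again.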

The remaining claims go on to describe how to handle the out-of-date
case, by keeping multiple base copies (text-bases, we might call them)
at the clients, using them to reconstruct the latest server file,
reconstructing the diff that expresses the local change using the new
data, and retransmitting that to the server. Very roughly:

   1. client-working-copy-1 starts out same as client-text-base-1

   2. client-working-copy-1 gets a local change, now differs from
      client-text-base-1

   3. client sends diff to server, but server notices that
      client-working-copy-1 is out-of-date.

   4. server transmits a diff to bring client up-to-date

   5. client creates client-text-base-2, then applies that diff to it
      to create updated-client-text-base-2

   6. client can now retransmit its local change, by creating
      client-working-copy-2 as a copy of updated-client-text-base-2,
      taking the diff from client-text-base-1 to client-working-copy-1
      and applying it to client-working-copy-2, and then transmitting
      (to the server) the diff from updated-client-text-base-2 to
      client-working-copy-2.
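
Steps 5 and 6 above can be sketched as follows. This is an illustrative
line-based version built on Python's difflib, not what Subversion or the
patent actually does; the function names are hypothetical, and any local
change that touches or adjoins an upstream change is simply reported as a
conflict.

```python
from difflib import SequenceMatcher


def changes(base, other):
    """Return the diff from base to other as (start, end, new_lines) tuples."""
    ops = SequenceMatcher(None, base, other, autojunk=False).get_opcodes()
    return [(i1, i2, other[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def rebase_local_change(text_base_1, working_copy_1, updated_text_base_2):
    """Re-apply the local change on top of the updated text base."""
    local = changes(text_base_1, working_copy_1)
    upstream = changes(text_base_1, updated_text_base_2)

    # Refuse to merge when a local and an upstream change touch or adjoin.
    for l1, l2, _ in local:
        for u1, u2, _ in upstream:
            if l1 <= u2 and u1 <= l2:
                raise ValueError("conflict between local and upstream changes")

    # working-copy-2 starts as a copy of updated-text-base-2.
    working_copy_2 = list(updated_text_base_2)

    # Apply local changes last-to-first, shifting each by the net line-count
    # effect of the upstream changes that come before it.
    for start, end, new_lines in sorted(local, key=lambda c: c[0], reverse=True):
        offset = sum(len(n) - (u2 - u1) for u1, u2, n in upstream if u2 < start)
        working_copy_2[start + offset:end + offset] = new_lines
    return working_copy_2


text_base_1 = ["a", "b", "c", "d", "e"]
working_copy_1 = ["a", "B", "c", "d", "e"]                 # local change
updated_text_base_2 = ["new", "a", "b", "c", "d", "E"]     # after server's diff

working_copy_2 = rebase_local_change(text_base_1, working_copy_1, updated_text_base_2)
retransmit = changes(updated_text_base_2, working_copy_2)  # diff sent to server
```

Here `retransmit` is the diff from updated-client-text-base-2 to
client-working-copy-2, which is what the client sends on its second attempt.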

Yawn. The method described is not even as clever as what rsync does.

The distinctiveness claimed for "binary diffs" is spurious. All diffs
are binary diffs; textual diffs are just binary diffs that treat
certain character sequences (LF, CRLF, etc) specially, using them as
anchor points for finding range boundaries. But you can find range
boundaries without any anchors at all (rsync does it, so do we), and
yes, you can do fuzzy application of diffs without anchors too.
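
To illustrate diffing without line anchors, here is a small byte-level delta
sketch. It uses Python's difflib to find matching byte ranges, which is an
assumption made for brevity: it is neither rsync's rolling-checksum algorithm
nor Subversion's delta code, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes):
    """Describe new as copies of byte ranges from old plus literal inserts."""
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2 - i1))    # reuse bytes from old
        elif tag in ("replace", "insert"):
            delta.append(("insert", new[j1:j2]))   # literal new bytes
        # "delete" needs no instruction: the old bytes are just not copied
    return delta


def apply_delta(old: bytes, delta):
    """Rebuild the new data from old plus the delta instructions."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, start, length = op
            out += old[start:start + length]
        else:
            out += op[1]
    return bytes(out)


old = b"\x00\x01ABCDEFGHIJ\xff"
new = b"\x00\x01ABCXYZGHIJ\xff"
delta = make_delta(old, new)
rebuilt = apply_delta(old, delta)
```

No newline or other delimiter is consulted anywhere; the range boundaries
come purely from where the byte sequences match.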

In my professional opinion, this patent should not have been granted.

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Received on Mon Apr 23 21:52:59 2007

In reply to: Karl Fogel: "Re: Question regarding Microsoft patent on file synchronization"
