Re: [PROPOSAL] Addition of rsync algorithm for saving bandwidth in 'svn takeover'

From: Jonathan Gilbert <o2w9gs702_at_sneakemail.com>
Date: 2005-07-13 00:03:19 CEST

At 04:57 PM 12/07/2005 -0400, John Peacock jpeacock-at-rowman.com
|subversion dev list| wrote:
>Jonathan Gilbert wrote:
>> This kind of development is possible with regular SVN as well; this is
>> essentially the main reason it *has* the '.svn' subdirectory in the first
>> place :-)
>The difference is that with SVK, I can commit multiple changes to my
>mirrored copy and only later connect to the `Net and transmit the
>changes (individually or as a single commit) to the remote repository.
>Subversion offers no such capability.

This is true :-) It is noticeably more overhead than maintaining just a
'.svn' subdirectory, though... I guess you get what you pay for :-)

>> The main thrust of the 'svn takeover' command at this point is to eliminate
>> the redundant data transfer for people who already have complete or
>> mostly-complete working copies. This is entirely different from actually
>> mirroring an entire repository and using a local server!
>Indeed; I was only discussing the ability to "import and make a working
>copy in one pass" that SVK has (which is one part of what 'svn takeover'
>is intended to cover).

What I had really intended 'svn takeover' for was *only* situations where
the repository already has all the files. It was never meant to dovetail
with 'svn import'. To do so effectively (e.g. without adding files that
are unversioned in CVS or VSS to the SVN repository), SVN would need to
acquire the ability to talk to other version control systems' servers,
which is way out of the scope of this feature :-)

>> You have misunderstood 'svn takeover'. Its purpose is not to place files
>> into the repository, nor is it to copy a revision out of the repository.
>> The purpose is to avoid such unnecessary work in the situation where the
>> files are already in the repository, and the user already has a copy of
>> them which is not an SVN working copy (because it does not have valid
>> '.svn' subdirectories).
>I am responding to the original description which was that the user has
>a working copy checked out from some other version control system and
>wishes to convert that to a Subversion working copy. I don't see how
>there is any way to avoid all network communication while converting (at
>the very least the client and server must communicate checksums and

This is the source of the confusion, because the intent is for the (remote)
repository to already have the project, perhaps imported by another person
or through a conversion tool. The 'svn takeover' command does not ever add
files to the repository.

>To associate a selection of files with a Subversion repository (building
>the admin files on the fly) would seem to require turning the checkout
>actions inside out:
>1) client asks server for contents of current directory;
>2) client compares checksums from files it already has with checksums
>that the server provides and builds matching .svn/entries records and
>text-base (from the local file, not from the server);
>3) for all files where checksums differ, client retrieves fulltext from
>the server and creates a text-base, then leaves local file as is (i.e.
>with changes);
>4) for all files which exist on the server, but not locally, retrieve
>the text-base and mark the .svn/entries record as locally deleted;
>5) for all files which exist locally but not on the server, the client
>code does nothing.

This is the basic idea of what I am proposing in terms of bandwidth-saving.
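
As a rough illustration of the five steps quoted above, the client-side
classification might look like the Python below. This is only a sketch:
'svn takeover' is a proposal, and the names plan_takeover and
server_checksums are made up here, not part of any Subversion API. It
assumes the server can hand back a map of relative path to MD5 checksum.

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """Hex MD5 of a file, read in chunks so large files are not loaded whole."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_takeover(local_dir, server_checksums):
    """Classify files per steps 2-5 of the quoted list.

    server_checksums is a hypothetical stand-in for the server's answer in
    step 1: a dict mapping relative path -> hex MD5 of the repository file.
    """
    local_dir = Path(local_dir)
    local = {
        p.relative_to(local_dir).as_posix(): p
        for p in local_dir.rglob("*")
        if p.is_file()
    }
    plan = {"reuse_local": [], "fetch_fulltext": [], "mark_deleted": []}
    for rel, digest in server_checksums.items():
        if rel not in local:
            plan["mark_deleted"].append(rel)    # step 4: on server only
        elif md5_of(local[rel]) == digest:
            plan["reuse_local"].append(rel)     # step 2: text-base from local file
        else:
            plan["fetch_fulltext"].append(rel)  # step 3: checksums differ
    # step 5: files that exist only locally are left alone
    plan["ignore"] = sorted(set(local) - set(server_checksums))
    for names in plan.values():
        names.sort()
    return plan
```

Only the "fetch_fulltext" and "mark_deleted" entries cost a full file
transfer; everything in "reuse_local" costs one checksum.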

>The only time that any file needs to be transmitted in full is the
>missing or locally changed case. But this is a significant alteration
>of current behavior (even if the pieces are all there already). I don't
>see how adding rsync helps at all.

To construct a scenario where rsync would help, consider an advertising
firm which works with ginormous image files (let's say they use TIFF
because it's popular and is lossless in its typical usage). A typical large
poster will be, say, 15,000 x 10,000 pixels, and the corresponding TIFF
file will be on the order of 500 megabytes. An employee goes on a business
trip, taking his laptop with the 40 gigabyte CVS repository checked out
(synced with head). While he is off-site (and thus disconnected from the
company LAN), his boss does two things:

1. He tells the employee "Please change the small print at the bottom of
all 25 posters you're working on to reflect our new company name."

2. He uses cvs2svn to switch to SVN, because he's getting really annoyed at
some of the "features" of CVS.

The employee gets back from the business trip and has a CVS working copy
with minor changes to some 12.5 gigabytes of TIFF files. His boss informs
him that he must now switch over to SVN. Should the employee run 'svn
checkout' to make a clean copy, and then copy his files over? With his
laptop's puny 4400 RPM hard drive and 10baseT NIC, this will take on the
order of 20 hours to complete! Suppose instead he uses 'svn takeover' with
only per-file bandwidth optimization. He must still receive 12.5 gigabytes
of data, creating a 4-hour window where he can't effectively work on things.
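
For a sense of scale, here is the raw wire time for those two transfers,
assuming an ideal 10 Mbit/s link and decimal gigabytes. It is a lower bound
only: protocol overhead, the slow disk, and the local copy step in the
checkout case are what push the real figures toward the estimates above.

```python
def transfer_hours(size_bytes, link_bits_per_sec=10_000_000):
    """Hours to move size_bytes over an ideal link (no overhead assumed)."""
    return size_bytes * 8 / link_bits_per_sec / 3600


GB = 10**9
full_checkout = transfer_hours(40 * GB)    # fresh 'svn checkout' of everything
changed_only = transfer_hours(12.5 * GB)   # 25 modified posters at ~500 MB each
print(f"{full_checkout:.1f} h, {changed_only:.1f} h")  # prints "8.9 h, 2.8 h"
```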

With an rsync algorithm, 'svn takeover' can see that the modified files
differ only slightly, reducing the total data transfer to a dozen or two
megabytes, which completes in under a minute.
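
To make "an rsync algorithm" concrete, here is a much-simplified Python
sketch of the idea: the client sends per-block checksums of its modified
file, the server scans the repository version with a rolling checksum and
answers with "copy this block you already have" or literal bytes. The
function names, the 64-byte block size and the use of MD5 as the strong
checksum are all assumptions for illustration; this is neither rsync's
actual code nor anything in Subversion.

```python
import hashlib

BLOCK = 64  # illustrative block size; real tools pick much larger blocks


def weak(data):
    """Adler-style weak checksum (a, b) of one block, each mod 2**16."""
    n = len(data)
    a = sum(data) & 0xFFFF
    b = sum((n - i) * x for i, x in enumerate(data)) & 0xFFFF
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) & 0xFFFF
    b = (b - n * out_byte + a) & 0xFFFF
    return a, b


def signatures(basis, block=BLOCK):
    """Client side: weak checksum -> [(strong checksum, offset)] per full block."""
    sigs = {}
    for off in range(0, len(basis) - block + 1, block):
        chunk = basis[off:off + block]
        sigs.setdefault(weak(chunk), []).append((hashlib.md5(chunk).digest(), off))
    return sigs


def make_delta(target, sigs, block=BLOCK):
    """Server side: describe target as copies from the basis plus literal bytes."""
    delta, literal = [], bytearray()
    i, n, w = 0, len(target), None
    while i + block <= n:
        if w is None:
            w = weak(target[i:i + block])
        match = None
        if w in sigs:  # cheap test first, strong checksum only on a weak hit
            strong = hashlib.md5(target[i:i + block]).digest()
            match = next((off for s, off in sigs[w] if s == strong), None)
        if match is not None:
            if literal:
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", match))
            i += block
            w = None
        else:
            literal.append(target[i])
            if i + block < n:
                w = roll(w[0], w[1], target[i], target[i + block], block)
            i += 1
    literal.extend(target[i:])  # tail shorter than one block
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(basis, delta, block=BLOCK):
    """Client side: rebuild the repository version from the local file + delta."""
    out = bytearray()
    for op, arg in delta:
        out += basis[arg:arg + block] if op == "copy" else arg
    return bytes(out)
```

In the scenario above, the basis is the employee's edited TIFF and the
target is the repository's copy, so only the "data" entries (the blocks
around the changed small print) have to cross the wire to build the
text-base.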

Is this situation too far-fetched?

Jonathan Gilbert

Received on Wed Jul 13 00:02:15 2005