Re: Changing the "native" newline mode

From: Mark Phippard <markphip_at_gmail.com>
Date: Sun, 14 Feb 2010 09:45:08 -0500

On Sun, Feb 14, 2010 at 2:34 AM, Glenn Maynard <glenn_at_zewt.org> wrote:

> Based on looking through [1] some more, it looks like "cp -a wc1 wc2"
> and renaming working copies should work fine, since the database is
> inside the working copy, and will just get copied along with the rest.

In SVN 1.7 there will be a single .svn folder at the root of a working
copy. Beyond 1.7 there are plans to make this configurable so that
you could have it in ~/.subversion and shared across all your working
copies. Of course the default will be the same as it will be in 1.7.

> Hopefully there'll still be a way to slice out a piece of a repository
> ("mv wc1/trunk .; rm -rf wc1"), which wouldn't work if it's dependent
> on a global db at the top.

There has been talk of adding an svn detach command to do this. Not
sure if it will be done as part of 1.7. AFAIK, the plan is to add it
later.

> I have a few gigs of ~5 meg files in Subversion, and the idea of
> storing large blocks of data in SQLite is a bit scary; I don't think
> it's designed for blobs that size. Anything that lumps files together
> like this is effectively subjected to two layers of fragmentation
> instead of one (filesystem + db).

There has never been any plan or discussion to store the pristine
files in SQLite. As you point out, it is not well suited for that and
would work poorly. SQLite is being used to store the SVN metadata and
properties, which arguably are already stored in a custom DB today.
Once the WC data is centralized, the current code, which has to read
all of the metadata, parse it and write it back out, would be less
efficient than using a database and just updating or inserting rows
as needed. Plus we get some benefits from being able to use SQL
indexes.
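
To illustrate the difference between rewriting a whole metadata file
and touching a single row, here is a minimal Python sketch using the
standard sqlite3 module. The table name, the columns and the index are
hypothetical and chosen only for illustration; this is not the actual
working copy schema.

```python
import sqlite3


def open_metadata_db(path=":memory:"):
    """Open a toy metadata store (hypothetical schema, not SVN's)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS nodes ("
        " local_relpath TEXT PRIMARY KEY,"
        " revision INTEGER NOT NULL,"
        " checksum TEXT)"
    )
    # An index lets lookups by checksum avoid scanning every row.
    db.execute(
        "CREATE INDEX IF NOT EXISTS nodes_by_checksum ON nodes (checksum)"
    )
    return db


def record_node(db, relpath, revision, checksum):
    """Insert or replace exactly one row; other rows are untouched."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO nodes"
            " (local_relpath, revision, checksum) VALUES (?, ?, ?)",
            (relpath, revision, checksum),
        )


def paths_with_checksum(db, checksum):
    """Return all paths whose recorded checksum matches, sorted."""
    rows = db.execute(
        "SELECT local_relpath FROM nodes WHERE checksum = ?"
        " ORDER BY local_relpath",
        (checksum,),
    )
    return [row[0] for row in rows]
```

Updating one file's metadata is then a single call to record_node(),
rather than a read, parse and rewrite of everything.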

The pristine files will still be stored as plain files, but the
layout is being changed so that each one is named by the SHA-1 hash
of its contents. I'd imagine the structure will be sharded based on
the first two characters of the hash. This will bring several
benefits:

1) On case-insensitive file systems, such as the defaults on Windows
and OS X, it will allow files to be renamed only by case. Today that
fails because of the way the pristine copy is stored. Once the
pristine copy is named by its SHA-1, the file's name will not matter.

2) Space savings. When you have files in a working copy with the same
hash, there will only be a single pristine copy stored. This will
likely be a minor benefit in 1.7, but imagine when you can have all
your working copies centralized in a single location. If you have
multiple copies of trunk checked out, or even multiple branches, it is
likely that many pristine copies would be shared, which would save a
significant amount of disk space.

3) Performance. This will be a future benefit. But again, imagine
you have a single centralized working copy area. When you do a
checkout we can enhance the client/server protocol so that when the
server returns the list of items for the client to fetch, it also
includes the SHA-1 of each item. Now the client can be made smart
enough to fetch only the items it does not already have. So imagine
you have trunk checked out and you want to check out a branch. Maybe
90% of the files
would already be on your disk and the client could just fetch the
other 10% and construct the working copy from what it already has
available.
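
The three points above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: the two-character
sharding is the layout imagined above rather than a confirmed format,
the function names are hypothetical, and the "wanted" mapping stands in
for a path-to-SHA-1 list that a future protocol might return.

```python
import hashlib
import os
import tempfile


def pristine_path(store_root, sha1_hex):
    """Path of a pristine file, sharded by the first two hex digits."""
    return os.path.join(store_root, sha1_hex[:2], sha1_hex)


def store_pristine(store_root, content):
    """Store content under its SHA-1; identical content is kept once."""
    sha1_hex = hashlib.sha1(content).hexdigest()
    path = pristine_path(store_root, sha1_hex)
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(content)
    return sha1_hex


def missing_pristines(store_root, wanted):
    """Given a {path: sha1_hex} mapping, return the sorted hashes that
    are not yet in the store and so would still need to be fetched."""
    return sorted(
        {
            sha1_hex
            for sha1_hex in wanted.values()
            if not os.path.exists(pristine_path(store_root, sha1_hex))
        }
    )
```

Because the name on disk depends only on the contents, two files that
differ only in the case of their names never collide in the store, and
a second working copy with mostly identical files adds very little.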

-- 
Thanks
Mark Phippard
http://markphip.blogspot.com/

Received on 2010-02-14 15:45:43 CET
