
Re: What are the potential performance & limit considerations for high number of files in repository?

From: Andy Levy <andy.levy_at_gmail.com>
Date: Thu, 30 Apr 2009 09:39:52 -0400

On Wed, Apr 29, 2009 at 19:33, <webpost_at_tigris.org> wrote:
> Hi,
> I would appreciate any advice regarding the following problem.
> Context:
> --------
> I'm considering using subversion as a data repository.
> The usage patterns on the data are very similar to source code development. The data is textual in nature. Currently, there are about one million very small files (typically < 1KB each), and it is growing at a relatively slow rate (say about a million files per year).
> The usage flow will include branching and merging of sub-directories as well as the entire repository. There are about 20 users in total and they are all within the same LAN.
> The nature of the data is such that each user will typically sync up or check out a folder with 500 files, work on it for 1-3 days and check it back in.
> It is all on a Windows based environment (Win2K, Apache, svn 1.4.x)
> Questions:
> ----------
> 1) What are the limitations on number of files in the repository (assuming I have sufficient hard-disk space of course & within NTFS limits)?

Your files aren't stored as individual files in the repository; each
revision is stored as its own file. So if you commit 1000 files in one
revision, the repository gains the same number of files as if you had
committed a single file.
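To illustrate the point, here is a simplified model of the FSFS back end's file count (it ignores the handful of fixed bookkeeping files a repository always carries):

```python
def fsfs_rev_files(num_revisions, files_per_commit):
    # Each commit writes one revision file plus one revision-properties
    # file; files_per_commit has no effect on the repository-side count.
    return 2 * num_revisions

# A million tiny files committed in 2000 revisions costs the same
# on disk (in file-count terms) as 2000 single-file commits.
assert fsfs_rev_files(2000, 500) == fsfs_rev_files(2000, 1) == 4000
```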

I would recommend upgrading to a newer version of Subversion - at
least 1.5 - to get repository sharding. That way you won't have
thousands upon thousands of revision files in a single directory (NTFS
doesn't handle that case well). With 1.5's sharded layout, the
repository is split into directories of 1000 revisions each to keep
the per-directory file count down.
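As a rough sketch of how the sharded layout maps a revision number to a directory under db/revs/ (shard_size here reflects the default of 1000; the exact on-disk details vary by FSFS format):

```python
def shard_dir(revision, shard_size=1000):
    """Name of the shard directory an FSFS revision file lands in
    with the 1.5+ sharded layout (integer division by shard size)."""
    return str(revision // shard_size)

# Revisions 0-999 share one directory; revision 1000 starts the next.
assert shard_dir(999) == "0"
assert shard_dir(1000) == "1"
assert shard_dir(123456) == "123"
```

So even at a few hundred thousand revisions, no single directory ever holds more than 1000 revision files.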

If you're doing to be doing merging, 1.5's merge tracking will also be
a huge benefit.

With 1.6, you can "pack" each completed shard directory into a
single file to gain some performance and reduce your file count.
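For reference, packing is a one-shot svnadmin subcommand (the repository path below is a placeholder):

```shell
# Requires Subversion 1.6+; /var/svn/myrepo is a hypothetical path.
# Only completed shards (full groups of 1000 revisions) get packed,
# so it's worth re-running periodically as the repository grows.
svnadmin pack /var/svn/myrepo
```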

Moving up from Win2K also wouldn't be a bad idea.

> 2) Are there any known potential performance bottlenecks/issues in such data repository organization (i.e. where are the potential slowdowns or performance concerns)?

Will you have users checking out/working on large sections of the
repository at once? Will you be doing a lot of large merges?

> 3) My understanding from previous threads is that in terms of total size I'm well within the limits of the system (1-2 GB of data), so this is not a concern. Please correct me if I'm wrong.

Only 1-2GB of data is a drop in the bucket.

> 4) Generally, is this a valid usage for subversion (in terms of number of files & size, assume development like usage pattern) and has anyone had experience with such repositories? In other words - is it a totally trivial & simple repository layout for subversion that's done everywhere...?

There are some very, very large open source projects which have been
quite successful with Subversion. Apache & KDE to name two. If SVN can
handle those, you should be fine.


To unsubscribe from this discussion, e-mail: [users-unsubscribe_at_subversion.tigris.org].
Received on 2009-04-30 15:40:53 CEST

This is an archived mail posted to the Subversion Users mailing list.