On Fri, Feb 8, 2013 at 10:57 AM, Nico Kadel-Garcia <nkadel_at_gmail.com> wrote:
>>
>>> In my $work, we manage thousands of binary files (tiffs). We may modify a
>>> file once or twice before eventually entering the file as a record. Files
>>> arrive in groups (a submission) and I would like to track changes and the
>>> history of a file. Once the file is entered as a record, I could remove much
>>> of the history.
>>>
>>> I've used subversion for software version control and I am wondering if I
>>> would be stretching its features by versioning thousands of binary files
>>> (currently 13,000 since the start of 2013) at about 60MB each file.
>>>
>>> Apart from the size of the diffs/deltas, I am struggling to envisage a way
>>> to organise the repo. Making a new project for each submission would
>>> make the whole repo unwieldy.
>>>
>>> Has anyone used subversion for this type of tracking? Does what I'm
>>> proposing sound feasible? Any thoughts would be appreciated.
>>
>> I don't believe there is a reasonable way to ever remove anything from
>> a subversion repository such that it releases the space used for the
>> thing you removed. So, I wouldn't consider this with subversion
>> unless you can work out a way to make separate repositories for one or
>> a few files so it would be feasible to just remove the whole thing if
>> you no longer need it or 'svnadmin dump/filter/load' to restructure
>> them.
>
> Separate repositories linked together by "svn:externals" settings can
> do this, with a central "build" structure publishing tags or branches
> with hooks to specific releases of components from other repos. But
> resource tracking can get awkward. Some old legacy repo that only one
> project was using can wind up culled, with managerial approval, and
> discovered to be critical to another legacy tool or two that no one
> has built for a few years and kept saying "if it's not broken, don't
> fix it". So factoring the repositories well, and having good archival
> backups, can be invaluable.

You can simply put a bunch of repos under the top level served by http
or svn and it appears pretty seamless except for when you have to
create a new one. But, since binary diffs aren't very useful anyway
and that might have scaling issues, I think I'd just try to use a
de-duping filesystem like zfs and store as many copies as might still
be useful.
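
As a rough illustration of the "store copies" approach, here is a minimal
Python sketch. It only makes plain numbered copies and leaves any
de-duplication to the filesystem underneath. The function names, the
archive layout (archive_root/submission/file name/vNNNN) and the .tif
example are all assumptions made up for this sketch, not anything
Subversion or zfs provides.

```python
import shutil
from pathlib import Path

# Hypothetical layout: archive_root/<submission>/<file name>/v0001.tif, v0002.tif, ...
VERSION_GLOB = "v[0-9][0-9][0-9][0-9]*"


def save_version(src: Path, archive_root: Path, submission: str) -> Path:
    """Copy src into the archive as the next numbered version and return the new path."""
    version_dir = archive_root / submission / src.name
    version_dir.mkdir(parents=True, exist_ok=True)
    # Zero-padded names sort lexicographically, so the last one is the newest.
    existing = sorted(version_dir.glob(VERSION_GLOB))
    next_number = int(existing[-1].name[1:5]) + 1 if existing else 1
    dest = version_dir / f"v{next_number:04d}{src.suffix}"
    # Plain copy; saving space is the job of the de-duping filesystem, not this code.
    shutil.copy2(src, dest)
    return dest


def drop_history(archive_root: Path, submission: str, file_name: str,
                 keep_latest: bool = True) -> None:
    """Delete stored versions of one file, optionally keeping the newest copy."""
    version_dir = archive_root / submission / file_name
    versions = sorted(version_dir.glob(VERSION_GLOB))
    for old in (versions[:-1] if keep_latest else versions):
        old.unlink()
```

Calling save_version() each time a tiff is modified keeps every copy, and
drop_history() can be run once the file has been entered as a record.
Unlike a Subversion repository, deleting the old copies actually releases
the space.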
--
Les Mikesell
lesmikesell_at_gmail.com
Received on 2013-02-08 18:52:08 CET