On 08/16/2016 09:17 AM, Stefan Hett wrote:
> Just to have this mentioned: Be aware that the working copy (aka: the
> checked out data of the repository) will have a 2x storage requirement
> on the data since it will keep a copy of the pristine version of the
> file in addition to the "actual" file.
The type of system that I am imagining might typically have several
terabytes of instrumentation data in a repository. Various client
machines might need to check out a few gigabytes or a few hundred
gigabytes at a time, either to run data analysis (automated compute
jobs) or to perform a study (scientist/human-interest).
Version control isn't a requirement in this
use-case/hypothetical-system. Sophisticated access control is much more
of a concern. Mandatory audit trails and distributed, contract-based
data handling are examples of more relevant architectural characteristics.
I am currently looking at the possibility of using Subversion (in a
non-traditional, off-label fashion) to bootstrap a very simplified
proof-of-concept setup.
My current data-set is only about 25GB and growing at a rate of about
1GB/week, so a desktop server and a laptop client shouldn't have any
storage space problems in this small demonstration system.
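
As a back-of-the-envelope check, here is a small sketch of the client-side
numbers. It assumes the 2x factor mentioned above applies to a full
check-out and that growth stays linear at 1GB/week; both are simplifying
assumptions, and the function names are just for illustration.

```python
# Rough estimate of client-side disk usage for a Subversion working copy.
# Assumption: each checked-out file costs about 2x its size
# (the working file plus its pristine copy), as noted in the quoted remark.
PRISTINE_FACTOR = 2


def working_copy_gb(checked_out_gb: float) -> float:
    """Approximate client disk usage for a check-out of the given size."""
    return PRISTINE_FACTOR * checked_out_gb


def dataset_gb(weeks: int, start_gb: float = 25.0,
               growth_gb_per_week: float = 1.0) -> float:
    """Projected data-set size after `weeks`, assuming linear growth."""
    return start_gb + growth_gb_per_week * weeks


if __name__ == "__main__":
    for weeks in (0, 26, 52):
        size = dataset_gb(weeks)
        print(f"week {weeks}: data {size:.0f} GB, "
              f"full working copy about {working_copy_gb(size):.0f} GB")
```

Under these assumptions, even a full check-out a year from now would need
on the order of 150GB on the client, which a laptop can accommodate.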
> If this is a concern for your use-case, you could export the files and
> only use a working copy in cases where you need to commit or reorder files.
By "export the files" do you mean something like an NFS share of the
repository, thus bypassing svnserve and the check-in/check-out process?
That seems like a clever possibility worth remembering, but for now the
system I am currently building/imagining is headed in a different direction.
> To clarify: This is purely a client side storage requirement. It does
> not apply to the storage requirements on the server side.
To reduce network load, are there any client-side caching options for
Subversion? Does the svn client take into account the files already in
the working copy (on the local disk) and avoid transferring them over
the network again during a subsequent check-out that needs those files?
Is it possible to clone or mirror all or part of a Subversion repository?
<speculative fun> This probably isn't relevant to Subversion, but in the
system I am imagining it might be reasonable for clients to check-out
data-sets via torrent connections with other full/partial repositories.
Received on 2016-08-16 21:14:09 CEST