Re: Evaluating SVN as a Document Management Solution

From: Lasse Vågsæther Karlsen <lasse_at_vkarlsen.no>
Date: Wed, 12 Mar 2008 09:22:25 +0100

Perhaps you should just look at the Shadow Copies of Shared Folders
feature in Windows Server 2003; it might suit your needs better.

On Wed, Mar 12, 2008 at 1:15 AM, gabriel <gascencao_at_gmail.com> wrote:

> I really liked your answer.
>
> I have a problem similar to rj's, but my concern is repository disk
> space.
>
> I have 20 GB of Word and Excel documents. They are updated pretty
> often. I'm using a shared folder now and I'm considering SVN. I'm
> worried that the repository will get huge too soon. I would like to
> keep some revisions of each file, but I don't need all of them. Let's
> say I just need 5 revisions of each file (that means at most 100 GB
> in total, which is fine by me).
>
> I don't like the dump-and-reload strategy for the following
> reasons:
> - A dump and reload of 100 GB would take too much time for me.
> - A dump and reload will keep just the head revision. I would like to
> preserve 5 revisions.
> - A dump and reload will need someone's time to check that everything
> works right.
>
> Is there any way or workaround to set the maximum number of revisions
> kept for each file?
>
> Thanks in advance for any help you can give me.
>
> On Mar 8, 12:01, "B. Smith-Mannschott" <ben..._at_gmail.com> wrote:
> > On Mar 8, 2008, at 00:50, Tom Blough wrote:
> >
> > >>> typically in MS Office applications, deliverables are typically
> > >>> drawings in AutoCAD or Microstation, and database content is
> > >>> typically financial data from which reports are generated.
> >
> > > For your application, your repository will be huge. All of the file
> > > types you mention are binary. Therefore, SVN cannot calculate a diff
> > > on the file and will end up storing a copy of the complete file.
> >
> > This is incorrect. You're probably thinking of CVS or some similarly
> > brain-damaged revision control system. SVN uses compressed binary
> > differences between versions for storage in its repository.
> >
> > This works well for text of course. It also works well for binary
> > formats which don't themselves use compression, such as Microsoft
> > Word's DOC, uncompressed TIFF, ...
> >
> > > There was a recent thread concerning using XML data formats for newer
> > > versions of Office in order to save diff content, but that can cause
> > > problems due to the fact that XML is not order specific. Office can,
> > > and does, generate different XML for the same document.
> >
> > Well, yes, that will tend to make your differences larger than they
> > have to be. The real problem however is that most of these "XML"
> > formats are not, in fact, XML but rather XML compressed within a ZIP
> > archive.
> >
> > Where Subversion's binary differencing and compression fail is on file
> > formats that are themselves compressed (OpenDocument, OfficeOpenXML,
> > PNG, GIF, JPG, ...). Because of the compression, even a small change
> > in the document may cause its representation on disk to change
> > completely. The difference algorithm can't "see through" this.
> > Furthermore, Subversion's built-in compression (like any compression
> > algorithm) won't be able to further compress something that's already
> > compressed.
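The effect described above can be reproduced outside Subversion. The following is a minimal sketch under stated assumptions, not Subversion's actual algorithm: `zlib` stands in for the compression inside a ZIP-based format such as ODT, the generated word salad stands in for a real document, and `shared_chunk_fraction` is a hypothetical, deliberately crude proxy for what a binary delta could reuse from the previous revision.

```python
import random
import string
import zlib


def shared_chunk_fraction(old: bytes, new: bytes, size: int = 32) -> float:
    """Fraction of fixed-size chunks of `new` that occur anywhere in `old`.

    A crude stand-in for a binary delta: chunks found in the old revision
    could be copied instead of stored again.
    """
    old_chunks = {old[i:i + size] for i in range(len(old) - size + 1)}
    new_chunks = [new[i:i + size] for i in range(0, len(new) - size + 1, size)]
    return sum(chunk in old_chunks for chunk in new_chunks) / len(new_chunks)


def insert_sentences(doc: bytes, note: bytes = b" A newly inserted sentence. ") -> bytes:
    """Simulate a minor edit by inserting a short sentence in three places."""
    edited = bytearray(doc)
    # Work from back to front so the earlier offsets stay valid.
    for fraction in (0.75, 0.40, 0.01):
        position = int(len(doc) * fraction)
        edited[position:position] = note
    return bytes(edited)


# Build a reproducible pseudo-document of roughly a quarter megabyte.
rng = random.Random(2008)
vocabulary = [
    "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(3, 9)))
    for _ in range(2000)
]
original = " ".join(rng.choice(vocabulary) for _ in range(40000)).encode("ascii")
edited = insert_sentences(original)

# Uncompressed revisions: almost every chunk of the edit exists in the original.
raw_similarity = shared_chunk_fraction(original, edited)

# Compressed revisions: the same edit scrambles most of the byte stream.
zipped_original = zlib.compress(original)
zipped_edited = zlib.compress(edited)
zipped_similarity = shared_chunk_fraction(zipped_original, zipped_edited)

# Compressing already-compressed data gains essentially nothing.
recompressed = zlib.compress(zipped_original)

print(f"reusable chunks, uncompressed: {raw_similarity:.1%}")
print(f"reusable chunks, compressed:   {zipped_similarity:.1%}")
print(f"compressed once: {len(zipped_original)} bytes, twice: {len(recompressed)} bytes")
```

Exact numbers depend on the zlib build, so none are quoted here; the point is only the gap between the two similarity figures and the lack of any real gain from the second compression pass.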
> >
> > I've done an experiment to verify this. I set up three repositories
> > each containing a single document in one of three formats. In this
> > case, I used the text of _The Count of Monte Cristo_ from Project
> > Gutenberg as ASCII Text (2568 KB), as Microsoft Word DOC (6384 KB) and
> > as OpenOffice ODT (1060 KB). I created 8 variants of each of these
> > documents (inserting or removing a paragraph here or there) to
> > represent minor edits. I then made 80 commits to each of the three
> > repositories drawing upon the aforementioned 8 variants in round-robin
> > fashion to simulate a history of 80 minor edits made and committed.
> > While doing this I kept track of the total size of the repository.
> >
> > * All three repositories grow linearly in size, but the ODT repository
> > grows more quickly (steeper slope).
> >
> > * The ODT repository is smallest for the first few commits but quickly
> > outgrows the TXT and DOC repositories.
> >
> > * The DOC repository is larger than the TXT repository and grows
> > slightly faster in comparison.
> >
> > * The size difference between the TXT and DOC repositories is not as
> > large as the relative size of the formats (2568 KB vs 6384 KB) might
> > suggest. DOC may be about 2.5 times as large as TXT but much of this
> > difference is redundancy which SVN is quite capable of compressing away.
> >
> > * Final repository sizes after 80 commits: TXT = 10052 KB; DOC = 16288
> > KB; ODT = 58260 KB.
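To put the final sizes above in proportion, here is a small back-of-the-envelope calculation using only the figures quoted in the message. It is a rough sketch: dividing the final size by 80 folds the initial import into the average, so the per-edit growth is slightly overstated, and the function names are made up for illustration.

```python
COMMITS = 80

# Figures quoted in the message, in KB.
document_kb = {"TXT": 2568, "DOC": 6384, "ODT": 1060}
final_repo_kb = {"TXT": 10052, "DOC": 16288, "ODT": 58260}


def average_growth_kb(fmt: str) -> float:
    """Average repository growth per commit (initial import included)."""
    return final_repo_kb[fmt] / COMMITS


def growth_share_of_document(fmt: str) -> float:
    """Average growth per commit as a fraction of one full copy of the file."""
    return average_growth_kb(fmt) / document_kb[fmt]


for fmt in document_kb:
    print(
        f"{fmt}: {average_growth_kb(fmt):7.2f} KB per commit, "
        f"{growth_share_of_document(fmt):5.1%} of one full copy"
    )
```

On these figures, each TXT or DOC commit costs a few percent of a full copy of the file, while each ODT commit costs roughly two thirds of one.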
> >
> > See also attached PNG.
> >
> > // Ben Smith-Mannschott
> >
> > [Attachment: growth or repo through repeated minor edits75.png, 9 K]
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscr..._at_tortoisesvn.tigris.org
> > For additional commands, e-mail: users-h..._at_tortoisesvn.tigris.org
>

-- 
Lasse Vågsæther Karlsen
mailto:lasse_at_vkarlsen.no
http://presentationmode.blogspot.com/
PGP KeyID: 0xBCDEA2E3
Received on 2008-03-12 09:31:46 CET

This is an archived mail posted to the TortoiseSVN Users mailing list.
