
Re: Evaluating SVN as a Document Management Solution

From: gabriel <gascencao_at_gmail.com>
Date: Tue, 11 Mar 2008 17:15:23 -0700 (PDT)

I really liked your answer.

I have a similar problem to rj's, but my concern is repository disk space.

I have 20 GB of Word and Excel documents that are updated quite
often. I currently use a shared folder and am considering SVN, but
I'm worried that the repository will grow huge too quickly. I would
like to keep some revisions of each file, but I don't need all of
them. Let's say I just need 5 revisions of each file (that would mean
100 GB in total, which is fine by me).

I don't like the dump-and-reload strategy, for the following reasons:
- A dump and reload of 100 GB would take too much time for me.
- A dump and reload will keep just the head revision. I would like to
preserve 5 revisions.
- A dump and reload will need someone's time to check that everything
works correctly.

Is there any way, or a workaround, to set a maximum number of
revisions kept for each file?

Thanks in advance for any help you can give me.

On Mar 8, 12:01, "B. Smith-Mannschott" <ben..._at_gmail.com> wrote:
> On Mar 8, 2008, at 00:50, Tom Blough wrote:
> >>> typically in MS Office applications, deliverables are typically
> >>> drawings in AutoCAD or Microstation, and database content
> >> is typically
> >>> financial data from which reports are generated.
> > For your application, your repository will be huge. All of the file
> > types
> > you mention are binary. Therefore, SVN cannot calculate a diff on
> > the file
> > and will end up storing a copy of the complete file.
> This is incorrect. You're probably thinking of CVS or some similarly
> brain-damaged revision control system. SVN uses compressed binary
> differences between versions for storage in its repository.
> This works well for text of course. It also works well for binary
> formats which don't themselves use compression, such as Microsoft
> Word's DOC, uncompressed TIFF, ...
> > There was a recent thread concerning using XML data formats for newer
> > versions of Office in order to save diff content, but that can cause
> > problems due to the fact that XML is not order specific. Office
> > can, and
> > does, generate different XML for the same document.
> Well, yes, that will tend to make your differences larger than they
> have to be. The real problem however is that most of these "XML"
> formats are not, in fact, XML but rather XML compressed within a ZIP
> archive.
> Where Subversion's binary differencing and compression fail is on file
> formats that are themselves compressed (OpenDocument, OfficeOpenXML,
> PNG, GIF, JPG, ...). Because of the compression, even a small change
> in the document may cause its representation on disk to change
> completely. The difference algorithm can't "see through" this.
> Furthermore, Subversion's built-in compression (like any compression
> algorithm) won't be able to further compress something that's already
> compressed.
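
The effect described above can be reproduced on a small scale. The
following is only a rough sketch: it uses zlib with a preset dictionary
as a crude stand-in for a binary delta (this is an assumption for
illustration, not Subversion's actual delta algorithm), and the helper
names make_text, delta_size and compare are made up for this example.

```python
import random
import zlib


def make_text(seed=0, n_words=3000):
    """Build roughly 20 KB of made-up prose (kept under zlib's 32 KB window)."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    vocab = ["".join(rng.choice(letters) for _ in range(rng.randint(3, 9)))
             for _ in range(2000)]
    return " ".join(rng.choice(vocab) for _ in range(n_words)).encode("ascii")


def delta_size(old, new):
    """Size of `new` when deflated with `old` as a preset dictionary.

    A crude stand-in for a binary delta, NOT Subversion's real algorithm.
    """
    comp = zlib.compressobj(level=9, zdict=old)
    return len(comp.compress(new) + comp.flush())


def compare():
    """Return (delta between plain versions, delta between compressed versions)."""
    old = make_text()
    # A "minor edit": one short paragraph inserted near the start.
    new = old[:500] + b"One short inserted paragraph. " * 3 + old[500:]
    plain = delta_size(old, new)  # analogous to TXT or DOC
    packed = delta_size(zlib.compress(old), zlib.compress(new))  # analogous to ODT
    return plain, packed


if __name__ == "__main__":
    plain, packed = compare()
    print("delta between uncompressed versions:", plain, "bytes")
    print("delta between compressed versions:  ", packed, "bytes")
```

With the uncompressed pair, nearly all of the new version can be
expressed as references into the old one, so the delta stays small.
With the compressed pair, the byte streams diverge from the edit
onward, so the delta should come out close to the full compressed size.
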
> I've done an experiment to verify this. I set up three repositories
> each containing a single document in one of three formats. In this
> case, I used the text of _The Count of Monte Cristo_ from Project
> Gutenberg as ASCII Text (2568 KB), as Microsoft Word DOC (6384 KB) and
> as OpenOffice ODT (1060 KB). I created 8 variants of each of these
> documents (inserting or removing a paragraph here or there) to
> represent minor edits. I then made 80 commits to each of the three
> repositories drawing upon the aforementioned 8 variants in round-robin
> fashion to simulate a history of 80 minor edits made and committed.
> While doing this I kept track of the total size of the repository.
> * All three repositories grow linearly in size, but the ODT repository
> grows more quickly (steeper slope).
> * The ODT repository is smallest for the first few commits but quickly
> outgrows the TXT and DOC repositories.
> * The DOC repository is larger than the TXT repository and grows
> slightly faster in comparison.
> * The size difference between the TXT and DOC repositories is not as
> large as the relative size of the formats (2568 KB vs 6384 KB) might
> suggest. DOC is more than twice as large as TXT, but much of this
> difference is redundancy which SVN is quite capable of compressing away.
> * Final repository sizes after 80 commits: TXT = 10052 KB; DOC = 16288
> KB; ODT = 58260 KB.
> See also attached PNG.
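
As a rough cross-check of the figures above, the final repository sizes
can be turned into an average cost per commit. This is only
back-of-the-envelope arithmetic on the numbers quoted in the message:
the average also includes the initial commit and any fixed repository
overhead, so the true per-edit cost is somewhat lower. The names used
below are made up for this example.

```python
# Figures quoted in the message above (all in KB).
FILE_KB = {"TXT": 2568, "DOC": 6384, "ODT": 1060}
FINAL_REPO_KB = {"TXT": 10052, "DOC": 16288, "ODT": 58260}
COMMITS = 80


def average_growth(fmt):
    """Return (average KB added per commit, that amount as a fraction of the file size)."""
    per_commit = FINAL_REPO_KB[fmt] / COMMITS
    return per_commit, per_commit / FILE_KB[fmt]


if __name__ == "__main__":
    for fmt in ("TXT", "DOC", "ODT"):
        per_commit, fraction = average_growth(fmt)
        print(f"{fmt}: {per_commit:.1f} KB per commit, "
              f"{fraction:.0%} of one full copy of the file")
```

Read this way, each TXT or DOC commit costs only a small fraction of
the file, while each ODT commit costs most of a full copy, which
matches the steeper slope reported for the ODT repository.
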
> // Ben Smith-Mannschott
> [Attachment: growth or repo through repeated minor edits75.png, 9 K]

To unsubscribe, e-mail: users-unsubscribe_at_tortoisesvn.tigris.org
For additional commands, e-mail: users-help_at_tortoisesvn.tigris.org
Received on 2008-03-12 07:19:50 CET

This is an archived mail posted to the TortoiseSVN Users mailing list.