
RE: Revision Differences

From: <andy.glew_at_amd.com>
Date: 2004-05-21 21:53:15 CEST

> As an intrigued user I was wondering if anyone could explain (or point
> me in the direction of) how subversion manages revisions. Does it
> simply store diffs between each revision and then patch them together
> on the fly as requested or does it store the entire file each time?
> And I guess there could be a mysterious third option?
>
> ~ Matthew

The Subversion honchos will probably correct me:
they definitely know the Subversion code better
than I do, and probably know SCCS and RCS better
than I do.

But I'll start an answer, because I am also interested
in this topic:

I have gathered that Subversion keeps an entire file
for the HEAD of every branch that has not been deleted.
(Gathered this because of some experiments with
space utilization.)

They apparently apply reverse diffs to get back to earlier
versions.
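
To make the "full text at HEAD plus reverse diffs" idea concrete, here is
a minimal sketch in Python. It is an illustration only, not Subversion's
actual storage code: the names ReverseDeltaStore, make_delta and
apply_delta are hypothetical, and the delta format (copy/insert
instructions built with the standard difflib module) is an assumption.

```python
from difflib import SequenceMatcher


def make_delta(source, target):
    """Build instructions that rebuild `target` from `source` (lists of lines)."""
    delta = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))            # reuse lines from source
        elif tag in ("replace", "insert"):
            delta.append(("insert", target[j1:j2]))   # lines only in target
        # "delete": source lines are simply not copied
    return delta


def apply_delta(source, delta):
    """Apply a delta produced by make_delta to `source`."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(source[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class ReverseDeltaStore:
    """Hypothetical store: full text for HEAD only, reverse deltas for the rest."""

    def __init__(self):
        self.head = None
        self.reverse_deltas = []  # reverse_deltas[i] turns revision i+1 into revision i

    def commit(self, lines):
        if self.head is not None:
            # Remember how to get from the new HEAD back to the old one.
            self.reverse_deltas.append(make_delta(lines, self.head))
        self.head = list(lines)

    def checkout(self, rev):
        """Return (lines, number_of_patches_applied) for revision `rev`."""
        if self.head is None:
            raise IndexError("no revisions committed")
        head_rev = len(self.reverse_deltas)
        if not 0 <= rev <= head_rev:
            raise IndexError("no such revision")
        text, patches = self.head, 0
        for i in range(head_rev - 1, rev - 1, -1):
            text = apply_delta(text, self.reverse_deltas[i])
            patches += 1
        return list(text), patches
```

In this sketch, checking out HEAD applies no patches at all, while
checking out revision 0 applies one patch per later revision.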

The original SCCS used to keep the full file for
the base version, and then apply forward diffs.
Later, they changed to keeping the most recent
version and applying reverse diffs.

RCS, I believe, uses the "full version at head"
approach.

Later still, SCCS started using "interleaved diffs",
which don't have the full text anywhere - instead,
all versions can be extracted with approximately
equal speed. I think of interleaved diffs as being
approximately a single, highly IFDEFfed, file.
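
The "single, highly IFDEFfed file" picture can be sketched as below.
This is a simplified illustration for a linear history only, not the
real SCCS file format: the Weave class and its fields are hypothetical.
Each stored line records the revision that added it and, if any, the
revision that removed it, so every version is extracted in one pass.

```python
from difflib import SequenceMatcher


class Weave:
    """Hypothetical interleaved store: one list holding every line ever seen."""

    def __init__(self):
        self.lines = []      # entries: [text, added_in_rev, removed_in_rev or None]
        self.revisions = 0   # number of revisions committed so far

    def extract(self, rev):
        """One pass over the weave, whichever revision is requested."""
        if not 0 <= rev < self.revisions:
            raise IndexError("no such revision")
        return [text for text, added, removed in self.lines
                if added <= rev and (removed is None or removed > rev)]

    def commit(self, new_lines):
        rev = self.revisions
        # Weave positions of the lines that are live in the current HEAD.
        live = [i for i, entry in enumerate(self.lines) if entry[2] is None]
        head = [self.lines[i][0] for i in live]
        opcodes = SequenceMatcher(a=head, b=new_lines, autojunk=False).get_opcodes()
        # Walk the changes back to front so earlier weave positions stay valid.
        for tag, i1, i2, j1, j2 in reversed(opcodes):
            if tag in ("replace", "delete"):
                for k in range(i1, i2):
                    self.lines[live[k]][2] = rev      # line disappears in `rev`
            if tag in ("replace", "insert"):
                pos = live[i1] if i1 < len(live) else len(self.lines)
                self.lines[pos:pos] = [[t, rev, None] for t in new_lines[j1:j2]]
        self.revisions += 1
        return rev
```

Note that no revision is stored as a plain full text here: HEAD has to be
filtered out of the weave just like any older revision.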

Comparing the approaches:
   * 1 or more full versions (at head, whatever)
   * interleaved diffs

When several layers of patch must be applied
to get an old version, the time to obtain an old
version is roughly proportional to the number of
patches that must be applied.
   With interleaved diffs, by contrast, the time to extract
any version is approximately equal.

However, with a very large history, interleaved diffs
will be approximately equally slow for all versions.
   It makes sense to store a full revision at or near
the HEAD, since fetching the HEAD is the most frequent operation.

If you have lots of active branches and/or tagged versions,
storing a full version at the HEAD of each can waste
a lot of space, especially given that Subversion
uses the same notion of physical placement in the tree
for what would be a CVS tag, or a label in other VC systems.

Obviously only 1 full version need be stored for any
file that is connected by a web of patch/diffs to other
files; e.g. for any file that was ever svn cp'ed or renamed
onto branches.

Whether other full versions need be preserved is mainly
an optimization problem, trading off speed of access versus
storage.
   Similarly, one can imagine systems that form interleaved diffs
not for the entire version history of a file, but just for a region
of versions that are close in time and likely to be accessed
together.

Similarly^2, the diffs need not be between adjacent versions
(r1->r2->r3). Instead, diffs that skip versions, such as r1->r3,
can be stored.
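
One way to picture diffs that skip versions is sketched below. The rule
used for picking the base (clear the lowest set bit of the revision
number) is an assumption chosen for illustration, and the function names
skip_base and delta_chain are hypothetical; the sketch only counts how
many diffs would have to be applied, it does not store any text.

```python
def skip_base(rev):
    """Pick the revision that `rev` is diffed against (assumed rule):
    clear the lowest set bit, e.g. 13 (0b1101) -> 12 (0b1100)."""
    return rev & (rev - 1)


def delta_chain(rev):
    """Revisions visited when rebuilding `rev`, ending at the fully
    stored revision 0."""
    chain = [rev]
    while rev > 0:
        rev = skip_base(rev)
        chain.append(rev)
    return chain


if __name__ == "__main__":
    for rev in (1, 13, 1000):
        chain = delta_chain(rev)
        print(rev, chain, "diffs applied:", len(chain) - 1)
```

With this rule the number of diffs applied grows with the number of set
bits in the revision number rather than with the revision number itself,
at the price of each individual diff spanning more changes.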

Received on Fri May 21 21:54:39 2004

This is an archived mail posted to the Subversion Users mailing list.
