On Wed, Sep 19, 2012 at 10:53 AM, Daniel Shahaf <danielsh_at_elego.de> wrote:
> Apache subversion Wiki wrote on Tue, Sep 18, 2012 at 23:30:26 -0000:
> > Dear Wiki user,
> >
> > You have subscribed to a wiki page or wiki category on "Subversion Wiki"
> for change notification.
> >
> > The "StarDelta" page has been changed by StefanFuhrmann:
> > http://wiki.apache.org/subversion/StarDelta
> >
> > Comment:
> > WIP. first part
> >
> > New page:
> > = Star Deltas =
> >
> > == Introduction ==
> >
> > FSFS currently uses xdelta to store different version of the same node
> efficiently.
> > Basically, we represent node x_i as
> >
> > x_i = x_i-1 o \delta(x_i, x_i-1)
> > x_i-1 = x_i-2 o \delta(x_i-1, x_i-2)
> > ...
> > x_0 = x_0
> >
> > and store x_0 plus the incremental \delta information. x_i gets
> reconstructed by
> > starting with x_0 and iteratively applying all deltas. Assuming that
> size(x_i) is
> > roughly proportional to i and the deltas averaging around some constant
> value,
>
> This assumption means that every commit to a file increases its size by
> 948 bytes (or some other constant number that depends only on the
> node-id).
This is not what it says! Get some coffee ;)
My assumptions are
* the average size of changes tends to be constant over time
(e.g. commit size and changed files per commit)
* the average ratio of lines being changed ./. added ./. removed
is constant over time
The first assumption is certainly justified for code that cannot
mature beyond some reasonable level. As long as you have
an open set of requirements, the nature of patches does *not*
change over time (but it may fluctuate between development
and stabilization phases).
The second is based on the observation that a patch for a new
feature is not very different from a complicate patch for some
hard bug. The "growth" rate per change of a file is more a function
of its coding style that its age. Once a feature has been
implemented (larger changes, many insertions) and stabilized
(smaller changes, mainly modifications), either the next feature
or hard bug will start that cycle again.
I don't think that's how software development (one use-case
> of svn) works. Do you have real world data to corroborate your
> assumption? Or perhaps a use case that would trigger such behaviour?
>
Well, scroll down to the bottom of the wiki page. My hypothesis
is definitely a good fit for fs_fs.c: 644 revs to get to 379k with
an average size of 220k. Also, the main claim of O(N log N)
space and time can be witnessed in the apache repo.
My point is not to give an exact formula here as this would
require tons of pointless formalisms. The key is the make
the general O(N log N) behavior plausible to the reader - even
if long term effects would suggest O(N^.8) instead of O(N)
in my initial assumptions. And that N may be revisions just
as well as file size.
The thing is that my prototypic \Delta^\ast code has almost
ideal time and space properties: O(N) and very close to the
theoretical minimum.
-- Stefan^2.
--
*
Join us this October at Subversion Live
2012<http://www.wandisco.com/svn-live-2012>
for two days of best practice SVN training, networking, live demos,
committer meet and greet, and more! Space is limited, so get signed up
today<http://www.wandisco.com/svn-live-2012>
!
*
Received on 2012-09-19 12:21:22 CEST