[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] svn_fs_text_changed

From: C. Michael Pilato <cmpilato_at_collab.net>
Date: 2003-11-25 15:50:09 CET

Chia-liang Kao <clkao@clkao.org> writes:

> Avoid dir_delta from calling apply_textdelta that is actually noop with
> ignore_ancestry is on. This makes many things work better in the case
> of two mutually merged branches, since editors now know the source and
> target stream are the same without having to construct the fulltext
> from the noop txdelta. svn diff the two branches will no longer show a
> lot of empty diffs.

Ah... I remember this IRC chat... :-)

> Eventually there could be a function to eliminate the identical data in
> the 'strings' for the mutually merged branches by using the same key to
> free up some pages.

This part doesn't belong in the log message. Save your hypothesizing
for the dev-list.

> * include/svn_fs.h
> (svn_fs_text_changed): New.

Hm. Nasty. Why did you add a new public FS interface which does
nothing but call other public FS interfaces and is used in exactly one
place in the code? Bad plan, homey. Your svn_fs_text_changed()
implementation should just be a helper function in delta.c.

But let's go back to the original problem you're trying to solve.
(And forgive me for forgetting what we decided on IRC.)

For files whose contents are the same, but where
svn_fs_contents_changed() doesn't catch that fact, we are
unnecessarily calculating an empty svndiff and transmitting that
across the wire. Editor implementors then go through the work of
reconstructing a fulltext that is the same as the text they already
have, and this is badness.

So, allow me to think aloud. And please correct me if I'm wrong. The
notes below assume that svn_fs_contents_changed() returns TRUE (which,
as we know, just means that the underlying DAGs point to different
rep-keys).

Today's code:

  Contents Are The Same:

           The code runs over the bytes of the two sets of contents,
           and calculates an empty diff. It then transmits that diff,
           which is applied on the client side.

  Contents Differ:

           The code runs over the bytes of the two sets of contents,
           and calculates a diff. It then transmits that diff, which
           is applied on the client side.

With your changes:

  Contents Are The Same:

           The code runs over the bytes of the two sets of contents,
           and determines that they are the same. No diff is sent,
           no diff is applied.

  Contents Differ:

           Best case is if the file sizes differ. In this case the
           code runs over the bytes of the two sets of contents, and
           calculates a diff. If the file sizes do not differ, the
           code runs over the bytes of the two sets of contents until
           a differing byte is found, and then runs back over the
           bytes again to calculate a diff. It then transmits that
           diff, which is applied on the client side.

I'm going to rank (based only on my gut instinct) and evaluate the
rareness of the three code forks from most rare to least:

  1. Contents differ, but have the same filesize.

      This change adds additional examination of the bytes of the
      files, which includes additional undeltification in order to
      read those bytes.

  2. Contents are the same.

      This change removes vdelta computation alone (doesn't save you a
      trip over the bytes, including undeltification to read the
      bytes). This change removes transmission of an insignificant
      amount of data (an empty delta) across the network. (You can't
      count the removal of the no-op fulltext reconstruction on the
      client side because an empty diff is trivially detectable -- our
      code could fix this right now with no server-side changes[1].)

  3. Contents have different file sizes.
      
      No real change from current behavior (aside from a pair of
      database hits per-file to get those sizes).

Well, it looks alright on paper, I guess. Borderline unnecessary code
churn, and I'd like to see some figures on how performance is
affected. I still say the new FS interface is bogus -- either make
svn_fs_contents_changed() tell the whole truth all the time (which
will effect *all* uses of dir_delta) or put that code into a delta.c
helper function.

-- C-Mike

[1] And I'm going to take a stab at doing so after sending this mail.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Tue Nov 25 15:52:27 2003

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.