[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Trival merge of big text file: Dismal performance, 540x faster if binary.

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Fri, 21 Jan 2011 10:19:07 +0100

On Mon, Jan 17, 2011 at 5:30 PM, krueger, Andreas (Andreas Krüger,
DV-RATIO) <andreas.krueger_at_hp.com> wrote:
> Hello, Daniel and all,
>
>> In other words, merging changes from file.c_at_BRANCH to trunk should
>> detect that file_at_trunk and file_at_BRANCH@BRANCH-CREATION are the same
>> node-revision?
>
> I think that my intended optimization should work in this case.
> But I don't think that's the condition I mean.
>
> It does not feel general enough.
>
> But then, I also don't think this has to be discussed in the context
> of this optimization proposal.  After all, there is some condition
> already implemented.  SVN already knows how to check
> whether a merge is possible or not in the binary case.
>
> That condition IS what I want.
>
> If a binary merge would be possible, be fast and do the binary merge
> and don't bother with text diffs.
>
>
>> but giving the question more
>> visibility (as opposed to burying it in the middle of a paragraph on
>> users@) might help you get an answer. :-)
>
> Thanks for the hint!
>
> I'd be more than willing to convert this to an issue at
> http://subversion.tigris.org/issues .
>
> Writing a self-contained script that demonstrates the performance
> problem (starting with the creation of a scratch SVN repository) -
> would that be a good next step?

Hi Andreas,

Yes, I think you should probably file an issue for this in the issue
tracker, referring to this thread. If you could write a self-contained
script to demonstrate, that would certainly be a good thing.

Just to confirm your hypothesis about the special shortcut in "svn
merge" for binary files, here is the relevant excerpt from
subversion/libsvn_client/merge.c (starting at line 1454):

[[[
      /* Special case: if a binary file's working file is
         exactly identical to the 'left' side of the merge, then don't
         allow svn_wc_merge to produce a conflict. Instead, just
         overwrite the working file with the 'right' side of the
         merge. Why'd we check for local mods above? Because we want
         to do a different notification depending on whether or not
         the file was locally modified.

         Alternately, if the 'left' side of the merge doesn't exist in
         the repository, and the 'right' side of the merge is
         identical to the WC, pretend we did the merge (a no-op). */
      if ((mimetype1 && svn_mime_type_is_binary(mimetype1))
          || (mimetype2 && svn_mime_type_is_binary(mimetype2)))
        {
          /* For adds, the 'left' side of the merge doesn't exist. */
          svn_boolean_t older_revision_exists =
              !merge_b->add_necessitated_merge;
          svn_boolean_t same_contents;
          SVN_ERR(svn_io_files_contents_same_p(&same_contents,
                                               (older_revision_exists ?
                                                older_abspath : yours_abspath),
                                               mine_abspath, subpool));
          if (same_contents)
            {
              if (older_revision_exists && !merge_b->dry_run)
                {
                  SVN_ERR(svn_io_file_move(yours_abspath, mine_abspath,
                                           subpool));
                }
              merge_outcome = svn_wc_merge_merged;
              merge_required = FALSE;
            }
        }
]]]

See also:
http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_client/merge.c?view=markup

That said, I'm not so sure that we could blindly take this same
shortcut for text files. It sounds like a trivial decision, but there
might be some hidden problems if we do this. We just need to be
careful, and think this through ...

Cheers,

-- 
Johan
Received on 2011-01-21 10:20:05 CET

This is an archived mail posted to the Subversion Users mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.