Re: svn blame not working for files which had binary mime-type in a previous revision

From: Philip Martin <philip.martin_at_wandisco.com>
Date: Tue, 12 Feb 2013 18:29:35 +0000

I'm still not clear what would go wrong.

Each binary file gets treated as one or more long lines. If there are
multiple versions of the binary we will calculate diffs between these
lines. The lines and the diffs are more or less meaningless. At some
point the file becomes text and replaces all the binary lines. At that
point the blame algorithm will discard the meaningless diffs and we get
the correct output.
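
To make that concrete, here is a minimal sketch in Python of an
old-to-new blame. It is an illustration only, not Subversion's
implementation: the blame() function, the sample history, and the use of
difflib in place of Subversion's own diff code are all assumptions made
for this example. Every revision, binary or not, is simply split into
lines and diffed against the previous one.

```python
from difflib import SequenceMatcher


def blame(revisions):
    """Annotate the newest revision, walking the history old to new.

    `revisions` is a list of (revnum, content_bytes), oldest first.
    Returns a list of (revnum, line) for the last revision.
    Hypothetical helper for illustration; not a Subversion API.
    """
    annotated = []  # (revnum, line) pairs for the revision seen so far
    for rev, content in revisions:
        # Binary content is split the same way as text: it just yields
        # one or more long, meaningless "lines".
        new_lines = content.splitlines(keepends=True)
        old_lines = [line for _, line in annotated]
        matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
        result = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep their earlier attribution.
                result.extend(annotated[i1:i2])
            elif tag in ("replace", "insert"):
                # New or changed lines are attributed to this revision.
                result.extend((rev, line) for line in new_lines[j1:j2])
            # "delete": the old lines and their attribution are dropped.
        annotated = result
    return annotated


# Made-up history: r1 and r2 are binary, r3 replaces the file with a
# shell script, r4 appends one line to it.
history = [
    (1, b"\x7fELF\x01\x02\n\x00\x03\xff"),
    (2, b"\x7fELF\x01\x02\n\x00\x04\xfe"),
    (3, b"#!/bin/sh\necho hello\n"),
    (4, b"#!/bin/sh\necho hello\necho world\n"),
]

for rev, line in blame(history):
    print(rev, line)
```

In this sketch the attributions computed for r1 and r2 are thrown away
as soon as r3 replaces every line, so the annotated output only mentions
r3 and r4.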

What would go wrong? Is the diff algorithm going to fall over on very
long lines? I'm not aware of that being a limitation. I suppose the
binary files might trigger the problem case where the minimal diff takes
a really long time to run. We get that problem on text files, typically
machine generated, but I suppose some binary files might also trigger
it. Is that going to be a significant problem?

Bert Huijben <bert_at_qqmail.nl> writes:

> This might not work as well as expected, as our blame always works old to
> new... so we will try to blame the elf binary *before* the shell script.
>
> (and this isn’t easy to fix because the ra layer currently can only deliver
> the file versions in that direction)
>
> A patch to libsvn_repos would be very welcome to allow switching to a more
> efficient walk in a future subversion version.
>
> Bert
>
> *From:* Philip Martin <philip.martin_at_wandisco.com>
> *Sent:* February 12, 2013 2:32 PM
> *To:* Ferenc Kovacs <tyra3l_at_gmail.com>
> *CC:* dev_at_subversion.apache.org
> *Subject:* Re: svn blame not working for files which had binary mime-type
> in a previous revision
>
> Philip Martin <philip.martin_at_wandisco.com> writes:
>
>> Stefan Sperling <stsp_at_elego.de> writes:
>>
>>> OK, I agree that it might not be obvious to someone who doesn't know
>>> how blame actually works internally. It works by incrementally diffing
>>> all revisions that changed the file to figure out which revision
>>> contributed which line. Since a binary file doesn't have a notion
>>> of what a 'line' really is, this approach doesn't work for binary
>>> files. Neither does it work if one or more revisions contain binary
>>> content.
>>
>> What exactly goes wrong? The current revision is text, not binary, and
>> the final output is the current file. No part of the binary file gets
>> into the final output.
>>
>> Suppose I have a file that really was binary in the past, perhaps a
>> shell script that used to be an ELF binary. When blame reaches the
>> binary revision the binary data is likely to get treated as one or more
>> lines of text, none of which match the current text. At that point the
>> blame algorithm is complete. Isn't that the right answer?
>>
>> I see I asked this question in the original thread but I don't see any
>> answer.
>
> Since there appears to be no reason to check the mime-type of anything
> other than the final output I made blame behave that way in r1445164.
>
> --
> Certified & Supported Apache Subversion Downloads:
> http://www.wandisco.com/subversion/download

Received on 2013-02-12 19:30:16 CET

This is an archived mail posted to the Subversion Dev mailing list.
