
Re: Merge mode (Was Classifying files as binary or text)

From: Mike Samuel <mikesamuel_at_gmail.com>
Date: Tue, 17 Nov 2009 20:23:38 -0800

What are the advantages of taking into account file path and mime-type
over having autoprops look at the file path and squirrel information away
in the mime-type?

If file name is taken into account, how does diffing with ancestry
work when a file has been added as a result of an svn mv or svn cp?

2009/11/17 Julian Foad <julianfoad_at_btopenworld.com>:
> I need to spend some time replying, late at night though it is.
>
> Let me try to explain why I think a "how to merge" property should not
> be the primary indicator of how subversion should merge each file.
>
>
> Principle
> =========
>
> I have read that, in the realm of data handling, there is a principle
> that it is a bad idea to tag data with annotations that say what kind of
> actions can or should be performed on it. That kind of coupling is
> unscalable. Instead, it is better to tag data with an indication of what
> meaning and/or what syntax the data has, and then let tools decide what
> to do, based on that information.
>
> We already have one data-type indicator: svn:mime-type. Now, MIME type
> is far from a complete data type specifier. It is insufficient for our
> needs, in theory. However, in practice, it is nearly sufficient. (See
> Problem 2 below for an exception.)
>
> We also have another data-type indicator: the file name. A file name is
> also an incomplete source of metadata, and some file names ("README" or
> "CHANGES") give no indication at all of the format, but it is useful in
> many cases ("*.py", "*.c").
>
>
> Problem 1 (limited recognition of MIME types)
> =========
>
> Subversion mis-categorizes a lot of MIME types as "binary" (and
> therefore will not merge or diff or blame them) which really are
> line-based text formats.
>
> The list of such MIME types is continually evolving so it is not
> possible for Subversion to have a built-in complete list. However, it is
> easy for new releases of Subversion to have an updated list.
>
> It is not much harder for Subversion to have a configurable list of
> which MIME types (or MIME type patterns) should be considered mergeable.
> (The configuration could be extensible: it could say line-wise-mergeable
> or not mergeable or XML-mergeable or ...)
>
>
> Problem 2 (mergeable and non-mergeable XML files)
> =========
>
> A user has some XML files on which a line-based merge is useful, and
> some XML files on which that is not useful, and wishes to label both
> kinds with svn:mime-type=text/xml. Let us suppose one format has each
> XML tag on a separate line, and the other has them all run together and
> line breaks inserted at arbitrary places in the sequence. It may be
> possible to find a different MIME type for one of the file types, but
> that may well not be possible, e.g. if they are both proprietary or
> arbitrary XML formats.
>
> This problem is not limited to XML files. Consider two "plain text"
> files, MIME type text/plain, with different kinds of text in them. One
> has line-based content, such as a shopping list, and changes usually
> leave many lines unchanged. The other contains the text of a newspaper
> article with line breaks at roughly every 70 characters in a stream of
> words, so two similar versions of it may have very few whole lines in
> common.
>
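To make the contrast above concrete, here is a minimal Python sketch (not Subversion code) that measures how many whole lines survive a small edit in each kind of text/plain file. The sample texts, the 70-column wrap via textwrap.fill, and the helper name unchanged_line_fraction are all assumptions chosen for illustration.

```python
import difflib
import textwrap


def unchanged_line_fraction(old: str, new: str) -> float:
    """Return the fraction of lines in `old` that a line-based diff leaves unchanged."""
    old_lines = old.splitlines()
    new_lines = new.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(old_lines)


# Hypothetical sample data: a line-oriented list with one item inserted.
shopping_old = "milk\neggs\nbread\napples\ncoffee\n"
shopping_new = "milk\neggs\nbutter\nbread\napples\ncoffee\n"

# Hypothetical sample data: flowing prose, re-wrapped after one word is added.
article = (
    "The city council voted on Tuesday to extend the riverside cycle path "
    "by two kilometres, linking the old harbour to the railway station. "
    "Work is expected to begin in spring and finish before the end of the "
    "year. Local shop owners welcomed the decision, saying that better "
    "access would bring more visitors to the area."
)
article_old = textwrap.fill(article, width=70)
article_new = textwrap.fill("Update: " + article, width=70)

list_fraction = unchanged_line_fraction(shopping_old, shopping_new)
article_fraction = unchanged_line_fraction(article_old, article_new)
print(f"shopping list: {list_fraction:.0%} of lines unchanged")
print(f"article:       {article_fraction:.0%} of lines unchanged")
```

The inserted item leaves every original line of the list intact, whereas one word added to the front of the article changes its first line and can shift every line break after it, which leaves a line-based merge with far less to anchor on.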
> I believe this is a real but relatively uncommon requirement. It is a
> genuine example of the MIME type being insufficient to determine (line-wise)
> mergeability. There are many file formats that can be regarded as being
> (line-wise) mergeable or non-mergeable depending on some aspect of their
> content that cannot be reflected in the MIME type. It is uncommon in the
> sense that most Subversion users' needs can be satisfied by
> distinguishing mergeability based on the MIME type, or better the MIME
> type and file name taken together, of their files.
>
> To solve this problem when it exists, there does indeed need to be
> further metadata about the content type of the file. (Alternatively it
> could be metadata that says how to merge the file, but see "Principle"
> above.)
>
>
> Solution 0 (merge-mode)
> ==========
>
> So we could add a property to each file which says whether the file is
> to be considered line-wise mergeable by Subversion, and say that this
> property will be the primary source of this information. What are the
> pros and cons of this?

This seems to be similar to the proposal at the beginning of this thread.

> Pro: The user can force a line-wise merge on one file and no merge
> attempt on another file even when MIME type and file name are
> insufficient distinguishers.

> Pro: The user can forget about providing MIME type at all, and just set
> this property to one of the pre-defined two types of merging (line-based
> or none), if that is all the user cares about.

> Con: This property associates the file with one simple kind of merging;
> but the best merge tool available on the client may not be that simple
> kind. If we want to use a better merge tool, say an XML-aware merge
> tool, this property actually gets in the way: it tells us to use a
> simple line-based merge on this file. It would have been better if the
> property had said, "this file contains line-based XML, so you might want
> to use an XML-aware merge rather than a simple line-based merge if you
> can". In other words, we really want to tell the client what the content
> type is, and let the client choose the best merge tool for that content
> type.

The goal section of the proposal at the beginning of the thread
includes compatibility with future merge extensions.
I don't understand how it conflicts with that. Per Branko's
suggestion, we are using two values initially, but can extend the
value set to specify different merging schemes.

> Con: This property conveys redundant information. In almost all cases,
> the MIME type and/or file name are sufficient information. It is wrong
> to pretend that MIME type and file name are not good sources of
> how-to-merge information, and to leave their currently weak and
> deficient interpretation as just a deprecated backward-compatibility
> fallback.

"In almost all cases, ... are sufficient" sounds a lot like
"insufficient" to me.

> Con: Not extensible to diff, blame, etc. An indication that the file is
> line-wise-mergeable is not really a good indication of whether the file
> can be line-wise diffed or blamed.

Please provide an example of where line-wise mergeable -/-> line-wise
diffable & line-wise blameable.

>
> Proposal
> ========
>
> This is the full, long-term proposal. We can choose a subset of this to
> do initially.
>
> (1) Make svn merge/diff/blame take into account the file name as well as
> the svn:mime-type in deciding whether to operate in a "line-wise" mode
> or not operate at all.
>
> (2) Update the built-in MIME type and filename patterns.
>
>  * Update the built-in selection based on svn:mime-type to recognise a
> list of MIME types that is reasonably up-to-date right now (even though
> it will be out of date by the time the released software is in use).

That sounds like a nice idea, but can be proposed and debated
separately from whether a new property, or filename information should
be taken into account. Can you start a separate thread on that?

>  * Update the built-in selection based on file names to recognize a
> reasonable list of file name patterns.
>
> (3) Provide a client-side config for extending and overriding the rules
> that map MIME type and file name to a merge/diff/blame mode. This mode
> should be specifiable in the config, not just "line-wise" or "none" but
> any other named mode. Provide config options for specifying the merge
> tool, diff tool and blame tool per mode. Tools should be specifiable as
> none, built-in or external.
>
> (4) Add an optional property for selecting a particular merge mode (and
> diff mode and blame mode) for the cases where (1) and (2) are
> insufficient or inconvenient.
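Points (1) through (4) of the quoted proposal can be sketched in Python as an ordered rule table. This is only an illustration under stated assumptions: the Rule type, the sample patterns, the select_mode function, and the precedence order (per-file property first, then user rules, then built-in rules) are hypothetical and do not describe any existing Subversion configuration format.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple, Optional, Sequence


class Rule(NamedTuple):
    mime_pattern: str  # glob matched against svn:mime-type ("*" matches any)
    name_pattern: str  # glob matched against the file name or path
    mode: str          # named mode such as "line-wise", "none", "xml"


# Hypothetical built-in rules, standing in for point (2).
BUILTIN_RULES = [
    Rule("text/*", "*", "line-wise"),
    Rule("application/json", "*", "line-wise"),
    Rule("*", "*.py", "line-wise"),
    Rule("*", "*.c", "line-wise"),
]


def select_mode(
    mime_type: Optional[str],
    file_name: str,
    property_mode: Optional[str] = None,
    user_rules: Sequence[Rule] = (),
) -> str:
    """Pick a merge/diff/blame mode for one file."""
    # Assumption: the optional per-file property from point (4) wins outright.
    if property_mode is not None:
        return property_mode
    # Assumption: client-side rules from point (3) are tried before built-ins.
    for rule in list(user_rules) + BUILTIN_RULES:
        if fnmatchcase(mime_type or "", rule.mime_pattern) and fnmatchcase(
            file_name, rule.name_pattern
        ):
            return rule.mode
    # Keep today's default: a file with no svn:mime-type is treated as text.
    return "line-wise" if mime_type is None else "none"
```

For example, a user rule such as Rule("text/xml", "*.layout.xml", "none") would switch off line-wise merging for one family of XML files while leaving other text/xml files mergeable.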

If this proposal requires an extra property above and beyond filename,
can autoprops fill the gaps?

> Regards,
> - Julian
>
>
> Mike Samuel wrote:
>> Proposal:
>> ========
>> (1) Add documentation on the svn:merge-mode property that lists the
>> allowed values as ("simple" and "none")
>> (2) Add example autoprops rules to the documentation that set
>> svn:merge-mode to "simple" for the following file types
>>     application/ecmascript
>>     application/json
>>     application/xml
>>     image/svg+xml
>> (3) Change the text quoted from the SVN manual under Background to
>> read as below.
>> (4) Update the implementation to agree.
>>
>>     Subversion treats the following files as [[mergeable]]:
>>
>>         * Files with no svn:mime-type [[and no svn:merge-mode]]
>>         * Files with a svn:mime-type starting "text/"
>>         * Files with a svn:mime-type equal to "image/x-xbitmap"
>>         * Files with a svn:mime-type equal to "image/x-xpixmap"
>>         * [[Files with a svn:merge-mode that is equal to "simple"]]
>>
>>     All other files are treated as [[unmergeable]], meaning that
>>     Subversion will:
>>
>>         * Not attempt to automatically merge received changes with
>>           local changes during svn update or svn merge
>>         * Not show the differences as part of svn diff
>>         * Not show line-by-line attribution for svn blame
>>
>>     In all other respects, Subversion treats [[mergeable]] files the
>>     same as [[unmergeable]] files, e.g. if you set the svn:keywords or
>>     svn:eol-style properties, Subversion will perform keyword
>>     substitution or newline conversion on [[unmergeable]] files.
>>
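The proposed rule quoted above can be sketched as a small Python predicate. This is one reading, not an implementation from the thread: the function name is_mergeable is hypothetical, and the sketch assumes that an explicit svn:merge-mode takes precedence over svn:mime-type, which the quoted list does not spell out.

```python
from typing import Optional

# MIME types outside text/* that are listed as mergeable in the quoted rule.
TEXT_LIKE_IMAGE_TYPES = ("image/x-xbitmap", "image/x-xpixmap")


def is_mergeable(mime_type: Optional[str], merge_mode: Optional[str]) -> bool:
    """Apply the proposed rule to one file's svn:mime-type and svn:merge-mode."""
    # Assumption: when svn:merge-mode is set, it decides the outcome.
    # Only "simple" is mergeable; "none" or any other value is not.
    if merge_mode is not None:
        return merge_mode == "simple"
    # No svn:merge-mode: fall back to the existing svn:mime-type rule.
    if mime_type is None:
        return True
    return mime_type.startswith("text/") or mime_type in TEXT_LIKE_IMAGE_TYPES
```

Under this reading, application/json with svn:merge-mode set to "simple" becomes mergeable, and a text/xml file can be opted out with "none"; if the property were meant only to add files to the mergeable set, the first branch would need to change.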
>>
>> Goal:
>> ====
>>   To update the scheme used by svn {update,diff,merge,blame} so that it
>>   allows merging of files with an svn:mime-type outside the hard-coded
>>   list currently used.
>>
>>   This determination should be independent of the platform svn
>>   is running on, so independent of the set of supported character sets.
>>
>>   This scheme should not complicate future extensions to the merge
>>   system which might wish to use a different merge policy, e.g. for XML
>>   than for source code files.
>>
>>   This scheme should work with autoprops, and other mechanisms repository
>>   administrators use to manage files.  Specifically, some kinds of XML can
>>   be meaningfully merged, and others cannot.
>>
>>   This scheme should work within existing limitations, such as the inability
>>   to merge UTF-16 and UTF-32.
>>
>>
>> Background:
>> ==========
>> The current behavior is described at
>> http://subversion.tigris.org/faq.html#binary-files
>>
>>     Subversion treats the following files as text:
>>
>>         * Files with no svn:mime-type
>>         * Files with a svn:mime-type starting "text/"
>>         * Files with a svn:mime-type equal to "image/x-xbitmap"
>>         * Files with a svn:mime-type equal to "image/x-xpixmap"
>>
>>     All other files are treated as binary, meaning that Subversion will:
>>
>>         * Not attempt to automatically merge received changes with
>>           local changes during svn update or svn merge
>>         * Not show the differences as part of svn diff
>>         * Not show line-by-line attribution for svn blame
>>
>>     In all other respects, Subversion treats binary files the same as
>>     text files, e.g. if you set the svn:keywords or svn:eol-style
>>     properties, Subversion will perform keyword substitution or
>>     newline conversion on binary files.
>>
>> Common source code mime-types are misclassified, and that problem is
>> likely to grow because of current IANA policy.
>> Mime-types are handed out by the IANA, which only assigns text/*
>> mime-types for file-types that are meant to be human readable.  Source
>> code is explicitly not considered human readable.  This is why many
>> source code and data mime-types are in the application/* group or
>> other non text/* groups: application/json, application/ecmascript,
>> application/xml, image/svg+xml.
>> RFC 4288 ( ftp://ftp.rfc-editor.org/in-notes/rfc4288.txt ) says this:
>>    Expected uses for the "application" media type
>>    include but are not limited to file transfer, spreadsheets,
>>    presentations, scheduling data, and languages for "active"
>>    (computational) material.
>>
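The misclassification described in the Background can be reproduced with a short Python sketch of the FAQ rule quoted there. The function name is_text_per_faq is hypothetical, and the code models the documented behaviour rather than Subversion's actual source.

```python
from typing import Optional

# The two non-text/* MIME types that the quoted FAQ text treats as text.
TEXT_EXCEPTIONS = ("image/x-xbitmap", "image/x-xpixmap")


def is_text_per_faq(mime_type: Optional[str]) -> bool:
    """Classify a file as text (True) or binary (False) per the quoted FAQ rule."""
    if mime_type is None:
        return True
    return mime_type.startswith("text/") or mime_type in TEXT_EXCEPTIONS


# The source-code and data MIME types named in the Background section.
SOURCE_LIKE_TYPES = [
    "application/json",
    "application/ecmascript",
    "application/xml",
    "image/svg+xml",
]

for mime_type in SOURCE_LIKE_TYPES:
    kind = "text" if is_text_per_faq(mime_type) else "binary"
    print(f"{mime_type}: {kind}")
```

All four types come out as binary under this rule, so files carrying them get no automatic merge, diff, or blame even though their content is line-oriented text.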
>> ------------------------------------------------------
>> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2419155
>
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2419302
Received on 2009-11-18 05:23:59 CET

This is an archived mail posted to the Subversion Dev mailing list.