Re: How use patch_target_t for both files and (possibly multiple) properties

From: Daniel Näslund <daniel_at_longitudo.com>
Date: Mon, 21 Jun 2010 13:16:38 +0200

On Mon, Jun 21, 2010 at 11:12:43AM +0200, Stefan Sperling wrote:
> On Sun, Jun 20, 2010 at 09:53:29PM +0200, Daniel Näslund wrote:
> > Hi stsp!
> >
> > Attaching a patch showing a suggestion on how to split up
> > patch_target_t. Maybe you have some insights on this, before I rush out
> > and start grooking around and making changes to your design.
> >
> > What 'svn patch' does
> > ----------------------
> > 1) Match hunks
> > 2) Apply hunks to a stream backed up by a tmp file
> > 3) Install. copy the tmp file to the target location
> > 4) Notify the user on what has happened, e.g. fuzz, offset and rejects
> >
> > The problem
> > -------------
> > Each property is essentially like it's own text file. 'svn patch' must
> > be generic enough to be able to perform the above actions on both
> > properties and texts.
> >
> > Alternatives
> > --------------
> > * Simplify. Don't use the unidiff format but just copy the content
> > straight off and apply it.
>
> I don't think I understand the above. What do you mean?

I was just thinking: We usually have properties that are limited in
size. We could just put the whole contents of the property after a header
in the diff file, e.g. not use the unidiff format. Then we could ignore
step 1) and 2) and just install the property and do the notification
according to what the diff header said, e.g added, deleted or modified.
I'm not proposing that we do it like that, but it's just that using
diffs for something that is typically one line or one word is a lot of
overhead. *But*, the properties can be longer; Tortoisesvn has a
tsvn:logtemplate property for instance.

> > * Duplicate. Create special code for handling props, e.g. have both
> > match_hunk() and match_property_hunk() and so on.

What you're proposing below is to duplicate the matching and applying of
properties. My thought was that we've put in a lot of thoughts to the
matching and applying and have encountered quite a few edge cases
involving newlines, fuzz and offsets. I was just hoping that we could
reuse our existing code.

> > Proposed solution
> > -------------------
> > Use patch_target_t as a container for sub-patch_targets that can be either
> > file-text or a property. Store tree change info needed for step 3)
> > installing and step 4) notifying.
>
> Your description is quite low-level (detailing new fields in existing
> structs etc.) which makes it a bit hard to follow.

I agree on that when rereading my mail, what did I really mean? :)

> Let's take a step back. What is the general idea of your approach?

To be able to use properties and text diffs in the same way when
matching and applying hunks.

> I'll try making some high-levelI suggestions about how to approach the
> problem. Please comment on them (I am most likely not seeing the whole
> picture) and we'll work something out.
>
> parse_next_hunk() currently puts both normal and property hunks
> in the same list (patch->hunks). Have you considered splitting this into
> two separate lists, e.g. patch->hunks and patch->property_hunks?

Since r955844, we put property hunks in patch->property_hunks and text
hunks in patch->hunks. :)

> property_hunks can be a hashtable with prop_name as keys and
> arrays of hunk_t as value: {prop_name, [hunk1, hunk2, hunk3, ...]}.

Since r955844, that is the approach.

> Or it can be a list of structs like { prop_name; array of hunks; }
> if that is easier to work with in the caller.
>
> This should provide much cleaner separation, because now you can
> leave the original hunk handling code alone, and add extra steps
> that handle the property cases.
>
> Regarding parse_next_hunk(), consider splitting this up into two
> functions: parse_hunk() and parse_property_hunk().
> The first one attempts to find a normal "@@" hunk at the current file
> offset, and returns a NULL *hunk on parsing failure. If it fails
> (the hunk returned is NULL), rewind the file to the seek position
> where parsing a normal hunk failed, and parse_property_hunk() to
> attempt parsing a "##" property hunk.
> These functions could probably share some code, but you should keep
> them independent for now so we can see how much they really differ.

I considered using that approach but thought that seeking around would
be messy. I didn't have to add too many if statements [1]. I reused the
parse code by allowing atat to be either '@@' or '##'. Maybe I was a bit
too enthusiastic and committed my changes too quickly. I'm able to use
parse_next_hunk() for both properties and hunks but maybe it just
obfuscates the code.

> In the patch code, you can then add property patching as a separate step.
> Currently, we do something like the following in apply_one_patch():
>
> for hunk in hunks:
> match(hunk)
> # gather some additional meta data on hunk such as actual line
> # offset we'll be using
> hunk_info = get_info(hunk)
> target.hunks.append(hunk_info)
>
> for hunk_info in target.hunks:
> if hunk.rejected:
> reject(hunk)
> else
> apply(hunk)
>
> Rejection and application happens on temporary files, we haven't
> modified the working copy yet.
>
> It's clear that you'll need similar steps for properties. Something like:
>
> for (prop, hunks) in property_hunks:
> target.properties.append(prop)
> for hunk in hunks:
> match(hunk)
> # gather some additional meta data on hunk such as actual line
> # offset we'll be using
> hunk_info = get_info(hunk)
> target.properties.prop.hunks.append(hunk_info)
>
> for prop in target.properties:
> for hunk in prop.hunks:
> if hunk.rejected:
> reject(hunk)
> else
> apply(hunk)

Yeah exactly. What I wanted was for the matching part
(get_hunk_info(), scan_for_match(), match_hunk(), read_line()) to be
able to be used by both text hunks and property hunks and the same for
apply_hunk() and reject_hunk().

> This would also operate entirely on temporary files.
> Once we're here, we can worry about applying the changes to the WC.

Yes, and by 'applying changes to the WC' I assume that you mean what
happens in install_patched_target().

> Again, I think it would help if you kept all property steps separate
> from the existing code in the beginning. Just add new steps to
> apply_one_patch() as needed, but don't modify existing code.
> We can worry about optimizing the integration of the new code with the
> existing code later. It's more important to focus on the problem itself
> rather than worrying how to best integrate it with the existing code.

That is a good advice. At the moment I can duplicate as much code as I
want. When the whole thing works and we have good test coverage, we can
look at how to remove duplicate code. Or are you just trying to make it
easier to later remove my code, just like what happended to the old
svnpatch code last year? :P

> Just keep in mind to release all resources allocated for a target
> before processing the next target. This is really important because
> otherwise we use too much memory and/or file descriptors with large
> patches.

Thanks for all your feedback,
Daniel
[1]http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_diff/parse-diff.c?r1=947432&r2=955844&pathrev=955844&diff_format=h
Received on 2010-06-21 13:17:30 CEST

This message: [ Message body ]
Next message: Hyrum Wright: "Subversion 1.6.12 Released"
Previous message: Bert Huijben: "RE: 1.6.12 tarballs up for signing / testing"
In reply to: Stefan Sperling: "Re: How use patch_target_t for both files and (possibly multiple) properties"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]