Re: How use patch_target_t for both files and (possibly multiple) properties

From: Stefan Sperling <stsp_at_elego.de>
Date: Mon, 21 Jun 2010 11:12:43 +0200

On Sun, Jun 20, 2010 at 09:53:29PM +0200, Daniel Näslund wrote:
> Hi stsp!
>
> Attaching a patch showing a suggestion on how to split up
> patch_target_t. Maybe you have some insights on this, before I rush out
> and start grooking around and making changes to your design.
>
> What 'svn patch' does
> ----------------------
> 1) Match hunks
> 2) Apply hunks to a stream backed up by a tmp file
> 3) Install. copy the tmp file to the target location
> 4) Notify the user on what has happened, e.g. fuzz, offset and rejects
>
> The problem
> -------------
> Each property is essentially like it's own text file. 'svn patch' must
> be generic enough to be able to perform the above actions on both
> properties and texts.
>
> Alternatives
> --------------
> * Simplify. Don't use the unidiff format but just copy the content
> straight off and apply it.

I don't think I understand the above. What do you mean?

> * Duplicate. Create special code for handling props, e.g. have both
> match_hunk() and match_property_hunk() and so on.
>
> Proposed solution
> -------------------
> Use patch_target_t as a container for sub-patch_targets that can be either
> file-text or a property. Store tree change info needed for step 3)
> installing and step 4) notifying.

Your description is quite low-level (detailing new fields in existing
structs etc.) which makes it a bit hard to follow. Let's take a step back.
What is the general idea of your approach?

I'll try making some high-levelI suggestions about how to approach the
problem. Please comment on them (I am most likely not seeing the whole
picture) and we'll work something out.

parse_next_hunk() currently puts both normal and property hunks
in the same list (patch->hunks). Have you considered splitting this into
two separate lists, e.g. patch->hunks and patch->property_hunks?

property_hunks can be a hashtable with prop_name as keys and
arrays of hunk_t as value: {prop_name, [hunk1, hunk2, hunk3, ...]}.
Or it can be a list of structs like { prop_name; array of hunks; }
if that is easier to work with in the caller.

This should provide much cleaner separation, because now you can
leave the original hunk handling code alone, and add extra steps
that handle the property cases.

Regarding parse_next_hunk(), consider splitting this up into two
functions: parse_hunk() and parse_property_hunk().
The first one attempts to find a normal "@@" hunk at the current file
offset, and returns a NULL *hunk on parsing failure. If it fails
(the hunk returned is NULL), rewind the file to the seek position
where parsing a normal hunk failed, and parse_property_hunk() to
attempt parsing a "##" property hunk.
These functions could probably share some code, but you should keep
them independent for now so we can see how much they really differ.

In the patch code, you can then add property patching as a separate step.
Currently, we do something like the following in apply_one_patch():

for hunk in hunks:
  match(hunk)
  # gather some additional meta data on hunk such as actual line
  # offset we'll be using
  hunk_info = get_info(hunk)
  target.hunks.append(hunk_info)

for hunk_info in target.hunks:
  if hunk.rejected:
    reject(hunk)
  else
    apply(hunk)

Rejection and application happens on temporary files, we haven't
modified the working copy yet.

It's clear that you'll need similar steps for properties. Something like:

for (prop, hunks) in property_hunks:
  target.properties.append(prop)
  for hunk in hunks:
    match(hunk)
    # gather some additional meta data on hunk such as actual line
    # offset we'll be using
    hunk_info = get_info(hunk)
    target.properties.prop.hunks.append(hunk_info)

for prop in target.properties:
  for hunk in prop.hunks:
    if hunk.rejected:
      reject(hunk)
    else
      apply(hunk)

This would also operate entirely on temporary files.
Once we're here, we can worry about applying the changes to the WC.

Again, I think it would help if you kept all property steps separate
from the existing code in the beginning. Just add new steps to
apply_one_patch() as needed, but don't modify existing code.
We can worry about optimizing the integration of the new code with the
existing code later. It's more important to focus on the problem itself
rather than worrying how to best integrate it with the existing code.

Just keep in mind to release all resources allocated for a target
before processing the next target. This is really important because
otherwise we use too much memory and/or file descriptors with large
patches.

Stefan
Received on 2010-06-21 11:13:25 CEST

This message: [ Message body ]
Next message: Bert Huijben: "RE: 1.6.12 tarballs up for signing / testing"
Previous message: Julian Foad: "Re: Why am I getting a bad merge?"
In reply to: Daniel Näslund: "How use patch_target_t for both files and (possibly multiple) properties"
Next in thread: Daniel Näslund: "Re: How use patch_target_t for both files and (possibly multiple) properties"
Reply: Daniel Näslund: "Re: How use patch_target_t for both files and (possibly multiple) properties"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]