
WC-NG: Commit with new pristine store and SHA-1 checksums

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Tue, 20 Apr 2010 18:37:30 +0100

On adapting the "commit" data flow to work with the new pristine text
store and SHA-1 checksums.

OBSERVATIONS
============

The call graph during a commit is (adapted from
notes/wc-ng/use-of-tmp-text-base-path):

                          svn_client_commit4()
                            |^[T]     |     |
  wc_to_repos_copy()        |^[M]     |     |
            |               |^        |     |
        svn_client__do_commit()       |     |
                    [N] |^            |     |
                        |^            |     |
                        |^            |     |
  LIBSVN_CLIENT         |^            |     |
  ......................................................................
  LIBSVN_WC             |^                  |
                        |^  svn_wc_queue_committed3()
                        |^
                        |^                  |
  svn_wc_transmit_text_deltas3()            |
            [N] |^                          |
  svn_wc__internal_transmit_text_deltas()   |
    [N] |^                                  |
        |^                                  |
        |^                  svn_wc_process_committed_queue2()
        |^                                  |
        |^                  svn_wc__process_committed_internal()
        |^                                  |
        |^                  process_committed_leaf()
        |^                    |^            |v
        |^                    |^            |v
  svn_wc__text_base_path(tmp=TRUE)          |v
                                            |v
         (Here the new text base is installed from the given path.)

The calling sequence is:

  svn_client__do_commit() calls
      svn_wc_transmit_text_deltas3(), once per modified file; then

  svn_client_commit4() calls
      svn_wc_queue_committed3(), once per significant node, to build a
      queue; then
      svn_wc_process_committed_queue2(), just once, passing that queue.

svn_wc_transmit_text_deltas3() does several things:

  - determine the new text base content by translating the
    working file to repository-normal form;
  - transmit deltas of that against the old text base;
  - verify the recorded checksum of the old text base;
  - optionally, store the new text base in a temporary file.

These purposes could be separated out a bit, although I don't think it's
particularly important to do so right now:

  - get a stream of the working text translated to RNF:
      svn_wc_translated_stream2()
  - get a stream of the old text base:
      svn_wc_get_pristine_contents2()
  - transmit deltas, given two readable streams:
      editor->apply_textdelta() f/b svn_txdelta_run()
  - write a stream to a new text base file:
      (We have old and new private APIs for this; I don't think we have
      public ones.)
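
The four separated steps above can be modelled in a few lines. This is
only a toy sketch in Python, not the C API: PristineStore,
to_repo_normal_form(), transmit_text() and the send_delta callback are
hypothetical names, the "translation" is reduced to CRLF-to-LF
conversion, and the pristine store is an in-memory dict keyed by SHA-1.

```python
import hashlib


def to_repo_normal_form(working_bytes: bytes) -> bytes:
    # Stand-in for real translation: only CRLF -> LF is handled here.
    return working_bytes.replace(b"\r\n", b"\n")


class PristineStore:
    """Toy content-addressed store keyed by SHA-1 hex digest."""

    def __init__(self):
        self._texts = {}

    def install(self, text: bytes) -> str:
        sha1 = hashlib.sha1(text).hexdigest()
        self._texts[sha1] = text
        return sha1

    def read(self, sha1: str) -> bytes:
        return self._texts[sha1]


def transmit_text(working_bytes, old_base_sha1, store, send_delta):
    """Hypothetical model of the transmit step.

    Translates the working text, hands (old base, new base) to the
    caller-supplied send_delta callback, writes the new base into the
    store, and returns the new base's SHA-1.
    """
    new_base = to_repo_normal_form(working_bytes)
    old_base = store.read(old_base_sha1)
    send_delta(old_base, new_base)
    return store.install(new_base)
```

The point of the sketch is the return value: the caller ends up holding
a checksum rather than a path, which is what has to reach the
post-commit step.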

The tricky bit here is: after writing a new text-base file, that file's
path (old way) or SHA-1 checksum (new way) needs to be communicated to
svn_wc_process_committed_queue(). The path isn't currently being
communicated; it's being re-derived.

The obvious way (1): Pass the list of new-text-base checksums on to
svn_wc_process_committed_queue(). That is relatively straightforward.
I need to check whether the queue already has a separate entry for
each and every modified file, and make sure it does.

Another possible way (2): If, in svn_wc_transmit_text_deltas3() or just
afterwards, we were to store the checksum in the ACTUAL_NODE table, in a
checksum field that represents the "Repo-Normal-Form of the ACTUAL text
which is currently being committed", then at commit post-processing time
we could get this checksum from the DB, knowing only the working file's
path, and write it into the BASE_NODE table. (We wouldn't rely on the
working file remaining untouched on disk, because we've stored a copy of
this checksummed text into the pristine store at the same time.)
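
Way (2) can be illustrated with a toy database. This is a sketch under
loud assumptions: the table and column names (actual_node.committing_sha1,
base_node.checksum) and both helper functions are made up for the
example and are not the real wc.db schema.

```python
import sqlite3


def open_toy_wc_db():
    # Hypothetical two-table schema, in memory only.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE actual_node ("
        "local_relpath TEXT PRIMARY KEY, committing_sha1 TEXT)"
    )
    db.execute(
        "CREATE TABLE base_node ("
        "local_relpath TEXT PRIMARY KEY, checksum TEXT)"
    )
    return db


def record_committing_checksum(db, relpath, sha1):
    # Done at (or just after) transmit time.
    db.execute(
        "INSERT OR REPLACE INTO actual_node VALUES (?, ?)", (relpath, sha1)
    )


def post_commit_install(db, relpath):
    # Post-processing knows only the working file's path.
    row = db.execute(
        "SELECT committing_sha1 FROM actual_node WHERE local_relpath = ?",
        (relpath,),
    ).fetchone()
    if row is None or row[0] is None:
        raise LookupError("no checksum recorded for " + relpath)
    db.execute(
        "INSERT OR REPLACE INTO base_node VALUES (?, ?)", (relpath, row[0])
    )
    return row[0]
```

Nothing is passed between the two calls except the path; the checksum
travels through the database.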

What are the pros and cons? See backward compatibility, below.

THE NEW WAY
===========

This is a straightforward way to modify the new API.

Note: svn_wc_transmit_text_deltas3(), svn_wc_queue_committed3() and
svn_wc_process_committed_queue2() are already new in 1.7; their
predecessors must be kept working for backward compatibility.

svn_wc_transmit_text_deltas3() shall:
  - write the new text base into the pristine store rather than a
    particular path;
  - return the SHA-1 checksum of the new text base;
  - no longer return the old "tempfile" and "md5_digest" outputs.

svn_wc_queue_committed3() shall:
  - take the SHA-1 checksum of every modified file *and every new file*
    (instead of an MD5 checksum).

svn_wc_process_committed_queue2() shall:
  - use the SHA-1 checksums found in the queue.
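
As a rough illustration of the queue carrying checksums, here is a toy
model in Python. QueueItem, CommittedQueue and the dict standing in for
BASE_NODE are hypothetical; the real queue is an opaque C structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QueueItem:
    path: str
    # None models a node whose text did not change in this commit.
    sha1: Optional[str] = None


@dataclass
class CommittedQueue:
    items: List[QueueItem] = field(default_factory=list)

    def queue_committed(self, path: str, sha1: Optional[str] = None):
        # Called once per significant node.
        self.items.append(QueueItem(path, sha1))


def process_committed_queue(queue: CommittedQueue,
                            base_checksums: Dict[str, str]) -> None:
    # Called just once; "installs" each new text base by recording
    # its SHA-1 against the node's path.
    for item in queue.items:
        if item.sha1 is not None:
            base_checksums[item.path] = item.sha1
```

Note that this only works if every modified or added file has its own
queue entry.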

COMPATIBILITY
=============

We need to keep the old WC interface working:

  svn_wc_transmit_text_deltas2(&tempfile, &md5_digest, ...)
  svn_wc_queue_committed2(queue, path, ..., md5_checksum)
  svn_wc_process_committed_queue(queue, ...)

How? I can't see a way to communicate the SHA-1 checksum to
svn_wc_process_committed_queue() via the queue, but I can think of the
following ways.

(1)

An advantage of the method that stores the new checksum in the
ACTUAL_NODE table is that the backward-compatible old API can look there
to find the new text base:

svn_wc_transmit_text_deltas2() shall, if TEMPFILE is non-null:
    - store the new text base (with its checksums) in the pristine
      store;
    - store the new text base's SHA-1 checksum in ACTUAL_NODE;
    - return the new text base's MD5 digest;
    - set *TEMPFILE to some path that it's safe for the caller to
      attempt to delete, but that is not otherwise meaningful;

svn_wc_process_committed_queue() shall:
    - look in ACTUAL_NODE to find each file's new text base SHA-1;
    - "install" the new text base by simply writing that SHA-1 to
BASE_NODE.

(2)

If we use the simpler method (where the new API puts the new text in the
pristine store and passes only its SHA-1 checksum along), the only
solution I can find for the compatibility API is to keep working the old
way: put the temporary text base file at the old specially derivable
path, and then find it there within svn_wc_queue_committed2() or
svn_wc_process_committed_queue():

svn_wc_transmit_text_deltas2() shall, if TEMPFILE is non-null:
    - store the new text base at the special derived path;
    - set *TEMPFILE to that path;
    - return the new text base's MD5 digest.

svn_wc_process_committed_queue() shall:
    - find the new text base at the special derived path;
    - calculate its SHA-1 checksum;
    - store it (with its checksums) in the pristine store;
    - put that SHA-1 checksum in ACTUAL_NODE.
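
The first three steps of compatibility option (2) might be sketched as
below. Everything here is an assumption made for illustration: the
derived path layout, the function names, and the dict standing in for
the pristine store. The final "record in ACTUAL_NODE" step is left out.

```python
import hashlib
import os


def derived_tmp_text_base_path(admin_dir: str, name: str) -> str:
    # Illustrative layout only; the real derivation lives in libsvn_wc.
    return os.path.join(admin_dir, "tmp", "text-base", name + ".svn-base")


def old_transmit(admin_dir: str, name: str, new_base: bytes):
    """Old-style transmit: write to the derived path, return MD5."""
    path = derived_tmp_text_base_path(admin_dir, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(new_base)
    return path, hashlib.md5(new_base).hexdigest()


def old_process_committed(admin_dir: str, name: str, pristine: dict) -> str:
    """Old-style post-processing: re-derive the path, then hash."""
    path = derived_tmp_text_base_path(admin_dir, name)
    with open(path, "rb") as f:
        text = f.read()
    sha1 = hashlib.sha1(text).hexdigest()
    pristine[sha1] = text  # store the text under its SHA-1
    return sha1
```

The cost of this option is visible in the sketch: the text is written
once to the derived path and then read back in full just to compute a
checksum that the new API would already have in hand.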

Comments please. Is either of those ways to be preferred?

One last thought: I haven't described here where an added file gets its
text base when it is committed. Of course its SHA-1 checksum needs to
be calculated and passed on too, similar to a modified file but not
using svn_wc_transmit_text_deltas3().

- Julian
Received on 2010-04-20 19:38:04 CEST

This is an archived mail posted to the Subversion Dev mailing list.