[jira] [Commented] (SVN-4668) svnserve dump format order has changed

From: Luke Perkins <lukeperkins_at_epicdgs.us>
Date: Mon, 9 Jan 2017 08:09:41 -0800

From: Luke Perkins [mailto:lukeperkins_at_epicdgs.us]
Sent: Monday, January 9, 2017 06:33
Cc: 'jira_at_apache.org' <jira_at_apache.org>
Subject: RE: [jira] [Commented] (SVN-4668) svnserve dump format order has changed

I think the problem at hand is in the following section of code, located in libsvn_repos/dump.c starting at line 405. This section was significantly rewritten in January 2015 by the user "julianfoad". I am still working on determining the root cause; however, it appears that the directive "Content-length must be last" is the key to the reordering of the SVN dump record format.

The old format order was:

1) Prop-content-length
2) Text-content-length
3) Text-content-sha1
4) Text-content-md5

Now the format order is:

1) Text-content-length
2) Text-content-sha1
3) Text-content-md5
4) Prop-content-length

/* Write headers, in arbitrary order.
 * ### TODO: use a stable order
 * ### Modifies HEADERS.
 */
static svn_error_t *
write_revision_headers(svn_stream_t *stream,
                       apr_hash_t *headers,
                       apr_pool_t *scratch_pool) {
  const char **h;
  apr_hash_index_t *hi;

  static const char *revision_headers_order[] =
  {
    SVN_REPOS_DUMPFILE_REVISION_NUMBER, /* must be first */
    NULL
  };

  /* Write some headers in a given order */
  for (h = revision_headers_order; *h; h++)
    {
      SVN_ERR(write_header(stream, headers, *h, scratch_pool));
      svn_hash_sets(headers, *h, NULL);
    }

  /* Write any and all remaining headers except Content-length.
   * ### TODO: use a stable order
   */
  for (hi = apr_hash_first(scratch_pool, headers); hi; hi = apr_hash_next(hi))
    {
      const char *key = apr_hash_this_key(hi);

      if (strcmp(key, SVN_REPOS_DUMPFILE_CONTENT_LENGTH) != 0)
        SVN_ERR(write_header(stream, headers, key, scratch_pool));
    }

  /* Content-length must be last */
  SVN_ERR(write_header(stream, headers, SVN_REPOS_DUMPFILE_CONTENT_LENGTH,
                       scratch_pool));

  return SVN_NO_ERROR;
}

Thank you,

Luke Perkins

-----Original Message-----
From: Bert Huijben (JIRA) [mailto:jira_at_apache.org]
Sent: Monday, January 9, 2017 03:52
To: lukeperkins_at_epicdgs.us
Subject: [jira] [Commented] (SVN-4668) svnserve dump format order has changed

    [ https://issues.apache.org/jira/browse/SVN-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15811591#comment-15811591 ]

Bert Huijben commented on SVN-4668:
-----------------------------------

One of the problems here is that we never explicitly coded dump to produce a strict order. The code used to iterate over the members of directories in the order in which they were placed in an APR hashtable. Then at one point APR changed its hashtable implementation from a mostly stable order to a randomly changing one, to avoid attacks in certain use cases of hashtables. The dumpfiles were still valid at this point, but some operations might appear in a different order. Technically, all of this still produces exactly the same commits.

When we found this problem in Subversion in operations like 'svn status -U', we changed some parts of our code to produce a strict, stable order, but this new order differs from the one produced by the old APR hashtable implementation. I'm not sure whether the replay API was (already) changed for this.

In Subversion 1.9, as part of optimizing FSFS, the filesystem layer can now produce an 'optimal ordering' of the members of a directory for cheap access at the filesystem layer. This might have changed the ordering again, and the ordering might change again in the future.

Other 1.9 work includes making the svnadmin dump format more stable between the different producers (svnadmin dump, svnrdump dump).

I'll try to add a few interesting issue numbers to this issue. But I think we should discuss this on the users or dev list first before proposing to 'fix' this, as I don't see a simple fix that works for all use cases.

> svnserve dump format order has changed
> --------------------------------------
>
> Key: SVN-4668
> URL: https://issues.apache.org/jira/browse/SVN-4668
> Project: Subversion
> Issue Type: Bug
> Components: svnserve
> Affects Versions: 1.9.3
> Environment: Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-53-generic x86_64)
> Reporter: Luke Perkins
> Attachments: SvnserveDumpIssue_20170107.jpg
>
>
> The format of the svnserve dump file has changed somewhere between version 1.8 and version 1.9.3 (r1718519). I routinely perform svnserve dump operations of my repositories and compare them against archived copies of dump files to be used for emergency recovery operations.
> It appears the content order difference is benign, other than that "diff" operations fail. I have a file illustrating the difference.
> The version information for svnserve dump is:
> svnserve, version 1.9.3 (r1718519)
> compiled Mar 14 2016, 07:39:01 on x86_64-pc-linux-gnu Copyright (C)
> 2015 The Apache Software Foundation.
> This software consists of contributions made by many people; see the
> NOTICE file for more information.
> Subversion is open source software, see http://subversion.apache.org/
> The following repository back-end (FS) modules are available:
> * fs_fs : Module for working with a plain file (FSFS) repository.
> * fs_x : Module for working with an experimental (FSX) repository.
> * fs_base : Module for working with a Berkeley DB repository.
> Cyrus SASL authentication is available.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Received on 2017-01-10 07:42:35 CET

This is an archived mail posted to the Subversion Users mailing list.