
Re: SHA-1 collision in repository?

From: Philip Martin <philip_at_codematters.co.uk>
Date: Fri, 02 Mar 2018 23:16:34 +0000

Myria <myriachan_at_gmail.com> writes:

> I just found out that the file causing the error from the large commit
> is not the large file - it's one of the smaller files, about 55 KB.
> If I commit that single smaller file from the large commit, it errors
> the same way as the original 227185 would. This is exactly like the
> original problem with committing the pixel shader.

If I understand correctly, you are committing a single file using a
file:// URL and getting the error. If so, you may be able to produce a
much smaller testcase; see below.

> I managed to get the db/transactions/227184-4vb2.txn directory by
> breakpointing kernel32!DeleteFileW in TortoiseSVN (so I could get the
> contents before TortoiseSVN deleted them at failure). I don't know
> how they're useful, though.
>
> The only way I know how to proceed is to wait until the source code to
> TortoiseSVN is available so that I can debug it in Visual Studio. Is
> there something else I can do?

Are you able to share your repository, either in public or privately
with me? How big is the repository now that it has fewer revisions?

The file:// commit does not have any cache from previous commits, unlike
svnsync or apache, so the error must involve data read explicitly during
the failed commit. That means the ancestors of the file being
committed, the parent directories, and possibly some other file (not an
ancestor) referenced by SHA1 in the rep-cache.

If you want to debug it yourself then as well as the transaction
directory db/transactions/227184-4vb2.txn/ there is also the protorev
file db/txn-protorevs/227184-4vb2.rev which contains the new content of
the committed files. When the content is sent by the client it gets
written as a delta to the txn-protorev file:

   DELTA
   SVN....
   ENDREP

Since the file being committed matched a SHA1 in the rep-cache, the
commit process will attempt to remove this delta, but will first verify
that the fulltext obtained by expanding the delta in the protorev file
matches the fulltext in the repository; see get_shared_rep() in
subversion/libsvn_fs_fs/transaction.c.

      /* Compare the two representations.
       * Note that the stream comparison might also produce MD5 checksum
       * errors or other failures in case of SHA1 collisions. */
      SVN_ERR(svn_fs_fs__get_contents_from_file(&contents, fs, rep, file,
                                                offset, scratch_pool));
      SVN_ERR(svn_fs_fs__get_contents(&old_contents, fs, &old_rep_norm,
                                      FALSE, scratch_pool));
      err = svn_stream_contents_same2(&same, contents, old_contents,
                                      scratch_pool);

Normally they compare equal and the protorev file is truncated to remove
the delta, but in your case they do not match and the commit fails.
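To make the comparison step concrete, here is a minimal sketch in Python
of a chunked "are these two streams identical" check. It is only an
illustration of the idea behind svn_stream_contents_same2(), not the
Subversion implementation; the function name streams_same and the chunk
size are my own choices, and it assumes buffered binary streams whose
read(n) returns fewer than n bytes only at end of stream.

```python
import io


def streams_same(a, b, chunk_size=64 * 1024):
    """Return True if two binary streams yield identical bytes.

    Hypothetical helper for illustration; assumes read(n) only returns
    a short chunk at end of stream (true for files and BytesIO).
    """
    while True:
        chunk_a = a.read(chunk_size)
        chunk_b = b.read(chunk_size)
        if chunk_a != chunk_b:
            # Either the bytes differ or one stream ended early.
            return False
        if not chunk_a:
            # Both streams reached end of data at the same point.
            return True


if __name__ == "__main__":
    new = io.BytesIO(b"fulltext expanded from the protorev delta")
    old = io.BytesIO(b"fulltext already stored in the repository")
    print(streams_same(new, old))
```

In the failing commit the equivalent of this check returns "not the
same", which is why the delta is not dropped and the commit aborts.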

As for producing a smaller testcase: it may be possible to trim out
all the files and directories in other parts of the repository. For
example if the repository path to the parent directory of the commit is
/project/branch/foo/bar then you can use

   svndumpfilter include /project/branch/foo/bar

in an

   svnadmin dump ... | svndumpfilter ... | svnadmin load

pipeline to produce a smaller repository and this smaller repository may
reproduce the error. Dump and load tend to go faster with a larger
than default -M parameter.

There are some reasons why this may not work:

  - it may be necessary to expand the included tree to cope with copies.

  - the rep-cache might refer to a totally different file that happens
    to have the same SHA1/content, in which case the included tree may
    need to include this file as well.

  - the reduced repository will have the same files but the directories
    will be different/smaller and their content may be necessary to
    trigger the bug.

  - the reduced repository will contain less data meaning smaller file
    offsets and the larger offsets may be necessary to trigger the bug.
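
The dump/filter/load pipeline described above could be assembled with a
small Python helper like the one below. This is a sketch only: the
function name, the repository paths and the 1024 MB cache size are
placeholders of mine, and the destination repository is assumed to have
been created beforehand with svnadmin create.

```python
import shlex


def build_filter_pipeline(src_repo, dst_repo, include_path, cache_mb=1024):
    """Build a 'svnadmin dump | svndumpfilter include | svnadmin load' line.

    Hypothetical helper; paths and cache size are placeholders.
    -M sets the in-memory cache size (in MB) for dump and load.
    """
    dump = ["svnadmin", "dump", "-M", str(cache_mb), src_repo]
    filt = ["svndumpfilter", "include", include_path]
    load = ["svnadmin", "load", "-M", str(cache_mb), dst_repo]
    # Quote each argument so paths with spaces survive the shell.
    return " | ".join(
        " ".join(shlex.quote(arg) for arg in cmd) for cmd in (dump, filt, load)
    )


if __name__ == "__main__":
    print(build_filter_pipeline("/srv/repo", "/srv/repo-small",
                                "/project/branch/foo/bar"))
```

If the reduced repository does not reproduce the error, widening
include_path to cover copy sources is the first thing to try.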

To determine whether the rep-cache SHA1 refers to a different file you
first need the repository form of the file being committed, i.e. with
svn:keywords and svn:eol-style detranslated. Then calculate the SHA1
and look up the hash in the rep-cache:

 sqlite3 db/rep-cache.db "select revision from rep_cache where hash='xxxxxxx'"

This tells you which revision is involved, then you look in the revision
file db/revs/nnn/nnnnn to find the hash and determine the file path.
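
The hash-and-lookup step can also be scripted. The Python sketch below
makes a simplifying assumption that you should check against your file's
properties: it only undoes CRLF line endings (as for svn:eol-style
native on Windows) and does not contract svn:keywords, so it is correct
only for files without keyword expansion. The function names are mine;
the query is the same one as in the sqlite3 command above.

```python
import hashlib
import sqlite3


def sha1_of_repository_form(working_bytes):
    """SHA1 of the file as stored in the repository.

    Assumption: only CRLF -> LF normalisation is needed; keyword
    contraction (svn:keywords) is NOT handled here.
    """
    normalized = working_bytes.replace(b"\r\n", b"\n")
    return hashlib.sha1(normalized).hexdigest()


def lookup_rep_cache(db_path, sha1_hex):
    """Return the revisions recorded in rep_cache for the given SHA1."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "select revision from rep_cache where hash = ?", (sha1_hex,)
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Typical use would be to read the working file in binary mode, pass the
bytes to sha1_of_repository_form(), and hand the result to
lookup_rep_cache() together with the path to db/rep-cache.db.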

Thank you for persisting with the investigation!

-- 
Philip
Received on 2018-03-03 00:16:42 CET

This is an archived mail posted to the Subversion Users mailing list.
