
Fwd: SHA-1 collision in repository?

From: Myria <myriachan_at_gmail.com>
Date: Mon, 5 Mar 2018 18:56:59 -0800

Gmail keeps using Reply instead of Reply All, so I'm having to add the
users list back manually.

Below is the thread I sent.

---------- Forwarded message ----------
From: Myria <myriachan_at_gmail.com>
Date: Mon, Mar 5, 2018 at 6:37 PM
Subject: Re: SHA-1 collision in repository?
To: Philip Martin <philip_at_codematters.co.uk>

I now know where the checksum error happens, but not why.

svn: E200014: Checksum mismatch while reading representation:
   expected: bb52be764a04d511ebb06e1889910dcf
     actual: 80a10d37de91cadc604ba30e379651b3

It's calculating the MD5 of only the first 16 KB of the input file and
comparing against the MD5 of the entire file. The 16 KB number seems
to be SVN__STREAM_CHUNK_SIZE.

bb52be764a04d511ebb06e1889910dcf is the MD5 of the entire file.
80a10d37de91cadc604ba30e379651b3 is the MD5 of the first 16384 bytes.
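
For anyone who wants to repeat the comparison above on their own file, here
is a minimal sketch. It assumes the 16384-byte figure is the right prefix
length (taken from the observation about SVN__STREAM_CHUNK_SIZE); the
function name md5_whole_and_prefix is made up for illustration and is not
part of Subversion.

```python
import hashlib

# Assumption: 16384 matches SVN__STREAM_CHUNK_SIZE, as observed in this message.
CHUNK_SIZE = 16384


def md5_whole_and_prefix(path, prefix_len=CHUNK_SIZE):
    """Return (MD5 of the whole file, MD5 of its first prefix_len bytes)."""
    whole = hashlib.md5()
    prefix = hashlib.md5()
    remaining = prefix_len
    with open(path, "rb") as f:
        while True:
            block = f.read(65536)
            if not block:
                break
            whole.update(block)
            if remaining > 0:
                # Feed only the part of this block that still belongs to the prefix.
                prefix.update(block[:remaining])
                remaining -= min(remaining, len(block))
    return whole.hexdigest(), prefix.hexdigest()
```

If the first value matches "expected" and the second matches "actual" in the
E200014 message, the file shows the same pattern described above. The file
must be in repository form (keywords and eol-style detranslated) for the
digests to be comparable.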

On Mon, Mar 5, 2018 at 5:23 PM, Myria <myriachan_at_gmail.com> wrote:
> I managed to compile a subversion command line client with debugging
> information and optimizations disabled, and can reproduce the problem
> with GDB attached.
>
> Here is a backtrace taken at the point where the error occurs. Some line
> numbers in stream.c will be off by a few lines because of printf calls
> I added.
>
> #0 svn_checksum_mismatch_err (expected=0x7ffffffdcf00,
> actual=0x7ffffa0700a0, scratch_pool=0x7ffffa070028,
> fmt=0x7ffffc259ac0 "Checksum mismatch while reading representation")
> at subversion/libsvn_subr/checksum.c:638
> #1 0x00007ffffc2123de in rep_read_contents (baton=0x7ffffa1f6190,
> buf=0x7ffffa1f66a8 "// <redacted>"..., len=0x7ffffffdcf88)
> at subversion/libsvn_fs_fs/cached_data.c:2062
> #2 0x00007ffffe5645fd in svn_stream_read_full (stream=0x7ffffa1f6470,
> buffer=0x7ffffa1f66a8 "// <redacted>"..., len=0x7ffffffdcf88)
> at subversion/libsvn_subr/stream.c:193
> #3 0x00007ffffe5653f3 in svn_stream_contents_same2
> (same=0x7ffffffdd01c, stream1=0x7ffffa1f6470,
> stream2=0x7ffffa1f6650, pool=0x7ffffa1e0028) at
> subversion/libsvn_subr/stream.c:589
> #4 0x00007ffffc247226 in get_shared_rep (old_rep=0x7ffffffdd188,
> fs=0x7fffff601030, rep=0x7ffffa0e20b8,
> file=0x7ffffa1e0390, offset=0, reps_hash=0x0,
> result_pool=0x7fffff5e0028, scratch_pool=0x7ffffa1e0028)
> at subversion/libsvn_fs_fs/transaction.c:2280
> #5 0x00007ffffc247734 in rep_write_contents_close
> (baton=0x7ffffa232ff0) at subversion/libsvn_fs_fs/transaction.c:2370
> #6 0x00007ffffe56492b in svn_stream_close (stream=0x7ffffa233140) at
> subversion/libsvn_subr/stream.c:274
> #7 0x00007ffffe841001 in apply_window (window=0x0,
> baton=0x7ffffa1000a0) at subversion/libsvn_delta/text_delta.c:732
> #8 0x00007ffffc2520d2 in window_consumer (window=0x0,
> baton=0x7fffff5f1ab8) at subversion/libsvn_fs_fs/tree.c:2935
> #9 0x00007ffffe8405ef in svn_txdelta_run (source=0x7fffff5f1a18,
> target=0x7fffff5f1298,
> handler=0x7ffffc25209f <window_consumer>,
> handler_baton=0x7fffff5f1ab8, checksum_kind=svn_checksum_md5,
> checksum=0x7ffffffdd458, cancel_func=0x0, cancel_baton=0x0,
> result_pool=0x7fffff5e0028,
> scratch_pool=0x7fffff5e0028) at subversion/libsvn_delta/text_delta.c:454
> #10 0x00007ffffee98a57 in svn_wc__internal_transmit_text_deltas (tempfile=0x0,
> new_text_base_md5_checksum=0x7ffffffdd5b0,
> new_text_base_sha1_checksum=0x7ffffffdd5b8, db=0x7fffff6c17d8,
> local_abspath=0x7fffff672d08
> "/mnt/d/svntest/repository/directory/Redacted.cpp",
> fulltext=0, editor=0x7fffff673700, file_baton=0x7fffff510110,
> result_pool=0x7fffff6c0028,
> scratch_pool=0x7fffff5e0028) at subversion/libsvn_wc/adm_crawler.c:1109
> #11 0x00007ffffee98d68 in svn_wc_transmit_text_deltas3
> (new_text_base_md5_checksum=0x7ffffffdd5b0,
> new_text_base_sha1_checksum=0x7ffffffdd5b8, wc_ctx=0x7fffff6c17c0,
> local_abspath=0x7fffff672d08
> "/mnt/d/svntest/repository/directory/Redacted.cpp",
> fulltext=0, editor=0x7fffff673700, file_baton=0x7fffff510110,
> result_pool=0x7fffff6c0028,
> scratch_pool=0x7fffff5e0028) at subversion/libsvn_wc/adm_crawler.c:1199
> #12 0x00007fffff18eb12 in svn_client__do_commit (
> base_url=0x7fffff6142c0 "file:///mnt/d/svntest/repository/directory",
> commit_items=0x7fffff672c48, editor=0x7fffff673700,
> edit_baton=0x7fffff6300a0,
> notify_path_prefix=0x7fffff672900 "/mnt/d/svntest/repository",
> sha1_checksums=0x7ffffffdd750,
> ctx=0x7fffff6c16f0, result_pool=0x7fffff6c0028, scratch_pool=0x7fffff650028)
> at subversion/libsvn_client/commit_util.c:1920
> #13 0x00007fffff18a5f9 in svn_client_commit6 (targets=0x7fffff670a18,
> depth=svn_depth_infinity, keep_locks=0,
> keep_changelists=0, commit_as_operations=1,
> include_file_externals=0, include_dir_externals=0,
> changelists=0x7fffff6c0780, revprop_table=0x0,
> commit_callback=0x42c6a0 <svn_cl__print_commit_info>,
> commit_baton=0x0, ctx=0x7fffff6c16f0, pool=0x7fffff6c0028) at
> subversion/libsvn_client/commit.c:901
> #14 0x000000000040b744 in svn_cl__commit (os=0x7fffff6c0520,
> baton=0x7ffffffddc60, pool=0x7fffff6c0028)
> at subversion/svn/commit-cmd.c:171
> #15 0x000000000042b351 in sub_main (exit_code=0x7ffffffddf3c, argc=5,
> argv=0x7ffffffde038, pool=0x7fffff6c0028)
> at subversion/svn/svn.c:3041
> #16 0x000000000042b5ee in main (argc=5, argv=0x7ffffffde038) at
> subversion/svn/svn.c:3126
>
> On Fri, Mar 2, 2018 at 3:16 PM, Philip Martin <philip_at_codematters.co.uk> wrote:
>> Myria <myriachan_at_gmail.com> writes:
>>
>>> I just found out that the file causing the error from the large commit
>>> is not the large file - it's one of the smaller files, about 55 KB.
>>> If I commit that single smaller file from the large commit, it errors
>>> the same way as the original 227185 would. This is exactly like the
>>> original problem with committing the pixel shader.
>>
>> If I understand correctly you are committing a single file using a
>> file:// URL and getting the error. If so then you may be able to
>> produce a much smaller testcase, see later.
>>
>>> I managed to get the db/transactions/227184-4vb2.txn directory by
>>> breakpointing kernel32!DeleteFileW in TortoiseSVN (so I could get the
>>> contents before TortoiseSVN deleted them at failure). I don't know
>>> how they're useful, though.
>>>
>>> The only way I know how to proceed is to wait until the source code to
>>> TortoiseSVN is available so that I can debug it in Visual Studio. Is
>>> there something else I can do?
>>
>> Are you able to share your repository, either in public or privately
>> with me? How big is the repository now it has fewer revisions?
>>
>>
>>
>> The file:// commit does not have any cache from previous commits, unlike
>> svnsync or apache, so the error must involve data read explicitly during
>> the failed commit. That means the ancestors of the file being
>> committed, the parent directories, and possibly some other file (not an
>> ancestor) referenced by SHA1 in the rep-cache.
>>
>> If you want to debug it yourself then as well as the transaction
>> directory db/transactions/227184-4vb2.txn/ there is also the protorev
>> file db/txn-protorevs/227184-4vb2.rev which contains the new content of
>> the committed files. When the content is sent by the client it gets
>> written as a delta to the txn-protorev file:
>>
>> DELTA
>> SVN....
>> ENDREP
>>
>> Since the file being committed matched a SHA1 in the rep-cache the
>> commit process will attempt to remove this delta but will first verify
>> that the fulltext obtained by expanding the delta in the protorev file
>> matches the fulltext in the repository, see get_shared_rep() in
>> subversion/libsvn_fs_fs/transaction.c.
>>
>> /* Compare the two representations.
>> * Note that the stream comparison might also produce MD5 checksum
>> * errors or other failures in case of SHA1 collisions. */
>> SVN_ERR(svn_fs_fs__get_contents_from_file(&contents, fs, rep, file,
>> offset, scratch_pool));
>> SVN_ERR(svn_fs_fs__get_contents(&old_contents, fs, &old_rep_norm,
>> FALSE, scratch_pool));
>> err = svn_stream_contents_same2(&same, contents, old_contents,
>> scratch_pool);
>>
>> Normally they compare equal and the protorev file is truncated to remove
>> the delta, but in your case they do not match and the commit fails.
>>
>>
>>
>> As far as producing a smaller testcase: it may be possible to trim out
>> all the files and directories in other parts of the repository. For
>> example if the repository path to the parent directory of the commit is
>> /project/branch/foo/bar then you can use
>>
>> svndumpfilter include /project/branch/foo/bar
>>
>> in an
>>
>> svnadmin dump ... | svndumpfilter ... | svnadmin load
>>
>> pipeline to produce a smaller repository and this smaller repository may
>> reproduce the error. Dump and load tend to go faster with a larger
>> than default -M parameter.
>>
>> There are some reasons why this may not work:
>>
>> - it may be necessary to expand the included tree to cope with copies.
>>
>> - the rep-cache might refer to a totally different file that happens
>> to have the same SHA1/content, in which case the included tree may
>> need to include this file as well.
>>
>> - the reduced repository will have the same files but the directories
>> will be different/smaller and their content may be necessary to
>> trigger the bug
>>
>> - the reduced repository will contain less data meaning smaller file
>> offsets and the larger offsets may be necessary to trigger the bug
>>
>> To determine whether the rep-cache SHA1 refers to a different file you
>> first need the repository form of the file being committed, i.e. with
>> svn:keywords and svn:eol-style detranslated. Then calculate the SHA1
>> and look up the hash in the rep-cache:
>>
>> sqlite3 db/rep-cache.db "select revision from rep_cache where hash='xxxxxxx'"
>>
>> This tells you which revision is involved, then you look in the revision
>> file db/revs/nnn/nnnnn to find the hash and determine the file path.
>>
>>
>> Thank you for persisting with the investigation!
>>
>> --
>> Philip
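
The rep-cache lookup Philip describes can also be scripted. This is a rough
sketch only: it uses the table name rep_cache and the columns hash and
revision exactly as they appear in his sqlite3 command, and assumes nothing
else about the schema. The helper names sha1_of_file and
rep_cache_revisions are hypothetical. Run it against a copy of
db/rep-cache.db rather than the live file, and feed it the detranslated
(repository-form) file.

```python
import hashlib
import sqlite3


def sha1_of_file(path):
    """Return the SHA1 hex digest of the file at path."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def rep_cache_revisions(rep_cache_db, sha1_hex):
    """Return the revisions recorded in rep_cache for the given SHA1.

    Same query as the sqlite3 command quoted above, with the hash
    passed as a bound parameter.
    """
    conn = sqlite3.connect(rep_cache_db)
    try:
        rows = conn.execute(
            "select revision from rep_cache where hash = ?", (sha1_hex,)
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

An empty result would mean the SHA1 is not in the rep-cache at all; a
revision number tells you which db/revs file to inspect next.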
Received on 2018-03-06 03:57:12 CET

This is an archived mail posted to the Subversion Users mailing list.
