
Re: svn commit: r965892 - in /subversion/trunk: notes/dump-load-format.txt subversion/include/svn_repos.h subversion/libsvn_repos/dump.c

From: Stefan Sperling <stsp_at_elego.de>
Date: Thu, 22 Jul 2010 12:15:18 +0200

On Thu, Jul 22, 2010 at 09:53:35AM +0100, Philip Martin wrote:
> stsp_at_apache.org writes:
>
> > Author: stsp
> > Date: Tue Jul 20 16:14:53 2010
> > New Revision: 965892
> >
> > URL: http://svn.apache.org/viewvc?rev=965892&view=rev
> > Log:
> > Make svnadmin dump print headers containing MD5 and SHA1 checksums of
> > property content, as was already done for file content. Checksums are
> > printed for revision properties as well as versioned properties.
>
> Do we gain anything by having both MD5 and SHA1 checksums?

We print both kinds of checksum for file content, too, so it's just
for consistency.

> Do we need checksums per property?

Yes, because the idea is that the loader (or other tools handling dump
files) can identify corrupted properties and tell the user which ones
they are.

> Often the checksums will take up more space than the property.

Quite possibly. But the overhead is fixed in size, and it's all ASCII
so maybe it compresses quite well?

If you are concerned about space, there is already the --deltas option
which saves the bulk of the size of a regular v2 dump.
I haven't done any empirical analysis, but I'd suspect that deltas of
file content will save proportionally more space than the property
checksums add.

> What about property names?

Hmmm... names aren't covered by checksums right now, that is true.
Maybe we should compute the hash over "key=value" strings, such as
"svn:eol-style=native"?
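
To make the "key=value" idea concrete, here is a minimal sketch in Python.
It is not what dump.c does; the function name and the choice of UTF-8 for
the property name are assumptions made for illustration only.

```python
import hashlib

def property_checksums(name: str, value: bytes) -> dict:
    """Return MD5 and SHA1 hex digests over the bytes of "name=value".

    Hypothetical helper: the joined "name=value" form is only the
    proposal from this mail, not an existing dump format rule.
    """
    data = name.encode("utf-8") + b"=" + value
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }

# Digests now change if either the name or the value is damaged.
sums = property_checksums("svn:eol-style", b"native")
```

One caveat with a plain "=" separator: if a name could itself contain "=",
two different name/value pairs could hash the same bytes, so a length
prefix might be the safer encoding.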

> Perhaps we could just have one checksum that includes all the property
> names and values?

I'd like loaders to be able to tell users which properties are
corrupted.
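
A rough sketch of why per-property checksums help here, in Python. The
helper names and the dictionaries are made up for illustration; this is
not loader code.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def corrupted_properties(props: dict, recorded: dict) -> list:
    """Names of properties whose value no longer matches its recorded SHA1."""
    return sorted(name for name, value in props.items()
                  if sha1_hex(value) != recorded.get(name))

# Checksums as they would have been recorded at dump time (hypothetical data).
original = {"svn:eol-style": b"native", "svn:mime-type": b"text/plain"}
recorded = {name: sha1_hex(value) for name, value in original.items()}

damaged = dict(original)
damaged["svn:mime-type"] = b"text/plaim"  # simulate one corrupted character

# A single combined checksum could only say that something changed;
# the per-property digests name the damaged property.
bad = corrupted_properties(damaged, recorded)
```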

Note that I've added this in response to a complaint that dump files
carry very little information about their integrity (and that svnsync
does close to zero consistency checks, too, but that is a different issue).
I found that we already have checksums for file content in 1.6.x,
but that properties don't have any checksums.
That's just unnecessary inconsistency. If we're going to have one,
we should have the other.

The checksums come before content, so the loader can easily be made to
verify content. It would make sense to make our own loader start verifying
checksums soon.
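
As a sketch of what such verification could look like, assuming the
loader has already parsed a node's headers into a dictionary and read
the content bytes: Text-content-md5 and Text-content-sha1 are the
existing file content headers; the names of the new property headers
are not given in this mail, so they are left out. The function name is
hypothetical.

```python
import hashlib

def verify_content(headers: dict, content: bytes) -> list:
    """Return the checksum headers whose value does not match content."""
    algorithms = {
        "Text-content-md5": hashlib.md5,
        "Text-content-sha1": hashlib.sha1,
    }
    mismatches = []
    for header, algo in algorithms.items():
        expected = headers.get(header)
        # A missing header is skipped, since older dumps may not carry it.
        if expected is not None and algo(content).hexdigest() != expected.lower():
            mismatches.append(header)
    return mismatches

content = b"hello\n"
headers = {
    "Text-content-md5": hashlib.md5(content).hexdigest(),
    "Text-content-sha1": hashlib.sha1(content).hexdigest(),
}
ok = verify_content(headers, content)        # intact content
bad = verify_content(headers, b"hellp\n")    # one corrupted byte
```

Because the headers precede the content, a loader could also feed the
hash objects incrementally while reading instead of buffering the
whole content first.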

In general, having checksums built into the dump file itself does not
protect against purposeful corruption. It's just a safety net for accidental
corruption. For security purposes, people should cryptographically sign
the entire dump file.

Stefan
Received on 2010-07-22 12:16:01 CEST

This is an archived mail posted to the Subversion Dev mailing list.