Re: Making EXPANDED_SIZE reliable in FSFS [was: svn commit: r1673204 - /subversion/trunk/subversion/libsvn_fs_fs/dag.c]

From: Branko ÄŒibej <brane_at_wandisco.com>
Date: Mon, 13 Apr 2015 20:28:35 -0500

On 13.04.2015 18:04, Stefan Fuhrmann wrote:
> On Mon, Apr 13, 2015 at 11:20 PM, Daniel Shahaf
> <d.s_at_daniel.shahaf.name <mailto:d.s_at_daniel.shahaf.name>> wrote:
>
> rhuijben_at_apache.org <mailto:rhuijben_at_apache.org> wrote on Mon, Apr
> 13, 2015 at 14:28:39 -0000:
> > - *has_props = (noderev->prop_rep->expanded_size > 4);
> > + *has_props = (noderev->prop_rep->expanded_size > 4
> > + || (noderev->prop_rep->expanded_size == 0
> > + && noderev->prop_rep->size > 4));
>
> Having to remember, on every use of EXPANDED_SIZE, to check if
> it's zero
> is madness.
>
>
> The pain has been increasing and is now at a level
> where we should do something about it.
>
>
> Is there a good reason not to make the deserializer
> function handle this? If needed, by inventing a getter or a new
> struct
> member that doesn't have this quirk?
>
>
> The main problem is that we don't know whether SIZE is
> correct, either, without looking at the actual representation:
> It could be a self-delta of an empty fulltext. So, most of
> these tests are contextual, we e.g. know that these are
> property reps. Or we can allow for false positives or false
> negatives etc.
>
> One way to get around this problem is to basically use
> the approach of svn_fs_fs__file_length(), where we compare
> the contents checksum with with that of an empty rep.
> We might also restrict that check to "meaningful" values
> of "SIZE", i.e. those txdelta streams that produce an
> empty output.
>
> Assuming we find a truly reliable test (e.g. say txdelta is
> always 4 bytes and we can enumerate all MD5s), we could
> add it to the parser function in low_level.c.
>
> Thoughts?

I believe that currently a txdelta representation will always be at
least 4 bytes long; it's represented in svndiff format, which has a
4-byte magic header. Unless something has changed since I wrote that
part of the code years ago, we'll discard delta representations that are
larger than the fulltext, which implies that empty reps will always be
stored as fulltext.

I'd hate to rely on such heuristics in general, but for this specific
case, if they turn out to be valid ... and as long as we really
carefully document and cross-reference the assumptions ... and so on.

-- Brane
Received on 2015-04-14 03:29:08 CEST

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]