[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: r1397773 - rep-sharing in a txn - subversion/libsvn_fs_fs/fs_fs.c

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Thu, 15 Nov 2012 17:16:12 +0000 (GMT)

Hi Stefan.  Some review questions and comments...

stefan2_at_apache.org wrote:
> URL: http://svn.apache.org/viewvc?rev=1397773&view=rev
> Log:
> Due to public request: apply rep-sharing to equal data reps within
> the same transaction.
>
> The idea is simple. When writing a noderev to the txn folder,
> write another file named by the rep's SHA1 and store the rep
> struct in there. Lookup is then straight-forward.

Please
 could you update the FSFS structure document.  I assume it says what
files are stored in the txn dir; if it doesn't it should.

> * subversion/libsvn_fs_fs/fs_fs.c
>   (svn_fs_fs__put_node_revision): also look for SHA1-named files
>   (get_shared_rep): write SHA1-named files
[...]
> @@ -2603,7 +2607,32 @@ svn_fs_fs__put_node_revision(svn_fs_t *f
>
> -  return svn_io_file_close(noderev_file, pool);
> +  SVN_ERR(svn_io_file_close(noderev_file, pool));
> +
> +  /* if rep sharing has been enabled and the noderev has a data rep
> +   * and its SHA-1 is known, store the rep struct under its SHA1. */

It
 looks like we re-enter this 'if' block, and re-write the rep-reference
file, every time we store a node-rev with the same representation --
even if the rep we're storing was already found from this file.  This
might not be much of an overhead, but it seems ugly.

Could we
avoid re-writing it in those cases?  Maybe by refactoring such that we
write the rep-reference file immediately after writing the rep, instead
of here in put_node_revision.

> +  if (sha1)
> +   {
> +    apr_file_t *rep_file;
> +    const char *file_name = svn_dirent_join(path_txn_dir(fs, txn_id,\
> +                                            pool), sha1, pool);

Creating
 the file name ('checksum_to_cstring' + 'path_txn_dir' +
'svn_dirent_join') is common to both this function and the function
below where we read the file.  Factor it out, like the existing
path_txn_node_rev() and similar functions, to make the commonality
clear.

> +    const char *rep_string = representation_string(noderev->data_rep,
> +                                                   ffd->format,
> +                                                   (noderev->kind
> +                                                    == svn_node_dir),
> +                                                   FALSE,
> +                                                   pool);
> +    SVN_ERR(svn_io_file_open(&rep_file, file_name,
> +                             APR_WRITE | APR_CREATE | APR_TRUNCATE
> +                             | APR_BUFFERED, APR_OS_DEFAULT, pool));
> +
> +    SVN_ERR(svn_io_file_write_full(rep_file, rep_string,
> +                                   strlen(rep_string), NULL, pool));
> +
> +    SVN_ERR(svn_io_file_close(rep_file, pool));
> +   }
> +
> +  return SVN_NO_ERROR;
>  }
>
> @@ -7083,6 +7112,30 @@ get_shared_rep(representation_t **old_re
>
> +  /* look for intra-revision matches (usually data reps but not limited
> +     to them in case props happen to look like some data rep)
> +   */
> +  if (*old_rep == NULL && rep->txn_id)
> +    {
> +      svn_node_kind_t kind;
> +      const char *file_name
> +        = svn_dirent_join(path_txn_dir(fs, rep->txn_id, pool),
> +                          svn_checksum_to_cstring(rep->sha1_checksum,\
> +                                                  pool), pool);
> +
> +      /* in our txn, is there a rep file named with the wanted SHA1?
> +         If so, read it and use that rep. */
> +      SVN_ERR(svn_io_check_path(file_name, &kind, pool));

Instead
 of stat'ing the file to decide whether to open it, it's more efficient
and more 'robust' to just try to open it and then handle the potential
failure, isn't it?

> +      if (kind == svn_node_file)
> +        {
> +          svn_stringbuf_t *rep_string;
> +          SVN_ERR(svn_stringbuf_from_file2(&rep_string, file_name,\
> +                                           pool));

I'm
 sure the answer is obvious to those familiar with FSFS, but just to be clear please could you tell me which locking
mechanism guarantees that the opening and reading of this file cannot be
 happening at the same time as this file is being created and written
(by another thread or process)?

Is it possible that when this
file was being written, the writer crashed in such a way that the file
is now present but incomplete, and do we want to be able to recover --
that is, be able to re-open this txn and resume writing into it -- after
 such a failure?  If so, we should catch errors here that result from
reading incomplete data, and in such cases proceed as if the file was
not found; or else write the file atomically so that that can't happen.

> +          SVN_ERR(read_rep_offsets_body(old_rep, rep_string->data,
> +                                        rep->txn_id, FALSE, pool));
> +        }
> +    }
> +
>    /* Add information that is missing in the cached data. */
>    if (*old_rep)
>      {

- Julian
Received on 2012-11-15 18:16:52 CET

This is an archived mail posted to the Subversion Dev mailing list.