[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: r1397773 - rep-sharing in a txn - subversion/libsvn_fs_fs/fs_fs.c

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Mon, 19 Nov 2012 15:43:08 +0100

On Thu, Nov 15, 2012 at 6:16 PM, Julian Foad <julianfoad_at_btopenworld.com>wrote:

> Hi Stefan. Some review questions and comments...
>
> stefan2_at_apache.org wrote:
> > URL: http://svn.apache.org/viewvc?rev=1397773&view=rev
> > Log:
> > Due to public request: apply rep-sharing to equal data reps within
> > the same transaction.
> >
> > The idea is simple. When writing a noderev to the txn folder,
> > write another file named by the rep's SHA1 and store the rep
> > struct in there. Lookup is then straight-forward.
>
> Please
> could you update the FSFS structure document. I assume it says what
> files are stored in the txn dir; if it doesn't it should.
>

Oops - hadn't thought about that! Done in r1411202.
All other changes in r1411209.

> * subversion/libsvn_fs_fs/fs_fs.c
> > (svn_fs_fs__put_node_revision): also look for SHA1-named files
> > (get_shared_rep): write SHA1-named files
> [...]
> > @@ -2603,7 +2607,32 @@ svn_fs_fs__put_node_revision(svn_fs_t *f
> >
> > - return svn_io_file_close(noderev_file, pool);
> > + SVN_ERR(svn_io_file_close(noderev_file, pool));
> > +
> > + /* if rep sharing has been enabled and the noderev has a data rep
> > + * and its SHA-1 is known, store the rep struct under its SHA1. */
>
> It
> looks like we re-enter this 'if' block, and re-write the rep-reference
> file, every time we store a node-rev with the same representation --
> even if the rep we're storing was already found from this file. This
> might not be much of an overhead, but it seems ugly.
>
> Could we
> avoid re-writing it in those cases? Maybe by refactoring such that we
> write the rep-reference file immediately after writing the rep, instead
> of here in put_node_revision.
>

Yep. It turns out that rep_write_contents_close is the
only relevant user of that functionality. There, we can
also decide whether to store the sha1->rep mapping
or not. It also addresses the locking problem below.

> > + if (sha1)
> > + {
> > + apr_file_t *rep_file;
> > + const char *file_name = svn_dirent_join(path_txn_dir(fs, txn_id,\
> > + pool), sha1, pool);
>
> Creating
> the file name ('checksum_to_cstring' + 'path_txn_dir' +
> 'svn_dirent_join') is common to both this function and the function
> below where we read the file. Factor it out, like the existing
> path_txn_node_rev() and similar functions, to make the commonality
> clear.
>

Done.

> > + const char *rep_string = representation_string(noderev->data_rep,
> > + ffd->format,
> > + (noderev->kind
> > + == svn_node_dir),
> > + FALSE,
> > + pool);
> > + SVN_ERR(svn_io_file_open(&rep_file, file_name,
> > + APR_WRITE | APR_CREATE | APR_TRUNCATE
> > + | APR_BUFFERED, APR_OS_DEFAULT, pool));
> > +
> > + SVN_ERR(svn_io_file_write_full(rep_file, rep_string,
> > + strlen(rep_string), NULL, pool));
> > +
> > + SVN_ERR(svn_io_file_close(rep_file, pool));
> > + }
> > +
> > + return SVN_NO_ERROR;
> > }
> >
> > @@ -7083,6 +7112,30 @@ get_shared_rep(representation_t **old_re
> >
> > + /* look for intra-revision matches (usually data reps but not limited
> > + to them in case props happen to look like some data rep)
> > + */
> > + if (*old_rep == NULL && rep->txn_id)
> > + {
> > + svn_node_kind_t kind;
> > + const char *file_name
> > + = svn_dirent_join(path_txn_dir(fs, rep->txn_id, pool),
> > + svn_checksum_to_cstring(rep->sha1_checksum,\
> > + pool), pool);
> > +
> > + /* in our txn, is there a rep file named with the wanted SHA1?
> > + If so, read it and use that rep. */
> > + SVN_ERR(svn_io_check_path(file_name, &kind, pool));
>
> Instead
> of stat'ing the file to decide whether to open it, it's more efficient
> and more 'robust' to just try to open it and then handle the potential
> failure, isn't it?
>

Access is serialized (see below). Checking for existence
should be more efficient because actual matches are
relatively rare and constructing an error object can be
expensive (NLS). svn_stringbuf_from_file2 is also a very
convenient function to use.

> + if (kind == svn_node_file)
> > + {
> > + svn_stringbuf_t *rep_string;
> > + SVN_ERR(svn_stringbuf_from_file2(&rep_string, file_name,\
> > + pool));
>
> I'm
> sure the answer is obvious to those familiar with FSFS, but just to be
> clear please could you tell me which locking
> mechanism guarantees that the opening and reading of this file cannot be
> happening at the same time as this file is being created and written
> (by another thread or process)?
>

Writing representations is serialized using the protorev lock.
All access to the sha1->rep mapping is done from
rep_write_contents_close() now, which in turn, is protected
by that lock.

> Is it possible that when this
> file was being written, the writer crashed in such a way that the file
> is now present but incomplete, and do we want to be able to recover --
> that is, be able to re-open this txn and resume writing into it -- after
> such a failure? If so, we should catch errors here that result from
> reading incomplete data, and in such cases proceed as if the file was
> not found; or else write the file atomically so that that can't happen.
>

A crashed writer process may leave a corrupt protorev and / or
other incomplete files. There is no atomic incremental change
here. The caller (client) using the crashed process is supposed
to detect the crash and abandon the transaction.

> > + SVN_ERR(read_rep_offsets_body(old_rep, rep_string->data,
> > + rep->txn_id, FALSE, pool));
> > + }
> > + }
> > +
> > /* Add information that is missing in the cached data. */
> > if (*old_rep)
> > {
>
> - Julian
>
>
Thanks for the review! The code should be multi-thread safe.

-- Stefan^2.

-- 
Certified & Supported Apache Subversion Downloads:
*
http://www.wandisco.com/subversion/download
*
Received on 2012-11-19 15:43:40 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.