[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Queries about rep cache: get_shared_rep()

From: Julian Foad <julianfoad_at_gmail.com>
Date: Wed, 27 May 2015 09:58:22 +0100

Hi Bert.

Bert Huijben wrote:
> On Windows opening the file is sensitive to outside interactions and may
> trigger a retry loop. [...] A filestat is +- a constant time operation that
> doesn't have these problems.

OK...

> So this really depends on how common all cases are. If not existing or
> possibly locked is common, then statting first could certainly be useful...
>
> But if it is +- an error if the file doesn't exist and the file must be
> opened in almost every case then the open and handle errors keeps the code
> clean.

In this case we're looking to see if the rep is a duplicate of one
that has already been inserted in the current transaction. In the
majority of cases in typical work flows, the requested file will not
exist.

Therefore I think it is safe to say that the current code (statting
first) will be reasonably efficient overall.

However, I don't think that it would necessarily be inefficient to
just try opening the file. It's important to distinguish *requesting*
to open a file at a path where a file may or may not be present and
accessible, from actually opening the file. At what point do these
"outside interactions" typically occur? I spent a few minutes Googling
("file system filter driver" and "IRP_MJ_CREATE" are relevant terms)
but couldn't find an answer. I imagine if the file does not exist then
there would *not* be significant delays and retries due to waiting for
a virus scanner (etc.) if we simply tried to open it.

In conclusion, it seems to me now that the code is fine as it is, and
the question of whether just trying to open a non-existent file would
be generally efficient or inefficient is of only academic interest at
this point.

- Julian

> From: Julian Foad
> Index: subversion/libsvn_fs_fs/transaction.c
> ===================================================================
> --- subversion/libsvn_fs_fs/transaction.c (revision 1681856)
> +++ subversion/libsvn_fs_fs/transaction.c (working copy)
> @@ -2243,12 +2243,14 @@ (representation_t **old_re
> const char *file_name
> = path_txn_sha1(fs, &rep->txn_id, rep->sha1_digest, scratch_pool);
>
> /* in our txn, is there a rep file named with the wanted SHA1?
> If so, read it and use that rep.
> */
> + /* ### Would it be faster (and so better) to just try reading it,
> + and handle ENOENT, instead of first checking for presence? */
> SVN_ERR(svn_io_check_path(file_name, &kind, scratch_pool));
> if (kind == svn_node_file)
> {
> svn_stringbuf_t *rep_string;
> SVN_ERR(svn_stringbuf_from_file2(&rep_string, file_name,
> scratch_pool));
> @@ -2262,14 +2264,20 @@ get_shared_rep(representation_t **old_re
>
> /* A simple guard against general rep-cache induced corruption. */
> if ((*old_rep)->expanded_size != rep->expanded_size)
> {
> /* Make the problem show up in the server log.
>
> - Because not sharing reps is always a save option,
> + Because not sharing reps is always a safe option,
> terminating the request would be inappropriate.
> +
> + ### [JAF] Why should we assume the cache is corrupt rather than the
> + new rep is corrupt? Is this really the logic we want?
> +
> + Should we do something more forceful -- flag the cache as
> + unusable, perhaps -- rather than just logging a warning?
> */
Received on 2015-05-27 11:00:17 CEST

This is an archived mail posted to the Subversion Dev mailing list.