[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Last-Modified HTTP header in GET responses

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Wed, 4 Sep 2019 11:44:25 +0200

On Tue, Sep 3, 2019 at 8:01 AM Branko Čibej <brane_at_apache.org> wrote:
>
> On 02.09.2019 16:20, Johan Corveleyn wrote:
> > On Fri, Jan 15, 2016 at 1:58 PM Ivan Zhakov <ivan_at_visualsvn.com> wrote:
> >> On 7 January 2016 at 10:34, Ivan Zhakov <ivan_at_visualsvn.com> wrote:
> >>> On 6 January 2016 at 08:14, Greg Stein <gstein_at_gmail.com> wrote:
> >>>> Personally, I'd be more interested in the effects on the network and its
> >>>> caching ability. Do we really need to save CPU/IO on the server? Today's
> >>>> servers seem more than capable, and are there really svn servers out in the
> >>>> wild getting so crushed, that this is important? It seems that as long as
> >>>> proxies/etc can properly cache the results, and (thus) avoid future touches
> >>>> on the backend server, then we're good to go.
> >>>>
> >>>> If the patch doesn't affect the caching (which it sounds like "no"), then
> >>>> just go with it. Sure, it is neat to look at ayscalls, but... why? I don't
> >>>> understand the need to examine/profile. Educate me?
> >>>>
> >>> The patch should not affect HTTP caching for two reasons:
> >>> 1. Browsers and proxies supports ETag and use it instead of
> >>> Last-Modified header.
> >>> 2. ETag and Last-Modified headers are used only for cache
> >>> re-validation when max-age is expired. But Subversion sets max-age=1
> >>> week for resources with specific revision in URL
> >>> (http://server/!svn/rvr/1/path). max-age=0 is only used for public
> >>> URLs without revision, i.e. http://server/path)
> >>>
> >>> As far I know proxy usage are limited to public servers with anonymous
> >>> access, since caching of HTTP responses with Authorization is
> >>> prohibited by RFC.
> >>>
> >>> Anyway I agree that trading bandwidth usage to save some CPU/IO on the
> >>> server doesn't make sense, but Last-Modified case is the different:
> >>> Subversion server wasting 10%+ of server resources to produce unused
> >>> header.
> >>>
> >>> I don't have access to svn.apache.org server performance stats, but I
> >>> suppose it's pretty busy server and Infra team would welcome any
> >>> Subversion server performance improvements.
> >>>
> >> Committed in r1724790.
> >>
> >> --
> >> Ivan Zhakov
> > A bit late perhaps, but apparently this change (removing the
> > Last-Modified header from GET responses) broke a specific use case at
> > my company (we just upgraded our SVN server from 1.9 to 1.10, bringing
> > along this particular change):
> >
> > - We use Apache Ivy (http://ant.apache.org/ivy/) for dependency
> > management of our Java applications.
> >
> > - Third party jar files are kept in our svn repo under
> > /trunk/ivyrepository (and branched / tagged in release branches, so we
> > have completely reproducible builds, even if our third party jars or
> > their dependency structures change on trunk).
> >
> > - We use Ivy's "URL Resolver" [1], which downloads the files with
> > regular GET requests (and HEAD requests to check the up-to-dateness of
> > the cache on the client). We effectively use SVN in this case as a
> > "regular" file server (which coincidentally has branches and tags so
> > we can resolve against the correct tree when making a build).
> >
> > This last part now fails, i.e. Ivy's URLResolver no longer detects
> > that a file has changed. It used to compare its own "last-mod time of
> > the file on disk in the cache" with the Last-Modified header, which
> > works fine with all kinds of file servers, and worked with SVN < 1.10.
> >
> > I think it's unfortunate that we broke compatibility here (even if
> > it's not usage by a normal svn client) for the sake of some relatively
> > small performance / load gain on the server. If we could get the old
> > behavior back with some Apache directive, that would have been fine,
> > but there is no such option at the moment.
> >
> > Also: if the Last-Modified would have been removed only for the
> > "internal GET urls" (like http://server/!svn/rvr/1/path), for
> > optimizing checkout (as executed by normal svn clients), that would
> > have been understandable. But why remove it for the "external GET
> > urls" (http://svnserver/path) as well? Those have nothing to do with
> > checkout load, those are only used by browsers or "tools using SVN as
> > a glorified file server" :-).
> >
> > I am by no means an expert in HTTP standards, and various online
> > sources give different recommendations for these headers (ETag,
> > Last-Modified, ... request headers for conditional GETs, ...). But we
> > found an old discussion thread on the "dev_at_rapidsvn.tigris.org"
> > mailinglist from 15 years ago, discussing "a very basic idea: let
> > mod_dav_svn set the Last-Modified HTTP header ..." [2]. Perhaps the
> > feature dates from back then (indicating that it wasn't an accidental
> > feature)?
>
> I'm fairly sure that it wasn't accidental. The whole idea that you can
> use a browser to look at a Subversion repository was intentional, adding
> appropriate HTTP headers in responses was surely part of that.
>
> That said, Ivan's original argument about caching not being affected by
> this change is correct ... but it ignores your particular use case. The
> error was in assuming it's OK to break compatibility on the protocol
> level like this.
>
> > Anyway, how about bringing this feature back in some form?
> > - Revert r1724790?
>
> This is clearly the simplest solution, but I have no idea what the
> performance impact would be. From looking at the diff, my best guess is
> that svn_fs_node_created_rev() and svn_fs_revision_prop() dominate.
>
> > - or only for "external GET urls"?
> > - or only if some Apache directive is set?
> >
> > Thoughts?
>
> I would prefer not to add yet another configuration knob to the server.
> I agree that versioned-resource URLs are only interesting for DAV-aware
> clients, and those clients already know how to check for modifications
> without looking at Last-Modified. That would imply that adding the
> header for external URLs is the right solution.
>
> -- Brane

Thanks for your response, Brane.

I think the below patch would do it (set the Last-Modified header only
for "external URLs").
It's basically a revert of r1724790 (and adding a test), plus wrapping
the actual setting of Last-Modified inside the following condition:

    if ((resource->type == DAV_RESOURCE_TYPE_REGULAR)
        && (resource->info->repos_path == resource->info->uri_path->data))

I'm not entirely sure this is the perfect way to test for "only
external url's", but it seems to work (i.e. it adds the header to
/iota, but not to /!svn/rvr/1/iota). If I omit the second part of the
condition, it's not selective enough (apparently /!svn/rvr/1/iota is
also a REGULAR resource). That second part of the condition
corresponds to what is specifically set in the case of an "external
request" in parse_uri(), IIUC:
    http://svn.apache.org/viewvc/subversion/trunk/subversion/mod_dav_svn/repos.c?revision=1724790&view=markup&pathrev=1724790#l791

[[[
Index: subversion/mod_dav_svn/repos.c
===================================================================
--- subversion/mod_dav_svn/repos.c (revision 1866345)
+++ subversion/mod_dav_svn/repos.c (working copy)
@@ -3139,6 +3139,50 @@ seek_stream(dav_stream *stream, apr_off_t abs_posi
        && resource->baselined))

+/* Return the last modification time of RESOURCE, or -1 if the DAV
+ resource type is not handled, or if an error occurs. Temporary
+ allocations are made from RESOURCE->POOL. */
+static apr_time_t
+get_last_modified(const dav_resource *resource)
+{
+ apr_time_t last_modified;
+ svn_error_t *serr;
+ svn_revnum_t created_rev;
+ svn_string_t *date_time;
+
+ if (RESOURCE_LACKS_ETAG_POTENTIAL(resource))
+ return -1;
+
+ if ((serr = svn_fs_node_created_rev(&created_rev, resource->info->root.root,
+ resource->info->repos_path,
+ resource->pool)))
+ {
+ svn_error_clear(serr);
+ return -1;
+ }
+
+ if ((serr = svn_fs_revision_prop2(&date_time, resource->info->repos->fs,
+ created_rev, SVN_PROP_REVISION_DATE,
+ TRUE, resource->pool, resource->pool)))
+ {
+ svn_error_clear(serr);
+ return -1;
+ }
+
+ if (date_time == NULL || date_time->data == NULL)
+ return -1;
+
+ if ((serr = svn_time_from_cstring(&last_modified, date_time->data,
+ resource->pool)))
+ {
+ svn_error_clear(serr);
+ return -1;
+ }
+
+ return last_modified;
+}
+
+
 const char *
 dav_svn__getetag(const dav_resource *resource, apr_pool_t *pool)
 {
@@ -3219,6 +3263,25 @@ set_headers(request_rec *r, const dav_resource *re
   if (!resource->exists)
     return NULL;

+ if ((resource->type == DAV_RESOURCE_TYPE_REGULAR)
+ && (resource->info->repos_path == resource->info->uri_path->data))
+ {
+ /* Include Last-Modified header for 'external' GET requests
+ (i.e. requests to URI's not under /!svn), to support usage of an
+ SVN server as a file server, where the client needs timestamps
+ for instance to use as "last modification time" of files on disk. */
+ apr_time_t last_modified;
+
+ last_modified = get_last_modified(resource);
+ if (last_modified != -1)
+ {
+ /* Note the modification time for the requested resource, and
+ include the Last-Modified header in the response. */
+ ap_update_mtime(r, last_modified);
+ ap_set_last_modified(r);
+ }
+ }
+
   /* generate our etag and place it into the output */
   apr_table_setn(r->headers_out, "ETag",
                  dav_svn__getetag(resource, resource->pool));
Index: subversion/tests/cmdline/mod_dav_svn_tests.py
===================================================================
--- subversion/tests/cmdline/mod_dav_svn_tests.py (revision 1866345)
+++ subversion/tests/cmdline/mod_dav_svn_tests.py (working copy)
@@ -640,6 +640,29 @@ def propfind_propname(sbox):
   actual_response = r.read()
   verify_xml_response(expected_response, actual_response)

+@SkipUnless(svntest.main.is_ra_type_dav)
+def last_modified_header(sbox):
+ "verify 'Last-Modified' header on 'external' GETs"
+
+ sbox.build(create_wc=False, read_only=True)
+
+ headers = {
+ 'Authorization': 'Basic ' +
base64.b64encode(b'jconstant:rayjandom').decode(),
+ }
+
+ h = svntest.main.create_http_connection(sbox.repo_url)
+
+ # GET /repos/iota
+ # Expect to see a Last-Modified header.
+ h.request('GET', sbox.repo_url + '/iota', None, headers)
+ r = h.getresponse()
+ if r.status != httplib.OK:
+ raise svntest.Failure('Request failed: %d %s' % (r.status, r.reason))
+ svntest.verify.compare_and_display_lines(None, 'Last-Modified',
+ svntest.verify.RegexOutput('.+'),
+ r.getheader('Last-Modified'))
+ r.read()
+
 ########################################################################
 # Run the tests

@@ -652,6 +675,7 @@ test_list = [ None,
               propfind_404,
               propfind_allprop,
               propfind_propname,
+ last_modified_header,
              ]
 serial_only = True

]]]

-- 
Johan
Received on 2019-09-04 11:44:49 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.