Re: Last-Modified HTTP header in GET responses

From: Branko Čibej <brane_at_apache.org>
Date: Tue, 3 Sep 2019 08:01:04 +0200

On 02.09.2019 16:20, Johan Corveleyn wrote:
> On Fri, Jan 15, 2016 at 1:58 PM Ivan Zhakov <ivan_at_visualsvn.com> wrote:
>> On 7 January 2016 at 10:34, Ivan Zhakov <ivan_at_visualsvn.com> wrote:
>>> On 6 January 2016 at 08:14, Greg Stein <gstein_at_gmail.com> wrote:
>>>> Personally, I'd be more interested in the effects on the network and its
>>>> caching ability. Do we really need to save CPU/IO on the server? Today's
>>>> servers seem more than capable, and are there really svn servers out in the
>>>> wild getting so crushed, that this is important? It seems that as long as
>>>> proxies/etc can properly cache the results, and (thus) avoid future touches
>>>> on the backend server, then we're good to go.
>>>>
>>>> If the patch doesn't affect the caching (which it sounds like "no"), then
>>>> just go with it. Sure, it is neat to look at syscalls, but... why? I don't
>>>> understand the need to examine/profile. Educate me?
>>>>
>>> The patch should not affect HTTP caching for two reasons:
>>> 1. Browsers and proxies support ETag and use it instead of the
>>> Last-Modified header.
>>> 2. ETag and Last-Modified headers are used only for cache
>>> re-validation when max-age has expired. But Subversion sets max-age=1
>>> week for resources with a specific revision in the URL
>>> (http://server/!svn/rvr/1/path). max-age=0 is only used for public
>>> URLs without a revision (i.e. http://server/path).
>>>
>>> As far as I know, proxy usage is limited to public servers with
>>> anonymous access, since caching of HTTP responses with Authorization
>>> is prohibited by the RFC.
>>>
>>> Anyway, I agree that trading bandwidth usage to save some CPU/IO on
>>> the server doesn't make sense, but the Last-Modified case is different:
>>> the Subversion server is wasting 10%+ of server resources to produce
>>> an unused header.
>>>
>>> I don't have access to svn.apache.org server performance stats, but I
>>> suppose it's a pretty busy server and the Infra team would welcome any
>>> Subversion server performance improvements.
>>>
>> Committed in r1724790.
>>
>> --
>> Ivan Zhakov
> A bit late perhaps, but apparently this change (removing the
> Last-Modified header from GET responses) broke a specific use case at
> my company (we just upgraded our SVN server from 1.9 to 1.10, bringing
> along this particular change):
>
> - We use Apache Ivy (http://ant.apache.org/ivy/) for dependency
> management of our Java applications.
>
> - Third party jar files are kept in our svn repo under
> /trunk/ivyrepository (and branched / tagged in release branches, so we
> have completely reproducible builds, even if our third party jars or
> their dependency structures change on trunk).
>
> - We use Ivy's "URL Resolver" [1], which downloads the files with
> regular GET requests (and HEAD requests to check the up-to-dateness of
> the cache on the client). We effectively use SVN in this case as a
> "regular" file server (which coincidentally has branches and tags so
> we can resolve against the correct tree when making a build).
>
> This last part now fails, i.e. Ivy's URLResolver no longer detects
> that a file has changed. It used to compare its own "last-mod time of
> the file on disk in the cache" with the Last-Modified header, which
> works fine with all kinds of file servers, and worked with SVN < 1.10.
>
> I think it's unfortunate that we broke compatibility here (even if
> it's not usage by a normal svn client) for the sake of some relatively
> small performance / load gain on the server. If we could get the old
> behavior back with some Apache directive, that would have been fine,
> but there is no such option at the moment.
>
> Also: if the Last-Modified header had been removed only for the
> "internal GET urls" (like http://server/!svn/rvr/1/path), for
> optimizing checkout (as executed by normal svn clients), that would
> have been understandable. But why remove it for the "external GET
> urls" (http://svnserver/path) as well? Those have nothing to do with
> checkout load, those are only used by browsers or "tools using SVN as
> a glorified file server" :-).
>
> I am by no means an expert in HTTP standards, and various online
> sources give different recommendations for these headers (ETag,
> Last-Modified, ... request headers for conditional GETs, ...). But we
> found an old discussion thread on the "dev_at_rapidsvn.tigris.org"
> mailing list from 15 years ago, discussing "a very basic idea: let
> mod_dav_svn set the Last-Modified HTTP header ..." [2]. Perhaps the
> feature dates from back then (indicating that it wasn't an accidental
> feature)?

I'm fairly sure that it wasn't accidental. The whole idea that you can
use a browser to look at a Subversion repository was intentional, and
adding appropriate HTTP headers in responses was surely part of that.

That said, Ivan's original argument about caching not being affected by
this change is correct ... but it ignores your particular use case. The
error was in assuming it's OK to break compatibility on the protocol
level like this.
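
To make the broken use case concrete, here is a rough sketch of the kind
of check described above: compare the cached file's mtime with the
Last-Modified value from a HEAD response. This is not Ivy's actual code;
the function name and the plain-dict header shape are assumptions made
for illustration only.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def remote_is_newer(headers, cached_mtime):
    """Decide whether the server copy is newer than the cached file.

    headers: dict of HEAD response headers (hypothetical shape).
    cached_mtime: POSIX timestamp of the file in the local cache.
    Returns True or False when Last-Modified is present, None otherwise.
    """
    value = headers.get("Last-Modified")
    if value is None:
        # No header, so there is nothing to compare against: a client
        # relying on this check cannot notice that the file changed.
        return None
    remote = parsedate_to_datetime(value)
    local = datetime.fromtimestamp(cached_mtime, tz=timezone.utc)
    return remote > local
```

With the header gone, such a check has no basis for a decision, which
matches the symptom reported: changes on the server go undetected.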

> Anyway, how about bringing this feature back in some form?
> - Revert r1724790?

This is clearly the simplest solution, but I have no idea what the
performance impact would be. From looking at the diff, my best guess is
that svn_fs_node_created_rev() and svn_fs_revision_prop() dominate.
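
As a rough illustration of why those two calls would matter, producing
the header needs two lookups per request: the revision in which the node
was created, then a property of that revision. The sketch below uses
in-memory dicts as stand-ins; it is not the Subversion API, and it
assumes the property in question is svn:date. The file name and revision
number are made up.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical stand-ins for the repository, for illustration only.
NODE_CREATED_REV = {"/trunk/ivyrepository/foo.jar": 42}
REVPROPS = {42: {"svn:date": "2019-09-02T14:20:00.000000Z"}}


def last_modified_header(path):
    """Build an HTTP date from the created revision's svn:date."""
    rev = NODE_CREATED_REV[path]            # first lookup: created rev
    svn_date = REVPROPS[rev]["svn:date"]    # second lookup: revprop
    when = datetime.strptime(svn_date, "%Y-%m-%dT%H:%M:%S.%fZ")
    when = when.replace(tzinfo=timezone.utc)
    # usegmt=True emits the "GMT" suffix that HTTP dates use.
    return format_datetime(when, usegmt=True)


print(last_modified_header("/trunk/ivyrepository/foo.jar"))
# Mon, 02 Sep 2019 14:20:00 GMT
```

In the real server both lookups hit the filesystem backend, so the cost
depends on caching there; the sketch says nothing about actual numbers.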

> - or only for "external GET urls"?
> - or only if some Apache directive is set?
>
> Thoughts?

I would prefer not to add yet another configuration knob to the server.
I agree that versioned-resource URLs are only interesting for DAV-aware
clients, and those clients already know how to check for modifications
without looking at Last-Modified. That would imply that adding the
header for external URLs is the right solution.
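
A minimal sketch of that policy, combined with the max-age values
mentioned earlier in the thread, might look like the following. The
function names are hypothetical, the "!svn" test assumes the default
special URI segment seen in the example URLs, and the real decision
would live in mod_dav_svn's C code rather than in anything like this.

```python
ONE_WEEK = 7 * 24 * 60 * 60  # seconds


def is_versioned_resource_url(path):
    """True for internal URLs such as /!svn/rvr/1/path (assumed marker)."""
    return "!svn" in path.split("/")


def response_headers(path, etag, last_modified):
    """Pick caching headers for a GET response (illustrative only)."""
    if is_versioned_resource_url(path):
        # Revision-pinned content: long max-age, DAV-aware clients
        # do not need Last-Modified here.
        return {"ETag": etag, "Cache-Control": f"max-age={ONE_WEEK}"}
    # Public URL: may change with any commit, so keep Last-Modified
    # for browsers and plain HTTP tools.
    return {
        "ETag": etag,
        "Cache-Control": "max-age=0",
        "Last-Modified": last_modified,
    }
```

This keeps the saving for checkout traffic while restoring the header
for tools that treat the repository as a plain file server.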

-- Brane
Received on 2019-09-03 08:01:08 CEST

This is an archived mail posted to the Subversion Dev mailing list.
