
Re: Node origins cache rewrite

From: C. Michael Pilato <cmpilato_at_collab.net>
Date: Mon, 28 Jan 2008 13:21:23 -0500

C. Michael Pilato wrote:
> Mark Phippard wrote:
>> On Jan 28, 2008 12:55 PM, David Glasser <glasser_at_davidglasser.net> wrote:
>>> On Jan 25, 2008 12:50 PM, David Glasser <glasser_at_davidglasser.net>
>>> wrote:
>>>> On Jan 25, 2008 11:16 AM, David Glasser <glasser_at_davidglasser.net>
>>>> wrote:
>>>>> On Jan 24, 2008 6:54 PM, Mark Phippard <markphip_at_gmail.com> wrote:
>>>>>> I see David has rewritten this to no longer use SQLite. Yay!
>>>>> Here's an alternative implementation. In FSFS, at commit time, new
>>>>> node IDs are rewritten from a temporary value like "_ab3" to a unique
>>>>> value by adding "ab3" to the "start_node_id" field in the current
>>>>> file. This makes them not only unique, but also part of an ordered
>>>>> sequence without gaps.
>>>>>
>>>>> Is it actually important that node IDs be ordered and gapless? We
>>>>> could just change new node-IDs (in format 3 repositories) to be built
>>>>> as "<rev>-ab3". get-node-origin-rev would be trivial on these nodes.
>>>>> Pre-format-3 repositories, or nodes in format 3 repositories that
>>>>> aren't dumped and loaded, would require the slow crawl.
>>>> Like this. Can somebody review?
>>> New version, supporting "svnadmin recover". Barring objections, will
>>> commit later today.
>>
>> I do not have objections, but I did ask some questions in this message
>> that have not been answered:
>>
>> http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=134583
>
> Requiring a dump and load just to get node-origins is a non-starter, in
> my opinion. And it is repositories large enough to make dumping and
> loading such a pain that will suffer the most from *not* having the
> node-origins table.
>
> The get-location-segments fallback logic is based on 'svn log', which
> requires a loooooong time to run against, say, APR's trunk (after four
> minutes, get-location-segments.py against that URL still hadn't
> completed). Worse still, because of this new way David wishes to
> implement the feature, all that cost would be paid *every time a merge
> was requested*, rather than simply the first time as it is in the
> current implementation.

Oops. Besides the slew of typing errors made in this post, I also mis-thunk
this bit. We wouldn't need to hit the fallback logic because that is keyed
on the server being pre-1.5, which is not the case we're discussing here.
However, I still expect costs to be quite high as the server crawls all the
revisions in APR's trunk, unable to use shortcuts such as the one that the
svn_fs_closest_copy() API provides.
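
For readers following the thread, here is a minimal sketch of the two node-ID
schemes David describes in the quoted text, and of why the origin lookup is
trivial for one and expensive for the other. It is an illustration only, not
FSFS code: the function names, the decimal revision prefix, and the
slow_crawl callback are all assumptions made for this example.

```python
import string

# FSFS node IDs are treated here as base-36 strings (digits, then a-z).
_DIGITS = string.digits + string.ascii_lowercase


def to_base36(n):
    """Encode a non-negative integer as a lowercase base-36 string."""
    if n < 0:
        raise ValueError("node IDs are non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(_DIGITS[r])
    return "".join(reversed(out))


def finalize_sequential(temp_id, start_node_id):
    """Existing scheme from the thread: "_ab3" becomes ab3 + start_node_id."""
    if not temp_id.startswith("_"):
        raise ValueError("expected a temporary ID such as '_ab3'")
    return to_base36(int(temp_id[1:], 36) + int(start_node_id, 36))


def finalize_with_revision(temp_id, rev):
    """Proposed scheme from the thread: "_ab3" becomes "<rev>-ab3"."""
    if not temp_id.startswith("_"):
        raise ValueError("expected a temporary ID such as '_ab3'")
    # Assumption: the revision is written in decimal.
    return f"{rev}-{temp_id[1:]}"


def origin_rev(node_id, slow_crawl):
    """Return the revision in which node_id was created.

    IDs of the hypothetical "<rev>-xyz" form carry the answer in the ID.
    Any other ID falls back to slow_crawl, a caller-supplied stand-in for
    walking the node's history revision by revision.
    """
    rev, sep, _ = node_id.partition("-")
    if sep and rev.isdigit():
        return int(rev)
    return slow_crawl(node_id)
```

With IDs of the second form the lookup is a string split. Every older ID takes
the slow_crawl path, which is the cost this message is concerned about for a
history as long as APR's trunk.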

-- 
C. Michael Pilato <cmpilato_at_collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Received on 2008-01-28 19:21:36 CET

This is an archived mail posted to the Subversion Dev mailing list.