
Re: searching repository code

From: <david.x.grierson_at_jpmorgan.com>
Date: Mon, 3 Mar 2008 11:19:13 +0000

You might want to look at Krugle - http://www.krugle.com/ - who provide a
1U search appliance which can scan Subversion and (IIRC) ClearCase
repositories.

I haven't evaluated Krugle yet, though I'm pretty interested - however you
can imagine the kind of pain involved in trying to get new hardware into my
kind of place.

W.R.T. Fisheye - it has the following problems (disclaimer - we run a
pretty large Subversion setup here - 900+ repositories consuming 700GB+
of data, with 140+ of those SVN repositories configured across 3 Fisheye
instances - the 3 instances are a workaround for the first issue
described below).

The following is presented to this list purely as an FYI - I'm not
necessarily looking for assistance with these issues since Atlassian have
essentially admitted that they are known problems with no immediate
workaround.

If anyone has any suggestions for these issues then they would be warmly
received.

Serial repository scanning
==========================
Under normal working conditions Fisheye scans repositories one at a time in
a fixed cycle (repository1 -> repository2 -> repository3 -> ... ->
repository1). If the scan of one repository takes a long time - for example
after a large scale addition to that repository - no other repository is
updated until the scan of that repository has completed.

While the scan is blocked, commits may still be taking place in the other
repositories (and in the repository being scanned), so there is more work
involved in bringing these repositories up to date afterwards.

Additionally any one of these repositories may also have a large scale
update applied to it which could cause further delay on later
repositories.

Initial indexing is performed in a different thread from normal scanning -
therefore the addition of new repositories does not cause blocking.
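
To make the blocking effect concrete, here is a minimal sketch of a single
serial pass. It is not Fisheye's code: the Repo class, the commit_rate and
cost_per_revision parameters and all of the numbers are assumptions chosen
purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    name: str
    pending: int         # revisions waiting to be indexed when the pass starts
    commit_rate: float   # assumed: new revisions arriving per unit of scan time


def serial_scan_pass(repos, cost_per_revision=1.0):
    """Scan repositories one after another, as in the serial model above.

    Returns (name, wait, scanned) tuples: how long each repository waited
    for its turn and how many revisions it had to index when it got there.
    """
    clock = 0.0
    report = []
    for repo in repos:
        # Commits that landed while earlier repositories were being scanned.
        backlog = repo.pending + int(clock * repo.commit_rate)
        report.append((repo.name, clock, backlog))
        clock += backlog * cost_per_revision
        repo.pending = 0
    return report


if __name__ == "__main__":
    repos = [
        Repo("repository1", pending=5, commit_rate=0.0),
        Repo("repository2", pending=10_000, commit_rate=0.0),  # large scale addition
        Repo("repository3", pending=5, commit_rate=0.01),
    ]
    for name, wait, scanned in serial_scan_pass(repos):
        print(f"{name}: waited {wait:.0f}, scanned {scanned} revisions")
```

With these made-up numbers repository3 waits about 10,000 time units behind
the large addition to repository2, and by then has 105 revisions to catch up
on instead of 5.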

Service cannot be restarted while any repository is receiving initial scanning
==============================================================================
Initial scanning takes place when a repository is either re-indexed (e.g.
after any configuration change has been applied to the repository
structure) or has just been added to the configuration.

This is a large volume update and, especially in the case of re-indexing,
can potentially take a long time to complete (days or even weeks). If the
fisheye service is restarted during the period when re-indexing is taking
place the new repository moves from being scanned by the initial parallel
scanning to the serial scanning method described above regardless of where
in the revisions the initial scanning has reached.

For example a Subversion repository is added to fisheye for the first time
- the repository has 28,000 revisions. Fisheye will catalogue these 28,000
revisions using the parallel initial thread.

If the fisheye server is restarted when only 12,000 revisions have been
catalogued in this repository then upon restart 16,000 revisions will have
to be indexed by the serial scanning thread. All other repositories will
be blocked from updating until this initial scanning has been completed.
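
The arithmetic in this example can be sketched as follows. The function
names are hypothetical, and since no indexing rate is given here, the
500 revisions/hour figure is purely an assumption used to turn the backlog
into a blocking time.

```python
def serial_backlog_after_restart(total_revisions, indexed_before_restart):
    """Revisions left for the serial scanning thread after a restart."""
    if not 0 <= indexed_before_restart <= total_revisions:
        raise ValueError("indexed_before_restart must be between 0 and total_revisions")
    return total_revisions - indexed_before_restart


def blocking_hours(backlog, revisions_per_hour):
    """Hours during which all other repositories wait (assumed constant rate)."""
    if revisions_per_hour <= 0:
        raise ValueError("revisions_per_hour must be positive")
    return backlog / revisions_per_hour


if __name__ == "__main__":
    backlog = serial_backlog_after_restart(28_000, 12_000)
    print(backlog)                       # 16000
    print(blocking_hours(backlog, 500))  # 32.0, at the assumed rate
```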

To compound this problem, there is only a single thread to perform initial
scanning - consequently other repositories are queued to receive initial
index scanning behind the currently active one.

The restart behaviour described above also applies when the service is
restarted after a crash.

Does not cope well with branch/tag deletions
============================================
One of the operations which regularly causes blocking of repository
scanning is the removal of branches or tags within a Subversion
repository.

Does not cope well with unusual repository structures
=====================================================
Fisheye requires a consistent structure in order to index the content
correctly. If the structure changes within a repository then that change
has to be reflected in Fisheye's configuration for the repository - which
in turn means that the repository needs to be fully re-indexed (see the
first two issues above concerning this).

Dg.

--
David Grierson
JPMorgan - IB Architecture - Source Code Management Consultant
GDP 228-5574 / DDI +44 141 228 5574 / Email david.x.grierson_at_jpmorgan.com
Alhambra House 6th floor, 45 Waterloo Street, Glasgow G2 6HS
 
Toby Thain <toby_at_telegraphics.com.au> 
02/03/2008 23:26
To
Shawn Talbert <stalbert_at_exploreconsulting.com>
cc
<users_at_subversion.tigris.org>
Subject
Re: searching repository code
> On 2-Mar-08, at 10:13 AM, Shawn Talbert wrote:
>> What’s the best tool for searching (both code and comments) a subversion
>> repository?
>>
>> It’d be nice if there were something svn-aware (i.e. able to search only
>> the head revision, or a range of revisions, or revisions after date X,
>> etc.).
>
> Try FishEye - play with my installation here:
> https://www.telegraphics.com.au/fisheye/search/psdparse/
> Main page:
> https://www.telegraphics.com.au/fisheye
> FishEye product:
> http://www.atlassian.com/software/fisheye/
> --Toby
>
>> I’ve considered periodically exporting the entire repo and using a generic
>> search engine on it, but that seems less than ideal..

Received on 2008-03-03 12:20:01 CET

This is an archived mail posted to the Subversion Users mailing list.