[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: 1.7.x - merge now accesses all files in WC?

From: Daniel Becroft <djcbecroft_at_gmail.com>
Date: Thu, 9 Dec 2010 07:27:24 +1000

On Tue, Dec 7, 2010 at 12:39 PM, Hyrum K. Wright <
hyrum_wright_at_mail.utexas.edu> wrote:

> On Mon, Dec 6, 2010 at 4:28 PM, Hyrum K. Wright
> <hyrum_wright_at_mail.utexas.edu> wrote:
> > On Mon, Dec 6, 2010 at 3:15 PM, Daniel Becroft <djcbecroft_at_gmail.com>
> wrote:
> >> On Mon, Dec 6, 2010 at 8:50 PM, Daniel Becroft <djcbecroft_at_gmail.com>
> wrote:
> >>
> >>> On Mon, Dec 6, 2010 at 7:13 PM, Daniel Shahaf <d.s_at_daniel.shahaf.name
> >wrote:
> >>>
> >>>> Instead of guessing which function causes the lstat() calls, could we
> >>>> have a tool tell?
> >>>>
> >>>> I've looked at 'ltrace -S', but it seems to me that will only tell us
> >>>> which public svn_wc_* API is responsible for the calls. Perhaps there
> >>>> is another tool that can help us follow the entire call chain down to
> >>>> the lstat() call?
> >>>>
> >>>> Daniel
> >>>> . o O (gdb with a breakpoint in lstat() and some fu to extract
> >>>> statistics about the stack trace?)
> >>>>
> >>>
> >>> Very true.
> >>>
> >>> Okay, after a crash course in using GDB, I've run some further testing.
> >>> Running with breakpoints set on the lstat() system call, it looks to be
> the
> >>> known issue that was originally pointed to. A partial stack trace is
> below:
> >>>
> >>> #0 0x00007ffff644ab45 in __lxstat (vers=<value optimised out>,
> >>> name=<value optimised out>, buf=0x7fffffffcbf0)
> >>> at ../sysdeps/unix/sysv/linux/wordsize-64/lxstat.c:38
> >>> #1 0x00007ffff69269b6 in lstat (finfo=0x7fffffffccf0,
> >>> fname=0x6bd638 "/home/djcbecroft/dev/workingcopy/A/beta.txt",
> >>> wanted=33137, pool=0x6bd588) at /usr/include/sys/stat.h:464
> >>> #2 apr_stat (finfo=0x7fffffffccf0,
> >>> fname=0x6bd638 "/home/djcbecroft/dev/workingcopy/A/beta.txt",
> >>> wanted=33137, pool=0x6bd588) at file_io/unix/filestat.c:292
> >>> #3 0x00007ffff6fb1199 in io_check_path (
> >>> path=0x6bd608 "/home/djcbecroft/dev/workingcopy/A/beta.txt",
> >>> resolve_symlinks=0, is_special_p=0x7fffffffced8,
> kind=0x7fffffffcedc,
> >>> pool=0x6bd588) at subversion/libsvn_subr/io.c:222
> >>> #4 0x00007ffff6fb1346 in svn_io_check_special_path (
> >>> path=0x6bd608 "/home/djcbecroft/dev/workingcopy/A/beta.txt",
> >>> kind=0x7fffffffcedc, is_special=0x7fffffffced8, pool=0x6bd588)
> >>> at subversion/libsvn_subr/io.c:283
> >>> #5 0x00007ffff794513d in svn_wc__db_pdh_parse_local_abspath (
> >>> pdh=0x7fffffffd038, local_relpath=0x7fffffffd030, db=0x64ad38,
> >>> local_abspath=0x6bd608
> "/home/djcbecroft/dev/workingcopy/A/beta.txt",
> >>> smode=svn_sqlite__mode_readwrite, result_pool=0x6bd588,
> >>> scratch_pool=0x6bd588) at subversion/libsvn_wc/wc_db_pdh.c:382
> >>> #6 0x00007ffff7935d58 in svn_wc__db_read_info (status=0x7fffffffd184,
> >>> kind=0x7fffffffd188, revision=0x0, repos_relpath=0x0,
> >>> repos_root_url=0x0,
> >>> repos_uuid=0x0, changed_rev=0x0, changed_date=0x0,
> changed_author=0x0,
> >>> last_mod_time=0x0, depth=0x0, checksum=0x0, translated_size=0x0,
> >>> target=0x0, changelist=0x0, original_repos_relpath=0x0,
> >>> original_root_url=0x0, original_uuid=0x0, original_revision=0x0,
> >>> props_mod=0x0, have_base=0x0, have_work=0x0, conflicted=0x0,
> lock=0x0,
> >>> db=0x64ad38,
> >>> local_abspath=0x6bd608
> "/home/djcbecroft/dev/workingcopy/A/beta.txt",
> >>> result_pool=0x6bd588, scratch_pool=0x6bd588)
> >>> at subversion/libsvn_wc/wc_db.c:5261
> >>>
> >>> However, I then put the breakpont in the svn_wc__db_read_info function
> and,
> >>> just for A/beta.txt, it gets called 7 times, when in terms leads to 7
> >>> additional lstat() calls. Looking further the call stack appears as
> follows:
> >>>
> >>> #0 svn_wc__db_read_info (status=0x7fffffffd184, kind=0x7fffffffd188,
> >>> revision=0x0, repos_relpath=0x0, repos_root_url=0x0,
> repos_uuid=0x0,
> >>> changed_rev=0x0, changed_date=0x0, changed_author=0x0,
> >>> last_mod_time=0x0,
> >>> depth=0x0, checksum=0x0, translated_size=0x0, target=0x0,
> >>> changelist=0x0,
> >>> original_repos_relpath=0x0, original_root_url=0x0,
> original_uuid=0x0,
> >>> original_revision=0x0, props_mod=0x0, have_base=0x0, have_work=0x0,
> >>> conflicted=0x0, lock=0x0, db=0x64ad38,
> >>> local_abspath=0x6bd608
> "/home/djcbecroft/dev/workingcopy/A/beta.txt",
> >>> result_pool=0x6bd588, scratch_pool=0x6bd588)
> >>> at subversion/libsvn_wc/wc_db.c:5259
> >>> #1 0x00007ffff78fab96 in walker_helper (db=0x64ad38,
> >>> dir_abspath=0x6ba288 "/home/djcbecroft/dev/workingcopy/A",
> >>> show_hidden=1,
> >>> walk_callback=0x7ffff7ba9026 <get_mergeinfo_walk_cb>,
> >>> walk_baton=0x7fffffffd4e0, depth=svn_depth_infinity,
> >>> cancel_func=0x4119c3 <svn_cl__check_cancel>, cancel_baton=0x0,
> >>> scratch_pool=0x6ba208) at subversion/libsvn_wc/node.c:679
> >>> #2 0x00007ffff78facb5 in walker_helper (db=0x64ad38,
> >>> dir_abspath=0x67d258 "/home/djcbecroft/dev/workingcopy",
> show_hidden=1,
> >>>
> >>> walk_callback=0x7ffff7ba9026 <get_mergeinfo_walk_cb>,
> >>> walk_baton=0x7fffffffd4e0, depth=svn_depth_infinity,
> >>> cancel_func=0x4119c3 <svn_cl__check_cancel>, cancel_baton=0x0,
> >>> scratch_pool=0x6a0dc8) at subversion/libsvn_wc/node.c:716
> >>> #3 0x00007ffff78faf9a in svn_wc__internal_walk_children (db=0x64ad38,
> >>> local_abspath=0x67d258 "/home/djcbecroft/dev/workingcopy",
> >>> show_hidden=1,
> >>> walk_callback=0x7ffff7ba9026 <get_mergeinfo_walk_cb>,
> >>> walk_baton=0x7fffffffd4e0, walk_depth=svn_depth_infinity,
> >>> cancel_func=0x4119c3 <svn_cl__check_cancel>, cancel_baton=0x0,
> >>> scratch_pool=0x6a0dc8) at subversion/libsvn_wc/node.c:763
> >>> #4 0x00007ffff78fb06d in svn_wc__node_walk_children (wc_ctx=0x64ad20,
> >>> local_abspath=0x67d258 "/home/djcbecroft/dev/workingcopy",
> >>> show_hidden=1,
> >>> walk_callback=0x7ffff7ba9026 <get_mergeinfo_walk_cb>,
> >>> walk_baton=0x7fffffffd4e0, walk_depth=svn_depth_infinity,
> >>> cancel_func=0x4119c3 <svn_cl__check_cancel>, cancel_baton=0x0,
> >>> scratch_pool=0x6a0dc8) at subversion/libsvn_wc/node.c:785
> >>>
> >>> Using additional breakpoints, I have verified that
> >>> svn_wc__node_walk_children, svn_wc__internal_walk_children and
> walker_helper
> >>> only get called once. It seems to be an issue within the walker_helper
> >>> itself.
> >>>
> >>> I'll continue looking at it further - let me know if I'm barking up the
> >>> wrong tree.
> >>>
> >>> Cheers,
> >>> Daniel B.
> >>> ... who now has a much greater appreciation for those that debug in GDB
> on
> >>> a regular basis.
> >>>
> >>
> >> I think I've found the culprit. As we are walking the children
> >> (svn_wc__internal_walk_children, node.c), we call the walk_callback for
> each
> >> node. At the moment, this is the get_mergeinfo_walk_cb function in
> merge.c.
> >> Inside this, we call the following:
> >>
> >> svn_wc__node_is_status_present
> >> svn_wc__node_is_status_deleted
> >> svn_wc__node_is_status_absent
> >> svn_wc__path_switched
> >> svn_wc_read_kind
> >> svn_wc__node_get_depth
> >>
> >> Each of these will run a separate svn_wc__db_read_info call, and
> therefore a
> >> separate lstat(). There's also a couple of other calls that might call
> it
> >> indirectly.
> >
> > So it looks an improvement would be something like
> > svn_wc__get_info_for_merge_callback() which does it all in one go (or
> > some other less-verbosely-named function).
>
> Picked the low-hanging fruit in r1042892. More work could be done,
> but I think the problems lie deeper, namely:
> * read_info() was originally envisioned as being cheap to call.
> Apparently that isn't the case anymore, part of which is no doubt
> caused by...
> * the call to svn_io_check_special_path() in
> vn_wc__db_pdh_parse_local_abspath(). There is a comment in that file
> that:
>
> /* ### at some point in the future, we may need to find a way to get
> ### rid of this stat() call. it is going to happen for EVERY call
> ### into wc_db which references a file. calls for directories could
> ### get an early-exit in the hash lookup just above. */
>
> It's really unfortunate that every wc_db call on a file which fetches
> a pdh has to call stat(). I tried to implement an optimization that
> looks at the parent (which is much more likely to succeed, now that we
> have single-db), but it failed for externals (which have their own
> wc.db databases). We could work around that, but to do so, we'd need
> to know if we are looking at a file or a directory...and the way to do
> that is by calling stat().
>
> Improvements, patches, suggestions encouraged.
>

Is it possible to look at it a different way? Maybe the issue isn't that we
need to call stat() for each parse_local_abspath(), but that we are calling
that function so often? When processing recursively, do we need to
continually find the wc.db file? It seems to be residual from the 1.6.x
algorith, where the contents of a folder weren't known until svn actually
descended into that folder (and parsed the entries file). For example:

 - Enter A. Read entries. Process A\mu
 - Enter A\B. Read entries. Process lamba
 - Enter A\B\E. Read entries. Process beta.
 - Rinse and repeat.

The same process has been maintained for wc-ng, so it becomes (extremely
high level):

 - Enter A. Find A\.svn\wc.db. Process files.
 - Enter A\B. Find A\B\.svn\wc.db. If nothing, find A\.svn\wc.db. Process
files.
 - Enter A\B\E. Find A\B\E\.svn\wc.db. If nothing, find A\B\.svn\wc.db. If
nothing, find A\.svn\wc.db. Process files.
This happens for every call to parse_local_abspath(), and if there are 10 or
15 levels deep, it adds up.

Using a single metadata store, it could be switched to something like:

 - Enter A. Find metadata store, and find entries to process under A
(recursively if applicable).
 - Iterate over entries and process.

For directories, it should be known if we are processing an external or not
(since this would be in the source wc.db).

Cheers,
Daniel B.
Received on 2010-12-08 22:28:20 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.