Re: why is svn add slow?

From: Philip Martin <philip_at_codematters.co.uk>
Date: 2003-02-23 18:44:10 CET

Robert Pluim <rpluim@bigfoot.com> writes:

> Because it rereads _and_ rewrites the entries file for every file
> added, and locks and unlocks the directory it's working on for every
> file, even when all the scheduled adds are in the same directory.
> I've done a few tests, and changing to locking, reading and writing
> once per dir speeds up svn add by about 30x.

It's low priority (for me personally) because adding files is not a
bottleneck when using version control, even if the process is slow;
it just doesn't happen that often. There have been requests in the
past to change the 'svn add' behavior so that it doesn't stop at an
already-versioned item but continues to consider any children
(assuming --non-recursive was not given, and respecting the ignored
names). As far as I recall there were no objections to this; it's
just that nobody has implemented it so far. Would that solve your
particular use case?
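A minimal sketch of the requested behavior, under stated assumptions:
`struct node`, its `versioned`/`ignored` flags and `add_tree` are
hypothetical stand-ins, not Subversion's working-copy API, and
"scheduling an add" is modelled by setting a flag. As an assumption of
this sketch, the explicitly named target is always considered, and
ignore patterns are applied only to children found while recursing.

```c
#include <stddef.h>

/* Hypothetical in-memory model of a working-copy tree (not svn's API). */
struct node {
  const char *name;
  int versioned;          /* already under version control? */
  int ignored;            /* matches an ignore pattern? */
  struct node *children;  /* first child, NULL if none */
  struct node *next;      /* next sibling, NULL if none */
};

/* Schedule NODE for addition unless it is already versioned, then, if
   RECURSIVE, keep descending into its children instead of stopping at
   a versioned item.  Ignored children (and their subtrees) are skipped.
   Returns the number of items newly scheduled. */
static int add_tree(struct node *node, int recursive)
{
  int added = 0;
  struct node *child;

  if (!node->versioned) {
    node->versioned = 1;           /* stand-in for "schedule add" */
    added++;
  }
  if (!recursive)                  /* --non-recursive: target only */
    return added;
  for (child = node->children; child != NULL; child = child->next)
    if (!child->ignored)           /* respect ignored names */
      added += add_tree(child, recursive);
  return added;
}
```

With `recursive` false only the named target is considered, which is
what --non-recursive would map to in this model.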

> The question I have is, what is the best way to improve the current
> code:
>
> 1) Change svn_client_add to remember the directory it was last working
> on, making sure it's cached the entries file from the previous time
> around? (where would it cache it? The pool it's passed might not
> exist next time round).

I don't really like this approach; it makes the client interface more
difficult to use if the application has to get involved with the
access batons.

> 2) Add an svn_client_add_in_dir, where you pass an apr_array where all
> the targets are guaranteed to be in the same dir, and make
> svn_cl__add call that?

I don't like the idea of a second add interface either.

> 3) Some other way to make the entries file be cached? I haven't fully
> understood how the set field of svn_wc_adm_access_t is used yet.

There is some documentation in notes/entries-caching; it went there
when entries caching was planned but not implemented. Possibly some
of it should move into lock.c.

> I'm not sure either way. (1) feels icky, since it requires caching
> unbeknownst to the client code. (2) feels cleaner, but requires
> svn_cl__add to sort through all the scheduled adds, splitting them
> based on parent directory.
>
> What do you think?

I would prefer a single add interface such as

svn_error_t *
svn_client_add (const apr_array_header_t *paths,
                svn_boolean_t recursive,
                svn_client_ctx_t *ctx,
                apr_pool_t *pool);

and have the client library reuse the access batons.
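To illustrate what reusing access batons inside one array-based add
could buy, here is a sketch under loudly stated assumptions:
`struct dir_baton`, `struct op_counts`, `parent_dir` and `add_paths`
are hypothetical, a plain C array stands in for `apr_array_header_t`,
and locking and entries-file I/O are only counted, not performed. Each
parent directory is locked and its entries read the first time a path
in it is seen, and its entries are written once after all paths have
been processed.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DIRS 64
#define MAX_DIR_LEN 256

/* Hypothetical stand-in for an access baton: one per locked directory. */
struct dir_baton {
  char dir[MAX_DIR_LEN];
  int added;              /* entries added in memory, not yet written */
};

/* Counts of the (simulated) expensive operations. */
struct op_counts {
  int locks, reads, writes;
};

/* Copy the parent directory of PATH into OUT ("." if PATH has no '/').
   Names longer than OUT_LEN - 1 are truncated in this sketch. */
static void parent_dir(const char *path, char *out, size_t out_len)
{
  const char *slash = strrchr(path, '/');

  if (slash == NULL)
    snprintf(out, out_len, ".");
  else
    snprintf(out, out_len, "%.*s", (int)(slash - path), path);
}

/* Add all N PATHS, opening each parent directory's baton only once and
   writing each dirty entries file once at the end.  Returns 0 on
   success, -1 if more than MAX_DIRS directories are involved (nothing
   is really locked here; real code would have to release held locks). */
static int add_paths(const char *const *paths, int n, struct op_counts *ops)
{
  struct dir_baton batons[MAX_DIRS];
  int nbatons = 0;
  int i, j;

  ops->locks = ops->reads = ops->writes = 0;
  for (i = 0; i < n; i++) {
    char dir[MAX_DIR_LEN];
    struct dir_baton *b = NULL;

    parent_dir(paths[i], dir, sizeof dir);
    for (j = 0; j < nbatons; j++)          /* reuse an open baton */
      if (strcmp(batons[j].dir, dir) == 0) {
        b = &batons[j];
        break;
      }
    if (b == NULL) {                       /* first path in this dir */
      if (nbatons == MAX_DIRS)
        return -1;
      b = &batons[nbatons++];
      snprintf(b->dir, sizeof b->dir, "%s", dir);
      b->added = 0;
      ops->locks++;                        /* lock the directory */
      ops->reads++;                        /* read its entries file */
    }
    b->added++;                            /* add the entry in memory */
  }
  for (j = 0; j < nbatons; j++)            /* close: flush and unlock */
    if (batons[j].added > 0)
      ops->writes++;                       /* one entries write per dir */
  return 0;
}
```

For five paths spread over three directories this counts three locks,
three reads and three writes, where a lock/read/write cycle per file
would count five of each. Because open batons are looked up by
directory, the caller does not have to sort the paths by parent
directory first.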

-- 
Philip Martin

Received on Sun Feb 23 18:44:54 2003
