Re: svn_io_open_unique_file: quadratic cost

From: <kfogel_at_collab.net>
Date: 2003-10-16 19:55:54 CEST

Julian Foad <julianfoad@btopenworld.com> writes:
> apr_psprintf
> svn_path_cstring_from_utf8
> apr_file_open
>
> Yuck!
>
> Presumably we don't open many temp files of the same pattern at once
> during "normal" use, but that's no excuse.

Well, actually that *is* the excuse :-).

(Do we have profiling information which indicates that
svn_io_open_unique_file() is responsible for a disproportionate amount
of running time in some operations? I don't know of any operation
which creates many tmp files based on the same name in the same
directory,... But I could be forgetting some subtle case.)

> The comment of "historic inevitability" on it says why it replaces
> "tmpnam" and "tempname" (probably meaning "tempnam"?), but APR is
> the place to do this and APR has "apr_file_mktemp".
>
> If APR's "apr_file_mktemp" is not good enough, then we should
> improve it or add a more suitable one to APR. The considerations
> are:
>
> * controllability of the dir in which to create the file (APR's is
> undefined.)
> * efficiency (Ha! APR's version can't be worse than this.)
> * safety/correctness in threaded environment etc. (APR's uses
> mkstemp() and errno if mkstemp() is available.)

Controllability of the dir, and control over the filename (including
some predictability of the iterating component) were the two
considerations that prompted us to create this function. The control
over the whole filename was important for human readability, and the
predictable iterators helped debuggability during development.

> APR has its own implementation to use if "mkstemp()" is not
> available. That would be a good base for the algorithm[1], and
> works on an explicit path, so I think it just needs to be exposed
> through the API. Hmm... the implementation of apr_file_mktemp only
> seems to exist in the "unix" sub-tree - what about other platforms?

APR would be a better place for this even if the implementation didn't
change at all. However, we can't afford to wait for yet another APR
release. We've got a bunch of 0.32 issues that are already blocked on
the next Apache release -- if we move something new into APR now, then
we have to go through that whole cycle again after this next Apache
comes out. That would take too much time, considering how close to
1.0 we are.

So, whatever we do, let's keep it in Subversion for now. We can move
it to APR after SVN 1.0 if we want.

As to "whatever we do": An easy fix for the quadratic behavior would
be to change the iterating portion to a semi-random number. E.g.
instead of

   tests/t1/A/D/G/pi.tmp
   tests/t1/A/D/G/pi.2.tmp
   tests/t1/A/D/G/pi.3.tmp
   tests/t1/A/D/G/pi.4.tmp

it would be something like

   tests/t1/A/D/G/pi.tmp
   tests/t1/A/D/G/pi.AxQ83.tmp
   tests/t1/A/D/G/pi.4yy21.tmp
   tests/t1/A/D/G/pi.dt9w0.tmp

and so on. Thus, in a directory that already contained many files
produced by svn_io_open_unique_file(), opening *another* tmp file
would still be constant time.

If you want to make this change, +1. I'm working on issue #1490
(libsvn_wc performance) right now, if it looks like some of the
problems result from the current svn_io_open_unique_file, I'll be sure
to make a change, holler, or do something equally productive. But
right now, I'd be surprised if its implementation were the culprit in
anything major.

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Thu Oct 16 20:33:38 2003

This message: [ Message body ]
Next message: kfogel_at_collab.net: "Re: 0.32 ETA guess?"
Previous message: Russell Yanofsky: "Re: [PATCH] Compiling Subversion Libraries as DLLs on Windows"
In reply to: Julian Foad: "svn_io_open_unique_file: quadratic cost"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]