Re: Total failure in dav layer when >1024 files already open

From: Michael Sweet <mike_at_easysw.com>
Date: 2006-06-05 19:37:22 CEST

Martin v. Löwis wrote:
> Michael Sweet wrote:
>> The select() method is O(n). The poll() method is O(n log n) unless
>> you manage a large array that maps from FD to the corresponding
>> state data for that FD, which can eat up a LOT of memory...
>
> You are assuming an implementation strategy here.

Not of select() or poll(), but certainly for the application.

For a single file descriptor, there is no difference for an
application - use fd_set, use pollfd, both are simple and
constant time.

However, for any non-trivial application managing multiple file
descriptors, poll() adds overhead in the form of pollfd array
management, no matter how else you implement things in your
code.

> poll() is
> O(number-of-polled-fds) on Linux. Yes, there is an array of all
> files, per process, but that does not consume a LOT of memory.
> Instead, it consumes one pointer per file number, up to some
> dynamically-determined maximum.
>
> OTOH, select is O(maximum-fd-value). Assuming you always pass
> only a few file descriptors to select/poll, poll is more
> efficient if the process has many open files.

Again, I'm not talking about the implementation complexity of
either interface, but the application code complexity to use
those interfaces. Regardless, O(maximum-fd-value) is equivalent
to O(number-of-polled-fds) in terms of algorithmic complexity
[O(n) == O(n)], and very likely in practice (clock time) as well.

In the case of CUPS, the number of file descriptors is often the
same as the number of files we are doing a select() on, so for
us poll() is a performance loser.

On the application side, with poll() you need to either loop
through the pollfd array to determine which file descriptors are
active, or loop through the (separately maintained) state array
which references those file descriptors. That is O(n). Then you
have the lookup the state data for that file descriptor; for a
sparse array (index != fd), the best you can do is O(log N),
making the total (application) code complexity O(n log n).
Similarly, if you loop through your state data [O(n)] and lookup
the file descriptors in a sorted pollfd array [O(log n)], you
still have O(n log n). The fastest implementation would be to
use an array that maps file descriptors (or pollfd array indices)
to your state data, which reduces the total complexity to O(n).

For select(), you can just loop through your state data and use the
FD_ISSET macro to test the corresponding fd_set. That is O(n).

The situation is somewhat similar for fd_set vs. pollfd array
management. Adding and/or removing a file descriptor in a dynamic
pollfd array requires O(n log n) for every addition or removal and
O(log n) for update. Doing so for fd_set is O(1) (constant time)
for all of the UNIX fd_set implementations I have worked with.

-- 
______________________________________________________________________
Michael Sweet, Easy Software Products           mike at easysw dot com
Internet Printing and Document Software          http://www.easysw.com
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Received on Mon Jun 5 19:37:51 2006

This message: [ Message body ]
Next message: Daniel Rall: "Re: svn commit: r19923 - in trunk: . build/ac-macros"
Previous message: Garrett Rooney: "Re: [PATCH] Moving Unversioned Files"
In reply to: Martin v. Löwis: "Re: Total failure in dav layer when >1024 files already open"
Next in thread: Jonathan Gilbert: "Re: Total failure in dav layer when >1024 files already open"
Reply: Jonathan Gilbert: "Re: Total failure in dav layer when >1024 files already open"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]