Re: >1024 connections in slapd / select->poll
Yusuf Goolamabbas wrote:
Well if you're feeling brave, I've just completed a patch in CVS HEAD
supporting epoll. I haven't tried testing it with a massive number of
connections yet, but the code now passes the regular test suite. It
should be simple enough to add kqueue support as well now (I would have
begun that but I don't have BSD installed anywhere at the moment).
Regular poll can easily be added if you want, but there's really no
reason to. Solaris /dev/poll is a bit more awkward.
Solaris 10 supports event ports which is supposedly thread friendly
http://blogs.sun.com/roller/page/barts/20040720
http://developers.sun.com/solaris/articles/event_completion.html
Thanks for the links, good references. Of course they seem to confirm
that there's no compelling reason to migrate away from select() in
slapd: poll() blocks the entire process, /dev/poll has strict mutex
requirements and performs poorly when the descriptor list changes
frequently... epoll has some of that characteristic as well - modifying
the descriptor set requires a system call, a trip across the user/kernel
barrier. With select you just flip a bit in userspace and you're done.
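To make the cost difference concrete, here is a minimal sketch (Linux-specific, since it uses epoll). The helper names select_watch, select_unwatch, epoll_watch and epoll_unwatch are made up for illustration and are not slapd code. The select versions only touch a bitmask in the caller's own memory, while the epoll versions each make one epoll_ctl() system call.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/select.h>
#include <unistd.h>

/* select(): the interest set is a bitmask in user memory, so adding or
 * removing a descriptor is a bit flip with no system call.
 * (Illustrative helper names, not from slapd.) */
static void select_watch(fd_set *set, int fd)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_SET(fd, set);
}

static void select_unwatch(fd_set *set, int fd)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_CLR(fd, set);
}

/* epoll: the interest set lives in the kernel, so every change is an
 * epoll_ctl() call across the user/kernel boundary. */
static int epoll_watch(int epfd, int fd)
{
    struct epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

static int epoll_unwatch(int epfd, int fd)
{
    /* A NULL event is accepted for EPOLL_CTL_DEL on Linux 2.6.9 and later. */
    return epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
}
```

Note that the select side still pays for copying the whole set on every select() call; the sketch only shows what it costs to change the set.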
The Solaris event ports sound interesting, but I think anybody who
develops a "new event handler" on Unix and forgets to support signal()
at the outset has overlooked something important...
Anyway, it's too bad that everyone is just copying each other's ideas
and not actually learning from the obvious limitations of all of these
schemes. A real solution needs to not only perform well on large sets of
monitored items, but it needs to be extremely cheap to create and manage
these sets in the first place. Only select wins on that score, and the
obvious solution to avoid the argument passing overhead that everyone
seems so foolishly focused on is to use explicitly mapped memory for the
event sets. I.e., mmap a region that is directly accessible in both user
and kernel space so that no byte copying needs to be done.
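As a rough userspace analogy only (no such kernel interface exists in this form), the sketch below maps an anonymous shared region and lets a forked child play the part of "the other side". The child stores a result flag directly into the region and the parent sees it with no copy through a system call argument. The names map_shared_flags and set_flag_from_other_process are hypothetical.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map a region of n flag words that stays shared across fork().
 * Returns NULL on failure. (Hypothetical helper.) */
static unsigned int *map_shared_flags(size_t n)
{
    assert(n > 0);
    void *p = mmap(NULL, n * sizeof(unsigned int),
                   PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (unsigned int *)p;
}

/* The child stands in for the kernel: it writes flags[idx] in place.
 * Returns 0 once the child has exited cleanly, -1 on error. */
static int set_flag_from_other_process(unsigned int *flags, size_t idx,
                                       unsigned int value)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        flags[idx] = value;   /* lands in the shared mapping */
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A real implementation would need the kernel to pin and validate the region. This only illustrates that both sides can read and write the same bytes without an explicit copy.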
Another point where select (and poll) wins is that there is a fast
mapping from the input set to the result set - i.e., if you want to know
"did event #5 occur?" you can find out in constant time, because it's
just a fixed bitfield lookup. For all the other mechanisms that either
return events one at a time or in a flat list, you have to iterate through
the list to look for "event #5". They have thus taken the linear search
that the kernel does for select and kicked it out into userland, thus
perceiving a great savings in kernel CPU time but not really improving
life for the application developer. There is an obvious way to solve
both of these problems with no additional cost - do both.
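The lookup difference can be sketched as follows. ready_bitmask and ready_flat_list are illustrative names, and the flat list of ready descriptors is an assumed stand-in for what a list-returning mechanism hands back.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* select-style result: "did fd fire?" is a single bit test, O(1). */
static int ready_bitmask(fd_set *result, int fd)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    return FD_ISSET(fd, result) != 0;
}

/* Flat-list result: the same question needs a scan of the returned
 * list, O(n) in the number of ready descriptors. */
static int ready_flat_list(const int *ready_fds, size_t nready, int fd)
{
    for (size_t i = 0; i < nready; i++) {
        if (ready_fds[i] == fd)
            return 1;
    }
    return 0;
}
```

In practice applications that use list-returning mechanisms usually walk the whole list once rather than asking about a single descriptor, so the linear cost only bites when you want to check events in your own priority order.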
Define the input event set as an array of structures, as most of these
mechanisms do. The array resides in a shared memory region. We can use a
modified struct kevent as a typical structure:
struct kevent {
    uintptr_t ident;    /* identifier for event */
    short     filter;   /* filter for event */
    u_short   flags;    /* action flags */
    u_int     inflags;  /* filter flags of interest */
    u_int     outflags; /* resulting flags */
    intptr_t  data;     /* filter data value */
    void     *udata;    /* opaque identifier */
};
kqueue is pretty darn good, but it still misses on the argument copying
problem, its result set is an array of struct kevents describing the
results, and it doesn't give you direct access for priority
management. The bulk of the struct is redundant information; all we want
to know are the resulting flags and any data accompanying them.
What a really good, efficient mechanism would do is leave the input
event array in place, set the result flags and data there, and return a
list of *offsets* to all the entries in the array that got signaled.
That way you can navigate the result list in priority order and in event
order, all without expensive linear search time. (If your argument list
is a mmap'd region, using offsets means you don't have to guarantee the
region gets mapped to any particular address, but you can still remember
the location of the relevant structure for a monitored object and access
it in constant time.)
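A userspace mock-up of that idea, under stated assumptions: struct xevent is a hypothetical copy of the modified struct above, simulate_wait stands in for the kernel side, the fired array fakes which conditions occurred, and "offsets" are taken to be array indices rather than byte offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event record, modeled on the modified struct kevent. */
struct xevent {
    uintptr_t      ident;    /* identifier for event */
    short          filter;   /* filter for event */
    unsigned short flags;    /* action flags */
    unsigned int   inflags;  /* filter flags of interest */
    unsigned int   outflags; /* resulting flags, written in place */
    intptr_t       data;     /* filter data value */
    void          *udata;    /* opaque identifier */
};

/* Stand-in for the kernel: fired[i] says which conditions occurred on
 * set[i]. Results are stored in the input array itself, and the index
 * of every signaled entry is appended to offsets[], which must have
 * room for nset entries. Returns the number of signaled entries. */
static size_t simulate_wait(struct xevent *set, size_t nset,
                            const unsigned int *fired, size_t *offsets)
{
    size_t nready = 0;
    for (size_t i = 0; i < nset; i++) {
        set[i].outflags = set[i].inflags & fired[i];
        if (set[i].outflags != 0)
            offsets[nready++] = i;
    }
    return nready;
}
```

After the call, the application can walk offsets[] to visit only the signaled entries, or test set[k].outflags directly for any entry k it cares about, so both access patterns avoid a search.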
Maybe I'll write a patch for this for Linux over the holidays. Starting
from either poll or kqueue it would be pretty easy to fix this up right.
--
-- Howard Chu
Chief Architect, Symas Corp. Director, Highland Sun
http://www.symas.com http://highlandsun.com/hyc
Symas: Premier OpenSource Development and Support