Notes on I/O Multiplexing and the select() system call


I/O multiplexing means asking the kernel, through a single system call, to monitor a list of file descriptors and to return once one or more of them are ready for I/O. When the call returns, we need a way to identify which descriptors are ready.

The select() system call implements I/O multiplexing. Its declaration looks like this:


    #include <sys/select.h>  /* POSIX.1-2001 and later: select() and fd_set */
    #include <sys/types.h>   /* fd_set data type (older systems) */
    #include <sys/time.h>    /* struct timeval */
    #include <unistd.h>      /* function prototype (older systems) */

    int select(int maxfd1, fd_set *readSet, fd_set *writeSet, fd_set *exceptionSet, struct timeval *timePtr);

We'll now briefly review each component of this declaration.

First argument: int maxfd1

This argument limits the number of descriptors that the function checks, primarily for efficiency. Its value should be one higher than the highest-numbered file descriptor in any of the sets.

2nd - 4th arguments: fd_set *readSet, fd_set *writeSet, fd_set *exceptionSet


Background on fd_set

  • An fd_set is a "bit vector" indexed by file descriptor (file descriptors are integers). The first bit corresponds to file descriptor 0, the second bit to file descriptor 1, and so on. [see diagram on p.398]

  • You can manipulate the bits in an fd_set variable using the following macros:

    • FD_ZERO  (fd_set *fdset);
      /* clear all bits in fdset */

    • FD_SET   (int fd, fd_set *fdset);
      /* turn on bit for fd in fdset */

    • FD_CLR   (int fd, fd_set *fdset);
      /* turn off bit for fd in fdset */

    • FD_ISSET (int fd, fd_set *fdset);
      /* test bit for fd in fdset */
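
A minimal sketch of the four macros working together. The function name fd_set_demo and the choice of descriptors 0 and 3 are made up for illustration; no descriptor is actually opened or read, since the macros only flip bits in the set.

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: build a set holding fds 0 and 3, remove fd 0,
   then report which of the two bits are still on.
   Returns a small mask: bit 0 means fd 0 is present, bit 1 means fd 3 is. */
int fd_set_demo(void)
{
    fd_set set;
    int result = 0;

    FD_ZERO(&set);        /* always start from an empty set */
    FD_SET(0, &set);      /* turn on the bit for fd 0 */
    FD_SET(3, &set);      /* turn on the bit for fd 3 */
    FD_CLR(0, &set);      /* turn the bit for fd 0 back off */

    if (FD_ISSET(0, &set))
        result |= 1;
    if (FD_ISSET(3, &set))
        result |= 2;
    return result;
}
```

Only the bit for descriptor 3 survives, so the helper reports a mask with just bit 1 set.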


The readSet, the 2nd parameter to select(), is used for detecting when a file descriptor is "ready for reading". In other words, it detects when we can read from the file descriptor without blocking (i.e. there is data to read, or end-of-file has been reached).

The writeSet is used for detecting when a file descriptor is "ready for writing". In other words, it detects when we can write to the file descriptor without blocking (i.e. the write buffer is not full).

The exceptionSet detects when an exception occurs on a file descriptor. An example of an exception condition is the arrival of out-of-band data on a network connection. In general, you will not need to monitor the exceptionSet for CMPUT 379, so our focus will be limited to the readSet and the writeSet.

Any of the fd_set parameters can be set to NULL. For example, if we are not interested in exceptions, we can replace the 4th parameter to select with NULL.
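
As a rough illustration of passing NULL for the sets we do not care about, the sketch below monitors only the writeSet. The helper name wait_until_writable is hypothetical, and the code assumes fd is a valid open descriptor smaller than FD_SETSIZE.

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: block until fd can be written without blocking.
   Returns 1 when fd is writable, 0 if it was not reported, -1 on error. */
int wait_until_writable(int fd)
{
    fd_set writeSet;

    FD_ZERO(&writeSet);
    FD_SET(fd, &writeSet);

    /* Only the write set is supplied; the read and exception sets are NULL.
       A NULL timeout means wait forever. */
    if (select(fd + 1, NULL, &writeSet, NULL, NULL) < 0)
        return -1;

    return FD_ISSET(fd, &writeSet) ? 1 : 0;
}
```

For example, the write end of a freshly created pipe has an empty buffer, so this call should return right away.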

Each fd_set has one bit for every possible file descriptor (up to FD_SETSIZE), whether or not that descriptor is currently open. Each bit is either set or not set. It is quite possible, and perfectly normal, for multiple file descriptors in a given fd_set to be set at the same time.

Before the call to select(), a bit set to 1 in an fd_set means that we are interested in the associated file descriptor.

For example, if we wanted to know when stdin became ready for reading, we could write:

    ...
    fd_set readSet;
    int    numReady;
    ...
    FD_ZERO(&readSet);
    FD_SET(STDIN_FILENO, &readSet); /* set bit for stdin to 1 */
    numReady = select(STDIN_FILENO+1, &readSet, NULL, NULL, NULL);
    ...

Upon return from the call to select(), if a bit in an fd_set is set to 1 then that file descriptor is ready to be processed. In contrast, if a given bit has a value of 0 then that file descriptor is not ready to be processed. Because select() overwrites the sets it is given, they must be rebuilt before each new call.
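
The sketch below ties the first argument and the post-return check together for two descriptors. The helper name wait_for_either and its "prefer fdA" rule are assumptions made for illustration; both descriptors are assumed to be open and smaller than FD_SETSIZE.

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: block until fdA or fdB is readable.
   Returns the ready descriptor (fdA wins if both are ready), or -1 on error. */
int wait_for_either(int fdA, int fdB)
{
    fd_set readSet;
    int maxfd = (fdA > fdB) ? fdA : fdB;   /* highest descriptor of interest */

    /* Build the set from scratch; select() will overwrite it. */
    FD_ZERO(&readSet);
    FD_SET(fdA, &readSet);
    FD_SET(fdB, &readSet);

    /* First argument is the highest descriptor plus one. */
    if (select(maxfd + 1, &readSet, NULL, NULL, NULL) < 0)
        return -1;

    /* After the call, only the bits of ready descriptors remain set. */
    if (FD_ISSET(fdA, &readSet))
        return fdA;
    if (FD_ISSET(fdB, &readSet))
        return fdB;
    return -1;
}
```

A caller that loops would invoke the helper again on each iteration, which rebuilds the set every time.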

How a file descriptor is processed depends on what kind of descriptor it is. Let's assume that a file descriptor, after returning from select, is marked as ready for reading...

In the above, I define a Master Socket as the socket that is always listening for new client connections -- it is the descriptor returned by the call to socket() in a server program.
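
As a rough sketch of one common pattern (not necessarily the one intended by these notes): a readable master socket is usually handled with accept(), while a readable client socket is handled with read(). The helper names open_master_socket and handle_readable, the loopback address, and the 512-byte buffer are all assumptions for illustration.

```c
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a master (listening) TCP socket on the
   loopback interface, letting the kernel pick a free port.
   Returns the descriptor, or -1 on error. */
int open_master_socket(void)
{
    struct sockaddr_in addr = {0};
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* 0 = any free port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: process one descriptor that select() marked readable.
   watched is the set of descriptors the server monitors between calls.
   Returns the new client descriptor after an accept(), otherwise -1. */
int handle_readable(int fd, int masterSock, fd_set *watched)
{
    char buf[512];
    ssize_t n;

    if (fd == masterSock) {
        /* Readable master socket: a new connection is waiting. */
        int client = accept(masterSock, NULL, NULL);
        if (client >= 0)
            FD_SET(client, watched);        /* start monitoring the client */
        return client;
    }

    /* Readable client socket: data has arrived or the peer has closed. */
    n = read(fd, buf, sizeof buf);
    if (n <= 0) {                           /* EOF or error: stop monitoring */
        close(fd);
        FD_CLR(fd, watched);
    }
    /* If n > 0, buf holds n bytes that a real server would process here. */
    return -1;
}
```

A server loop would copy watched into a scratch fd_set before each select() call, then pass every descriptor found set in the scratch copy to handle_readable.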

5th argument: struct timeval *timePtr

The definition of this structure looks like this:


    struct timeval {
         long tv_sec;   /* seconds      */
         long tv_usec;  /* microseconds */
    };

There are 3 ways to fill in this final parameter:

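  1. NULL : wait forever (until a descriptor becomes ready)
  2. timePtr->tv_sec == 0 AND timePtr->tv_usec == 0 : non-blocking (check the descriptors and return immediately)
  3. timePtr->tv_sec != 0 OR timePtr->tv_usec != 0 : block until a descriptor becomes ready or the time limit expires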
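
The second and third cases might look like the sketch below. The helper names poll_readable and wait_readable are hypothetical. The timeval is filled in immediately before every call because some systems (Linux, for example) modify it during the call.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper, case 2: is fd readable right now? Never blocks.
   Returns 1 if readable, 0 if not, -1 on error. */
int poll_readable(int fd)
{
    fd_set readSet;
    struct timeval tv;

    tv.tv_sec = 0;                 /* both fields zero: return immediately */
    tv.tv_usec = 0;
    FD_ZERO(&readSet);
    FD_SET(fd, &readSet);
    return select(fd + 1, &readSet, NULL, NULL, &tv);
}

/* Hypothetical helper, case 3: wait at most sec seconds plus usec
   microseconds for fd to become readable.
   Returns 1 if readable, 0 if the time limit expired, -1 on error. */
int wait_readable(int fd, long sec, long usec)
{
    fd_set readSet;
    struct timeval tv;

    tv.tv_sec = sec;
    tv.tv_usec = usec;
    FD_ZERO(&readSet);
    FD_SET(fd, &readSet);
    return select(fd + 1, &readSet, NULL, NULL, &tv);
}
```

Since each helper monitors a single descriptor, the value returned by select() can be passed straight back to the caller.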

Return value:

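The return value is an integer, which can be interpreted as follows:

  • -1 : an error occurred, and errno is set to indicate the cause
  •  0 : the time limit expired before any descriptor became ready
  • >0 : the number of ready descriptors, counted across all three sets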
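
A sketch of acting on the return value, assuming the usual conventions: negative on error, zero when the time limit expires, and a positive count of ready descriptors otherwise. The helper name select_readable, the millisecond timeout parameter, and the decision to retry on EINTR are choices made for this illustration.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeoutMs milliseconds for any descriptor
   in *readSet to become readable.
   Returns -1 on error (errno is set), 0 if the time limit expired, or the
   number of ready descriptors; on return *readSet holds only the ready ones. */
int select_readable(fd_set *readSet, int maxfd, long timeoutMs)
{
    fd_set interested = *readSet;  /* remember what the caller asked for */
    int n;

    do {
        struct timeval tv;
        tv.tv_sec = timeoutMs / 1000;
        tv.tv_usec = (timeoutMs % 1000) * 1000;
        *readSet = interested;     /* select() overwrites the set: restore it */
        n = select(maxfd + 1, readSet, NULL, NULL, &tv);
    } while (n < 0 && errno == EINTR);  /* interrupted by a signal: try again */

    return n;
}
```

A caller would typically branch three ways on the result: report the error, handle the timeout, or scan the set with FD_ISSET.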

Misc Points:

References

  1. "Advanced Programming in the UNIX Environment", W. Richard Stevens, Section 12.5.1