If you were unable to complete Assignment #2 and would like a solution to build upon for this assignment, you may purchase one at a penalty of 20% (i.e., 20 out of 100 marks) off your mark for this assignment. You may not simply adopt another student's (or anyone else's) code; that is considered plagiarism. The purchased solution is not guaranteed to be bug free, but we will provide a solution of reasonable quality. Please contact your instructor if you would like to do this. Of course, read this assignment description first.
Cleaning Up Your Processes
When using fork() (and related functions) for the first time, it is easy to have bugs that leave processes on the system, even after you log out of the workstation. It is your responsibility to clean up (i.e., kill) extraneous processes on your workstation before you log out. Learn how to use the ps and kill (and related) commands. Marks will be deducted if you leave processes on a workstation after you log out.
Standard Comment About Design Decisions
Although many details about this assignment are given in this description, there are many other design decisions that are left for you to make. In those cases, you should make reasonable design decisions (e.g., decisions that do not contradict what we have said and do not significantly change the purpose of the assignment), document them in your source code, and discuss them in your report. Of course, you may ask questions about this assignment (for example, in the newsgroup) and we may choose to provide more information or some clarification. However, the basic requirements of this assignment will not change.
All three assignments in CMPUT 379 this term have been related to each other. By the end of this assignment, you will have built a sophisticated system that uses concurrency, signals, and various forms of IPC (including across workstations) to accomplish a non-trivial task.
In this assignment, you will be extending and improving the dirnanny program from Assignment #2. So far, the two versions of dirnanny that you have developed have been limited to monitoring directories on a single Unix workstation. There are many environments, such as our instructional laboratories or a cluster of workstations in a research laboratory, in which we would like to monitor directories on a number of different workstations. Of course, we could start up a separate dirnanny on each workstation, but that would be more difficult for the systems administrators to manage. For example, if the contents of the configuration file change, a separate SIGHUP would have to be sent to each dirnanny. Also, in order for the sysadmins to monitor the activity of dirnanny on the different workstations, multiple output files may have to be inspected. For more than a handful of workstations, this administration overhead is too high.
We will address this problem with a (simplified) system that uses a client-server software architecture to centralize the configuration over a number of workstations, to centralize the output of the activities of all the processes, and to centralize and coordinate their clean exit.
In this assignment, you must write two different programs: a server, dirnanny.server, and a client, dirnanny.client.
Only the server dirnanny reads the configuration file, and the file itself has a format similar to that of Assignment #2, with some notable changes described below. Client dirnanny processes obtain the configuration information from the server process over an Internet socket. The client dirnanny is responsible for creating child processes on the workstation on which it monitors directories.
Only the server dirnanny ever prints anything to stdout. Client processes send messages to the server describing the activity on the system (e.g., which files of a monitored directory have been deleted), and it is the server that actually writes to stdout and the log file. Similarly, the child processes of the clients never print anything to stdout. They send messages to their parent (the client) about the activity on the system, and the parent passes the information along to the server. The clients and their children may use stderr only in extreme cases, where an exceptional situation prevents the program from continuing its normal execution.
For example, the server might be started with:
% dirnanny.server /tmp/dirnn.config
When the dirnanny.server starts up, it must output to stdout its PID, the name of the workstation it is running on, and the port number used by the server to accept new socket connections (see the discussion of Internet sockets below) in the format shown in the following example:
dirnanny.server: PID 345 on host uj01.cs.ualberta.ca port 2309
Note that the port number is the very last item printed in the line.
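One way to obtain the port number for this line is sketched below. It is a minimal sketch, assuming the server binds to port 0 so that the kernel picks any free port (a fixed port would also satisfy the description) and then asks the kernel which port it assigned with getsockname(). The names dn_listen() and dn_banner() are hypothetical helpers, not a required interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a TCP socket listening on all interfaces. Port 0 asks the
 * kernel for any free port (an assumption; a fixed port also works).
 * On success, store the chosen port in *port_out and return the fd. */
int dn_listen(unsigned short *port_out)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);

    /* Backlog of 32 matches the stated maximum number of clients. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 32) < 0) {
        close(fd);
        return -1;
    }

    /* Ask the kernel which port it actually assigned. */
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* Format the start-up line; the port is deliberately the last item. */
int dn_banner(char *buf, size_t n, long pid, const char *host, unsigned port)
{
    return snprintf(buf, n, "dirnanny.server: PID %ld on host %s port %u",
                    pid, host, port);
}
```

In main(), the server could call gethostname(), then dn_banner(buf, sizeof buf, (long)getpid(), host, port), print buf and flush stdout. Note that gethostname() may return a short name rather than a fully qualified one such as uj01.cs.ualberta.ca, depending on how the workstation is configured.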
The contents of the configuration file might be:
uj02.cs.ualberta.ca:/tmp/bloated 00:00:50 mostrecent 00:05:00
uj01.cs.ualberta.ca:/tmp/onebigcache 00:01:00 lessthan 2M
uj02.cs.ualberta.ca:/tmp/othercache 00:00:20 mostrecent 03:00:00
Note that the syntax has changed with respect to (a) the total run time, and (b) the directory name. The total run time field from previous assignments, that is, the first line in the configuration file, has been removed: the server process, once started, never terminates unless it is sent a SIGINT. In addition, the directory to monitor is now given as a combination of the hostname and the directory. That is, a directory path alone is not enough information to identify a directory uniquely; the hostname is needed too. Note also that a host may have multiple monitored directories.
The above configuration file is just an example. Your program must work with all possible configuration files that follow the specified format. In fact, as part of your testing, you will want to experiment with different configuration files. If your server program detects an error in the configuration file, it must print an error message to stderr and then exit.
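A line of the new format can be parsed with sscanf() followed by a split at the first colon. This is a minimal sketch under stated assumptions: the field sizes and the struct dn_entry type are hypothetical, and policy-specific checks (for example, validating the argument of lessthan) are left out. A return value of -1 is where the server would print its error to stderr and exit.

```c
#include <stdio.h>
#include <string.h>

/* One configuration entry (field sizes are assumptions). */
struct dn_entry {
    char host[256];
    char dir[1024];
    long period;        /* monitoring frequency in seconds */
    char policy[32];    /* e.g. "mostrecent" or "lessthan" */
    char arg[64];       /* policy argument, e.g. "00:05:00" or "2M" */
};

/* Parse "host:/dir HH:MM:SS policy arg". Return 0 on success, -1 on error. */
int dn_parse_line(const char *line, struct dn_entry *e)
{
    char where[1300];
    int hh, mm, ss;
    char extra;

    /* The trailing " %c" only succeeds if there is leftover garbage. */
    int n = sscanf(line, "%1299s %d:%d:%d %31s %63s %c",
                   where, &hh, &mm, &ss, e->policy, e->arg, &extra);
    if (n != 6)
        return -1;
    if (hh < 0 || mm < 0 || mm > 59 || ss < 0 || ss > 59)
        return -1;

    /* Split at the first ':'; the directory must be an absolute path. */
    char *colon = strchr(where, ':');
    if (colon == NULL || colon == where || colon[1] != '/')
        return -1;
    size_t hlen = (size_t)(colon - where);
    if (hlen >= sizeof e->host || strlen(colon + 1) >= sizeof e->dir)
        return -1;
    memcpy(e->host, where, hlen);
    e->host[hlen] = '\0';
    strcpy(e->dir, colon + 1);

    e->period = hh * 3600L + mm * 60L + ss;
    return 0;
}
```

A line without a hostname, such as "/tmp/bloated 00:00:50 mostrecent 00:05:00", is rejected because it contains no colon before the path.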
After the server process has been started, client processes can be started on a number of workstations. For this assignment, you can safely assume that there will never be more than 32 clients. The client program takes exactly two command-line arguments, which tell the client where the server is located. For example, if the server is started as shown above, then each client would be started from the command line using:
% dirnanny.client uj01.cs.ualberta.ca 2309
When the client process starts up, it must make a socket connection with the server (using the information provided on the command line). This socket will remain open for the entire lifetime of the client process. Using the socket connection, the server will tell each client which directories to monitor and their frequency and policy attributes, as indicated in the configuration file. Note that the clients never read the configuration file directly.
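The client's two arguments map naturally onto getaddrinfo(), which resolves the server's hostname and returns a list of candidate addresses to try in turn. This is a minimal sketch; dn_parse_port() and dn_connect() are hypothetical names. The descriptor returned by dn_connect() would be kept open for the client's entire lifetime.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Validate the port argument (e.g. "2309") before using it. */
int dn_parse_port(const char *s, unsigned short *out)
{
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v < 1 || v > 65535)
        return -1;
    *out = (unsigned short)v;
    return 0;
}

/* Connect to host:port over TCP. Returns a connected fd or -1.
 * getaddrinfo() resolves names such as "uj01.cs.ualberta.ca". */
int dn_connect(const char *host, const char *port)
{
    struct addrinfo hints, *res, *p;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    int fd = -1;
    for (p = res; p != NULL; p = p->ai_next) {   /* try each address */
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;                                /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

A client started as "dirnanny.client uj01.cs.ualberta.ca 2309" would check argv[2] with dn_parse_port() and then call dn_connect(argv[1], argv[2]), exiting with a message on stderr if either fails.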
Each client, with the help of child processes, monitors the relevant directories on a given workstation. When files in a directory are removed, the child sends the appropriate information to the client, and the client forwards it to the server process. The server process then writes an appropriate message to the log file. Note that the server process is the only one with access to the logfile. As in the previous assignments, the location of the global logfile is determined by the environment variable DIRNANNYLOGS as seen by dirnanny.server. If the environment variable is not set, the default location $HOME/.dirnanny/ is used. A new run of dirnanny may overwrite (clobber) the previously existing logfile. The filename of the logfile is dnnylog.global. The messages stored in the logfile that originated at the clients are extended to include hostname information, so they are of the form:
[Tue Mar 8 21:19:48 MST 2005] (uj01.cs.ualberta.ca) Action: File "/tmp/onebigcache/dummy.txt" removed.
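The logfile location and the client-originated message format can be sketched as follows. The helper names are hypothetical, and treating an empty DIRNANNYLOGS as unset is an assumption. The strftime() format "%a %b %e %H:%M:%S %Z %Y" imitates date(1); note that %e pads single-digit days with a space ("Mar  8").

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Build the logfile path: $DIRNANNYLOGS if set (and non-empty, an
 * assumption), else $HOME/.dirnanny, followed by dnnylog.global. */
int dn_log_path(char *buf, size_t n)
{
    const char *dir = getenv("DIRNANNYLOGS");
    const char *suffix = "";
    if (dir == NULL || *dir == '\0') {
        dir = getenv("HOME");
        suffix = "/.dirnanny";
        if (dir == NULL)
            return -1;
    }
    /* Avoid a doubled '/' when DIRNANNYLOGS already ends in one. */
    size_t len = strlen(dir);
    const char *sep =
        (*suffix == '\0' && len > 0 && dir[len - 1] == '/') ? "" : "/";
    int r = snprintf(buf, n, "%s%s%sdnnylog.global", dir, suffix, sep);
    return (r < 0 || (size_t)r >= n) ? -1 : 0;
}

/* Timestamp in the style of date(1), using local time. */
size_t dn_stamp(char *buf, size_t n, time_t t)
{
    struct tm tm;
    localtime_r(&t, &tm);
    return strftime(buf, n, "%a %b %e %H:%M:%S %Z %Y", &tm);
}

/* A client-originated removal message, tagged with the client's host. */
int dn_removed_msg(char *buf, size_t n, const char *stamp,
                   const char *host, const char *path)
{
    return snprintf(buf, n, "[%s] (%s) Action: File \"%s\" removed.",
                    stamp, host, path);
}
```

Since a new run may clobber the old logfile, opening the path with fopen(path, "w") at server start-up is one simple choice.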
Messages indicating the start of communication with a client, and the start of monitoring of each directory on each client (similar to the messages in the previous assignments, but indicating the host to which they apply), must also be written to the logfile. Again, note that only the server process generates output. The child process that removed a file informs its parent (the client process), which in turn informs the server process. If the directory to monitor does not currently exist, the child, as in the previous assignments, continues monitoring and sends a message, recorded in the logfile, stating that the directory is currently not present.
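The child-to-client-to-server relay needs some message framing on the pipes and the socket. Below is a minimal sketch, assuming a hypothetical fixed-size struct dn_msg used on every link. Sending a raw struct across the network is only safe if all workstations share the same architecture and compiler (an assumption; a text protocol avoids this). Because read() and write() may transfer fewer bytes than requested, both directions loop until the whole message has been moved.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical message types shared by child, client and server. */
enum dn_msg_type { DN_REMOVED = 1, DN_MISSING, DN_MONITOR, DN_EXIT };

struct dn_msg {
    int type;          /* one of enum dn_msg_type */
    char host[256];    /* filled in by the client before relaying */
    char path[1024];   /* removed file or missing directory */
};

/* Write exactly n bytes, retrying after short writes and EINTR. */
int dn_write_full(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Read exactly n bytes. Returns 0 on success, 1 on EOF, -1 on error. */
int dn_read_full(int fd, void *buf, size_t n)
{
    char *p = buf;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r == 0)
            return 1;
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Client side: forward one child message to the server, adding this
 * host's name. Call only after select() reports child_fd readable. */
int dn_relay(int child_fd, int server_fd, const char *myhost)
{
    struct dn_msg m;
    int r = dn_read_full(child_fd, &m, sizeof m);
    if (r != 0)
        return r;                    /* EOF or error */
    strncpy(m.host, myhost, sizeof m.host - 1);
    m.host[sizeof m.host - 1] = '\0';
    return dn_write_full(server_fd, &m, sizeof m);
}
```

Even after select() reports a socket readable, only part of a message may have arrived, so dn_read_full() can still wait briefly for the rest; a stricter design would buffer partial messages per descriptor.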
At any time, the user may change the configuration file to add, remove, or change any line in it. The user may send a hangup signal (SIGHUP) to the server process, which forces dirnanny.server to re-read the configuration file (under the same filename). The server must then print to stdout (and to the logfile):
[Tue Mar 8 21:29:48 MST 2005] Info: Caught SIGHUP. Configuration file '/tmp/dirnn.config' re-read.
Since only the server reads the configuration file, only the server needs to be sent the SIGHUP. The server must then inform all of the client processes of the new configuration information. As with the previous assignment, the various child processes (and, now, client processes) must be reconfigured without exiting and restarting, unless the number of directories (or hosts) being monitored is reduced.
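After each (re-)read, the server has to decide which configuration lines go to which client. A minimal sketch, assuming the server knows each client's hostname (for example, because the client sends it right after connecting, or via getpeername() and a reverse lookup; this description does not say which, so it is a design decision to document). The struct dn_cfg type and the helper are hypothetical. Hostnames are compared exactly here; what a client does when its list becomes empty is also left to your design.

```c
#include <string.h>

/* Minimal view of a parsed configuration line (hypothetical type). */
struct dn_cfg {
    const char *host;
    const char *dir;
};

/* Store in out[] the indices of the entries that belong to one
 * client's host, up to maxout of them. Returns how many were stored. */
size_t dn_entries_for_host(const struct dn_cfg *cfg, size_t ncfg,
                           const char *host, size_t *out, size_t maxout)
{
    size_t k = 0;
    for (size_t i = 0; i < ncfg && k < maxout; i++)
        if (strcmp(cfg[i].host, host) == 0)   /* exact hostname match */
            out[k++] = i;
    return k;
}
```

With the example configuration file, the client on uj02.cs.ualberta.ca would receive entries 0 and 2 (/tmp/bloated and /tmp/othercache), and the client on uj01.cs.ualberta.ca would receive entry 1.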
Also, the user can send an interrupt signal (SIGINT) to the dirnanny.server process, which forces dirnanny.server to close any open files that it might have, ask each client process to tell its children to exit and then exit itself, and free up all resources (e.g., memory). In fact, sending SIGINT to the server is the only way for a user to terminate the server process (and, subsequently, the clients and their children). The client processes must free up all of their resources in order to make a clean exit. In essence, an interrupt signal to the server is used to cleanly exit from the server, the clients, and all of the child processes. Once all of the cleaning up has been completed, the dirnanny.server process prints the following to stdout (and to the logfile), including a count of the number of files removed, and exits:
[Tue Mar 8 21:49:48 MST 2005] Info: Caught SIGINT. 33 files removed. Clients exited cleanly. Exiting cleanly.
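A common way to handle SIGHUP and SIGINT in the server is to let the handlers only set flags and to do the real work (re-reading the file, notifying clients, shutting down) in the main loop. This is a minimal sketch with hypothetical names. The handlers are installed without SA_RESTART, so a blocked select() returns -1 with errno set to EINTR and the loop can react immediately.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Handlers only set flags; all real work happens in the main loop. */
static volatile sig_atomic_t got_sighup = 0;
static volatile sig_atomic_t got_sigint = 0;

static void on_signal(int sig)
{
    if (sig == SIGHUP)
        got_sighup = 1;
    else if (sig == SIGINT)
        got_sigint = 1;
}

/* Install both handlers. sa_flags = 0 means no SA_RESTART, so an
 * interrupted select() fails with EINTR instead of resuming. */
int dn_install_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGHUP, &sa, NULL) < 0 ||
        sigaction(SIGINT, &sa, NULL) < 0)
        return -1;
    return 0;
}
```

In the main loop, the server would test got_sighup (clear it, re-read, print the Info line, send new assignments) and got_sigint (send an exit message to every client, wait for each client's socket to reach EOF, print the final line with the removal count, and exit). A signal that arrives just before select() is entered is only noticed on the next wake-up; pselect() with a blocked signal mask closes that window.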
Again, note that aside from the output specified above, there should be no other output from your program (other than what goes to the logfile). You may, of course, insert printf() calls for your own debugging purposes, but they must be removed before you submit. Marks will be deducted for verbose output in the version submitted for marking.
As with Assignment #2, when the program dirnanny.server is started from the command line, it must read the configuration file. Then, your system checks all the directories specified in the configuration file. If the corresponding directory does not exist, it is not considered an error. Instead, dirnanny.server reports the situation to stdout.
As in Assignment #2, the parent process communicates with its child processes using a Unix pipe. The parent dirnanny.client process should never fork a new child process if an existing child process is available to do the monitoring. That is, the number of children of the client is adjusted to the number of directories monitored on that particular host. An increase in the number of directories monitored must trigger the creation of additional children. Similarly, a reduction in the number of directories being monitored must trigger the termination of some children.
The only time a child process should exit is if the parent process explicitly tells the child to exit with a message on the pipe. The termination of a child is necessary when we reduce the number of monitored directories at a particular host, but also when the whole application is terminating. In the latter case, the parent process tells its children to exit if it has been explicitly told to exit by the server process because the server process has received a SIGINT signal. All terminations at the client+children side are message-induced, and not triggered by signals.
The child process should execute the exact same program as dirnanny.client (i.e., you should not use exec()). The parent communicates with the child process (and vice versa) using Unix pipes. Basically, the dirnanny.client process executes an infinite loop and exits from it only when the server tells it to do so. The children of dirnanny.client also execute an infinite loop, which they may exit only when their parent tells them to. (By the way, the server also executes an infinite loop, which it may exit only when the user intervenes with a SIGINT.)
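The child-pool rules above (reuse existing children, fork only the shortfall, retire the surplus by a message on the pipe, no exec()) can be sketched as follows. This is a minimal sketch under loudly stated assumptions: the one-byte command CMD_EXIT, the limit DN_MAX_CHILDREN and the struct dn_pool type are hypothetical, and the child loop is reduced to waiting for commands. A real child would select() on its pipe with a timeout equal to its monitoring period and check its directory on each timeout. Exiting when the pipe reaches EOF (the parent died without sending a message) is a defensive design choice that you would need to document.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define DN_MAX_CHILDREN 64   /* hypothetical per-host limit */
#define CMD_EXIT 'x'         /* hypothetical one-byte exit command */

struct dn_child { pid_t pid; int to_child; };
struct dn_pool  { struct dn_child c[DN_MAX_CHILDREN]; int n; };

/* Child side: loop until the parent sends CMD_EXIT. */
static void child_loop(int from_parent)
{
    char cmd;
    for (;;) {
        ssize_t r = read(from_parent, &cmd, 1);
        if (r == 1 && cmd == CMD_EXIT)
            break;
        if (r == 0)          /* parent gone: defensive exit (design choice) */
            break;
        /* other commands, e.g. a new directory assignment, go here */
    }
    close(from_parent);
    _exit(0);                /* _exit: do not flush the parent's stdio buffers */
}

/* Grow or shrink the pool to 'want' children, reusing existing ones.
 * Returns 0 on success, -1 on a bad size or if pipe()/fork() failed. */
int dn_pool_resize(struct dn_pool *p, int want)
{
    if (want < 0 || want > DN_MAX_CHILDREN)
        return -1;
    while (p->n < want) {                 /* fork only the shortfall */
        int fds[2];
        if (pipe(fds) < 0)
            return -1;
        pid_t pid = fork();
        if (pid < 0) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
        if (pid == 0) {                   /* child: same program, no exec() */
            close(fds[1]);
            for (int i = 0; i < p->n; i++)
                close(p->c[i].to_child);  /* drop siblings' pipe ends */
            child_loop(fds[0]);           /* never returns */
        }
        close(fds[0]);
        p->c[p->n].pid = pid;
        p->c[p->n].to_child = fds[1];
        p->n++;
    }
    while (p->n > want) {                 /* retire surplus by message */
        struct dn_child *c = &p->c[--p->n];
        char cmd = CMD_EXIT;
        if (write(c->to_child, &cmd, 1) != 1) {
            /* child already gone; fall through and reap it */
        }
        close(c->to_child);
        waitpid(c->pid, NULL, 0);         /* reap so no zombie is left */
    }
    return 0;
}
```

If a child might already have exited, a client using this approach should ignore SIGPIPE so that the write() fails with EPIPE instead of terminating the client.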
While in the loop, the child should be ready to be assigned, when instructed by the parent, to monitor a different directory (with the corresponding frequency and policy). Note that the server process (potentially) has to manage and communicate with a number of client processes. Also, each client process (potentially) has to manage and communicate with a number of child processes. In addition, each client process has to communicate with the server. Since it is never acceptable for the server or the clients to block when reading data from a socket or pipe, you must use the select() (or poll()) system call to multiplex the I/O. By using select() (or poll()), you can detect whether or not there is actually a message to be read on a given file descriptor. Therefore, when you call read(), you can be sure that you will not block waiting for a message. Again, note that the client has both socket and pipe file descriptors, but the select() (or poll()) system call can deal with both types of IPC mechanism. In addition, you should also use select() (or poll()) in the child processes.
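The multiplexing step can be wrapped in a small helper that takes any mix of socket and pipe descriptors. This is a minimal sketch; dn_wait_readable() is a hypothetical name. select() can only watch descriptors below FD_SETSIZE (typically 1024), which is ample for at most 32 clients. A negative timeout blocks indefinitely, and a zero timeout just polls.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>   /* read() is safe on descriptors reported ready */

/* Wait up to timeout_ms (negative: forever) for any of fds[0..n-1]
 * to become readable. On success, ready[i] is 1 if fds[i] can be read
 * without blocking. Returns the number of ready descriptors, 0 on
 * timeout, or -1 on error (ready[] untouched); errno == EINTR means a
 * signal arrived, so the caller should check its signal flags. */
int dn_wait_readable(const int *fds, int n, int *ready, long timeout_ms)
{
    fd_set rset;
    int maxfd = -1;

    FD_ZERO(&rset);
    for (int i = 0; i < n; i++) {
        FD_SET(fds[i], &rset);        /* sockets and pipes alike */
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int r = select(maxfd + 1, &rset, NULL, NULL,
                   timeout_ms < 0 ? NULL : &tv);
    if (r < 0)
        return -1;
    for (int i = 0; i < n; i++)
        ready[i] = FD_ISSET(fds[i], &rset) ? 1 : 0;
    return r;
}
```

In a client, fds[] would hold the server socket plus the read end of each child's pipe; in a child, it would hold only the pipe from the parent, with the monitoring period as the timeout.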
The following figure is an example of the locations where processes run and the communication between them:
You must write a Makefile for your program. When someone types make, your Makefile should build the executable programs dirnanny.server and dirnanny.client. When someone types make clean, your Makefile should remove the executables, all .o files (if any), and all core files (if any).