CMPUT 379: Operating System Concepts
Department of Computing Science
January 2005

Assignment #1: Directory Nanny

Due Date: Wednesday, February 2, 2005
On-line submission using the astep program before 9 P.M. on due date.

Cleaning Up Your Processes

When using fork() (and related functions) for the first time, it is easy to have bugs that leave processes on the system, even after you log out of the workstation. It is your responsibility to clean up (i.e., kill) extraneous processes from your workstation before you log out. Learn how to use the ps and kill (and related) commands.

Marks will be deducted if you leave processes on a workstation after you log out.


Standard Comment About Design Decisions

Although many details about this assignment are given in this description, there are many other design decisions that are left for you to make. In those cases, you should make reasonable design decisions (e.g., that do not contradict what we have said and do not significantly change the purpose of the assignment), document them in your source code, and discuss them in your report. Of course, you may ask questions about this assignment (for example, in the newsgroup) and we may choose to provide more information or provide some clarification. However, the basic requirements of this assignment will not change.

Assignments in this Course:

All three assignments in CMPUT 379 this term will be related to each other. Although it may not be obvious now, you will be using the knowledge (and some code) from this assignment in your future assignments. Therefore, please take the time to understand the concepts and to write clean, readable code.

Also, as a student in this course, it is essential to have a solid understanding of how memory needs to be managed when aids such as Java's garbage collection are not present; operating systems and most systems software do not use garbage collection.

Overview:

Cleaning up directories where temporary files are stored is an increasingly good idea. Many web browsers create temporary cache files to store web page contents, and these files tend to stick around between browser runs. Even if a browser can limit how much space its cache uses or how long cached entries remain valid, that cleanup can only happen while the browser is running. The task of this assignment is to write a new program that takes care of directories where collections of files are placed temporarily. We will assume that these directories contain only regular files (no links, other directories, or "special" files such as device files).

There are two policies that the program implements:

  1. Restricting the contents of the directory to the most recent files whose total size is less than a given limit (expressed in Kilobytes or Megabytes).
  2. Restricting the contents of the directory to the files created within a given amount of time (i.e., the most recent files up to some point in the past, specified relative to the current time).
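The two policies can be separated from all directory and process handling. A minimal sketch, assuming the files have already been collected into an array sorted oldest first; the struct file_info type and the count_lessthan()/count_mostrecent() names are hypothetical, not part of the required design. Each function only reports how many of the oldest files should be removed.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical record for one regular file in a monitored directory. */
struct file_info {
    time_t ctime;        /* st_ctime, treated as "creation" time */
    unsigned long size;  /* st_size in bytes */
};

/* Policy 1 ("lessthan"): f is sorted oldest first. Returns how many of
 * the oldest files must be removed so that the remaining total size is
 * strictly less than limit_bytes. */
size_t count_lessthan(const struct file_info *f, size_t n,
                      unsigned long long limit_bytes)
{
    unsigned long long total = 0;
    size_t removed = 0;

    assert(f != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += f[i].size;
    /* Drop files from the oldest end until the total falls below the limit. */
    while (removed < n && total >= limit_bytes) {
        total -= f[removed].size;
        removed++;
    }
    return removed;
}

/* Policy 2 ("mostrecent"): f is sorted oldest first. Returns how many of
 * the oldest files are older than window_secs relative to now. */
size_t count_mostrecent(const struct file_info *f, size_t n,
                        time_t now, unsigned long window_secs)
{
    size_t removed = 0;

    assert(f != NULL || n == 0);
    while (removed < n && now - f[removed].ctime > (time_t)window_secs)
        removed++;
    return removed;
}
```

Keeping the decision logic free of unlink() calls makes it easy to test with made-up timestamps before any real files are deleted.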

Your program must use fork() to create child processes. The name of your executable program must be dirnanny.

Input/Output and Behaviour Specification:

The program dirnanny takes exactly one command-line argument that specifies the full pathname to a configuration file. An example of how the program is started from the command line is:

% dirnanny /tmp/dirnn.config

where the contents of file /tmp/dirnn.config might be, for example:

10:00:00
/tmp/onebigcache 00:01:00 lessthan 2M
/tmp/othercache 00:00:20 mostrecent 03:00:00

There are no empty or comment lines in the configuration file. The first line is a time-expression (defined below) indicating how long dirnanny will run for. You can assume that the configuration files match the format described here and are free of errors.

Each subsequent configuration line contains (a) the absolute path of the directory to monitor, (b) a time-expression indicating how frequently the corresponding directory is to be checked, (c) a keyword (lessthan or mostrecent) indicating the kind of policy to be enforced on the corresponding directory, and (d) a policy-specific parameter, either a time-expression or a size-expression (defined below). All the listed directories are to be monitored. Upon terminating, dirnanny reports the total number of files removed during the entire run (over all the monitored directories).
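Because the configuration file is guaranteed to be well formed, each directory line can be read with a single fscanf() call. A minimal sketch, assuming directory paths contain no whitespace; the struct config_entry layout, the buffer sizes, and the function names are hypothetical choices.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PATH_MAX_LEN 1024 /* hypothetical limit on directory path length */

/* One parsed directory line of the configuration file. */
struct config_entry {
    char dir[PATH_MAX_LEN];  /* absolute path of the monitored directory */
    char period[16];         /* time-expression, e.g. "00:01:00" */
    char policy[16];         /* "lessthan" or "mostrecent" */
    char param[16];          /* size- or time-expression, e.g. "2M" */
};

/* Reads the next directory line from cfg into e. Returns 1 on success,
 * 0 at end of file. */
int read_config_entry(FILE *cfg, struct config_entry *e)
{
    assert(cfg != NULL && e != NULL);
    return fscanf(cfg, "%1023s %15s %15s %15s",
                  e->dir, e->period, e->policy, e->param) == 4;
}

/* Returns nonzero if the entry uses the size-based ("lessthan") policy. */
int is_lessthan(const struct config_entry *e)
{
    return strcmp(e->policy, "lessthan") == 0;
}
```

The first line (the run duration) would be consumed beforehand, for example with fscanf(cfg, "%15s", buf), and read_config_entry() then called in a loop; the line number of each entry also gives the X in dnnylog.X.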

A time-expression is a colon-separated expression of the form hh:mm:ss, where ss is seconds (00 to 59), mm is minutes (00 to 59), and hh is hours (00 to 99). A checking frequency of 00:00:00 means that the corresponding directory is checked only once and never again. In our example configuration file, the 10:00:00 indicates that dirnanny should run for 10 hours and then exit; after 10 hours, there should be no dirnanny-related processes (that belong to your user id) on the system.
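Internally it is convenient to convert every time-expression to a number of seconds, which is what sleep() and time() work with. A minimal sketch using sscanf(); the function name and the -1 error convention are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Converts a time-expression "hh:mm:ss" into seconds. Returns -1 if the
 * string does not have the expected shape or a field is out of range.
 * A result of 0 ("00:00:00") means "check once and never again". */
long parse_time_expr(const char *s)
{
    unsigned h, m, sec;
    char extra;

    assert(s != NULL);
    /* Exactly three fields; a 4th conversion means trailing garbage. */
    if (sscanf(s, "%2u:%2u:%2u%c", &h, &m, &sec, &extra) != 3)
        return -1;
    if (m > 59 || sec > 59)
        return -1;
    return (long)h * 3600L + (long)m * 60L + (long)sec;
}
```

For example, parse_time_expr("10:00:00") yields 36000, so the parent can compute its deadline as time(NULL) + 36000.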

Finally, a size-expression is an integer followed immediately (i.e., with no whitespace) by M or K. You can assume the integer fits in an unsigned int. K stands for Kilobytes (1024 bytes) and M stands for Megabytes (1024 x 1024 bytes).
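Note that an unsigned int multiplied by 1024 x 1024 can overflow 32 bits, so a wider type is a safer home for the byte count. A minimal sketch, assuming a hypothetical parse_size_expr() that returns 0 on success and -1 on a malformed expression.

```c
#include <assert.h>
#include <stdio.h>

/* Converts a size-expression such as "2M" or "512K" into bytes, stored in
 * *bytes. Returns 0 on success, -1 on a malformed expression. */
int parse_size_expr(const char *s, unsigned long long *bytes)
{
    unsigned int n;
    char unit, extra;

    assert(s != NULL && bytes != NULL);
    /* %c does not skip whitespace, so "2 M" is rejected as required. */
    if (sscanf(s, "%u%c%c", &n, &unit, &extra) != 2)
        return -1;
    if (unit == 'K')
        *bytes = (unsigned long long)n * 1024ULL;
    else if (unit == 'M')
        *bytes = (unsigned long long)n * 1024ULL * 1024ULL;
    else
        return -1;
    return 0;
}
```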

In the example configuration, the contents of /tmp/onebigcache are examined every minute, and files are removed one at a time, from the oldest created to the most recently created, until the directory contains less than two Megabytes of stored files in total. The contents of /tmp/othercache are monitored every twenty seconds, and only the files created in the last three hours are preserved. Note that Unix does not keep a strict record of when a file was created. The stat() system call fills in a structure with a field (st_ctime) that you should treat as the "creation" time, although its meaning is somewhat different: it records the time of the file's last status change (for example, a change to its contents, permissions, or ownership).
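Both policies start the same way: list the directory, stat() each entry, keep the regular files, and sort them by st_ctime. A minimal sketch using opendir()/readdir()/stat() and qsort(); the struct dir_file type and the scan_dir()/free_dir_files() names are hypothetical. A directory that cannot be opened (for example, one that does not exist yet) simply yields no files.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* One regular file found in a monitored directory. */
struct dir_file {
    char *path;    /* malloc'ed full path; freed by free_dir_files() */
    time_t ctime;  /* st_ctime, treated as "creation" time */
    off_t size;    /* st_size in bytes */
};

/* qsort comparator: oldest ctime first. */
static int by_ctime(const void *a, const void *b)
{
    const struct dir_file *fa = a, *fb = b;
    return (fa->ctime > fb->ctime) - (fa->ctime < fb->ctime);
}

/* Returns a malloc'ed array of the regular files in dir, sorted oldest
 * first, and stores the count in *n. Returns NULL with *n == 0 when the
 * directory cannot be opened or holds no regular files. */
struct dir_file *scan_dir(const char *dir, size_t *n)
{
    DIR *d;
    struct dirent *ent;
    struct dir_file *files = NULL;
    size_t cap = 0;

    assert(dir != NULL && n != NULL);
    *n = 0;
    if ((d = opendir(dir)) == NULL)
        return NULL;
    while ((ent = readdir(d)) != NULL) {
        struct stat st;
        size_t len = strlen(dir) + strlen(ent->d_name) + 2;
        char *path = malloc(len);

        if (path == NULL)
            break;
        snprintf(path, len, "%s/%s", dir, ent->d_name);
        /* Skips ".", "..", and anything that is not a regular file. */
        if (stat(path, &st) != 0 || !S_ISREG(st.st_mode)) {
            free(path);
            continue;
        }
        if (*n == cap) {  /* grow the array geometrically */
            size_t newcap = cap ? 2 * cap : 16;
            struct dir_file *tmp = realloc(files, newcap * sizeof *tmp);
            if (tmp == NULL) {
                free(path);
                break;
            }
            files = tmp;
            cap = newcap;
        }
        files[*n].path = path;
        files[*n].ctime = st.st_ctime;
        files[*n].size = st.st_size;
        (*n)++;
    }
    closedir(d);
    if (*n > 0)
        qsort(files, *n, sizeof *files, by_ctime);
    return files;
}

/* Frees an array returned by scan_dir(), including every path string. */
void free_dir_files(struct dir_file *files, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(files[i].path);
    free(files);
}
```

With the array sorted oldest first, either policy reduces to removing (unlink()) a prefix of the array and logging each removal.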

For one user, only one dirnanny parent process and its children may be running at any point in time. When dirnanny first starts, if it finds a previously running dirnanny parent process (and that process's children), it must first terminate them (and confirm their termination) before continuing with its own execution.
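How a previous instance is found is left to you (parsing ps output is one option). The sketch below shows one possible approach, assuming a hypothetical PID file in which each parent records its own PID at startup, and assuming the old parent reacts to SIGTERM by terminating and waiting for its own children, so that confirming the parent's death is sufficient. The function names, the polling interval, and the SIGKILL fallback are all hypothetical choices. A real solution should also guard against a stale PID file naming an unrelated process that reused the PID.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Sends SIGTERM to pid and polls until the process is gone, escalating to
 * SIGKILL after about 5 seconds. Returns 0 once termination is confirmed,
 * -1 otherwise (e.g. no permission, or it refuses to die). */
int terminate_and_confirm(pid_t pid)
{
    struct timespec tick = {0, 100000000L};  /* 100 ms between checks */
    int i;

    assert(pid > 0);
    if (kill(pid, SIGTERM) != 0)
        return errno == ESRCH ? 0 : -1;  /* already gone counts as success */
    for (i = 0; i < 100; i++) {
        /* Reaps pid if it is our own child (zombies still answer kill());
         * otherwise this fails with ECHILD, which is harmless. */
        (void)waitpid(pid, NULL, WNOHANG);
        if (kill(pid, 0) != 0 && errno == ESRCH)
            return 0;
        if (i == 50)
            (void)kill(pid, SIGKILL);
        nanosleep(&tick, NULL);
    }
    return -1;
}

/* Records the calling process's PID in pidfile. Returns 0 on success. */
int write_pid_file(const char *pidfile)
{
    FILE *f = fopen(pidfile, "w");

    if (f == NULL)
        return -1;
    fprintf(f, "%ld\n", (long)getpid());
    return fclose(f) == 0 ? 0 : -1;
}

/* Reads a previously recorded PID; returns 0 if there is none. */
pid_t read_old_pid(const char *pidfile)
{
    FILE *f = fopen(pidfile, "r");
    long pid = 0;

    if (f == NULL)
        return 0;
    if (fscanf(f, "%ld", &pid) != 1 || pid <= 0)
        pid = 0;
    fclose(f);
    return (pid_t)pid;
}
```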

When a file in a monitored directory is removed, appropriate output is sent to a logfile. One logfile is created per monitored directory. The location of the logfiles is determined by the environment variable DIRNANNYLOGS. If the environment variable is not set, then the default location $HOME/.dirnanny/ is used. Logfiles from a new run of dirnanny may overwrite (clobber) previously existing ones. The logfiles are named dnnylog.X, where X is the sequence number of the corresponding directory in order of appearance in the configuration file. In our example, directory /tmp/onebigcache appears on the second line of the configuration file, is the first directory to be monitored, and corresponds to logfile dnnylog.1. Directory /tmp/othercache appears on the third line and corresponds to logfile dnnylog.2, and so forth.
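Building a logfile name combines getenv() with snprintf(). A minimal sketch; the logfile_path() name is hypothetical, and treating an empty DIRNANNYLOGS like an unset one, as well as tolerating a missing trailing '/', are design choices rather than requirements.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Writes the path of logfile number seq (1-based, in configuration-file
 * order) into buf. Uses $DIRNANNYLOGS if set, else $HOME/.dirnanny/.
 * Returns 0 on success, -1 if HOME is unset or the path does not fit. */
int logfile_path(char *buf, size_t bufsize, int seq)
{
    const char *dir = getenv("DIRNANNYLOGS");
    const char *sep;
    int len;

    assert(buf != NULL && seq > 0);
    if (dir != NULL && dir[0] != '\0') {
        /* Adds a '/' only if the variable does not already end in one. */
        sep = dir[strlen(dir) - 1] == '/' ? "" : "/";
        len = snprintf(buf, bufsize, "%s%sdnnylog.%d", dir, sep, seq);
    } else {
        const char *home = getenv("HOME");
        if (home == NULL)
            return -1;
        len = snprintf(buf, bufsize, "%s/.dirnanny/dnnylog.%d", home, seq);
    }
    return (len < 0 || (size_t)len >= bufsize) ? -1 : 0;
}
```

The default directory $HOME/.dirnanny/ may not exist on a first run, so creating it (see mkdir()) before calling fopen(path, "w") is worth considering.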

The first line of output in each logfile is a report as to which directory is monitored. For our above example, two logfiles are created: dnnylog.1 and dnnylog.2 where the first one starts with a line of the form:

[Mon Sep 17 11:27:16 MDT 2005] Info: Initializing monitoring of directory /tmp/onebigcache.

and the second with: 

[Mon Sep 17 11:27:17 MDT 2005] Info: Initializing monitoring of directory /tmp/othercache.

Note that the first item in each line is the current time in date command format, enclosed in square brackets (for information regarding this format, check the ctime() function; note that ctime() itself omits the time zone, such as MDT, that date prints). When actions are taken, a corresponding logfile line is output. For example, if /tmp/onebigcache exceeds the size limit when it is checked again, we will see a line of the following format in dnnylog.1:

[Mon Sep 17 11:27:39 MDT 2005] Action: File "dummy.txt" removed.

Note that the above configuration file is just an example. Your program must work with all possible configuration files that follow the specified format. In fact, as part of your testing, you will want to experiment with different configuration files. 

Note that any output to the logfiles must be of the form: the date (within square brackets), followed by the "type" of message (Info:, Action:, Warning:, Error:, Debug:, etc.; feel free to extend this set as needed for clarity), followed by the more detailed text.
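Funnelling every log message through one function keeps the format consistent. A minimal sketch; the log_msg() name is hypothetical, and the strftime() pattern "%a %b %e %H:%M:%S %Z %Y" is an assumption chosen to mimic the date command's default output including the time zone, which ctime() leaves out.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Writes one logfile line of the form
 *   [Mon Sep 17 11:27:39 MDT 2005] Action: File "dummy.txt" removed.
 * type is the message kind without the colon (e.g. "Info", "Action");
 * fmt and the following arguments are as for printf(). */
void log_msg(FILE *log, const char *type, const char *fmt, ...)
{
    char stamp[64];
    time_t now = time(NULL);
    struct tm *tm = localtime(&now);
    va_list ap;

    assert(log != NULL && type != NULL && fmt != NULL);
    if (tm == NULL ||
        strftime(stamp, sizeof stamp, "%a %b %e %H:%M:%S %Z %Y", tm) == 0)
        strcpy(stamp, "unknown time");
    fprintf(log, "[%s] %s: ", stamp, type);
    va_start(ap, fmt);
    vfprintf(log, fmt, ap);
    va_end(ap);
    fputc('\n', log);
    fflush(log);  /* the line reaches the file even if we are killed later */
}
```

A call such as log_msg(f, "Action", "File \"%s\" removed.", name) produces a line in the format shown above.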

Stylistic requirement: Students must follow the same format for their messages and diagnostics as the examples shown above.

Required Design:

When the program dirnanny is started from the command line, it must read the configuration file. dirnanny checks all the directories specified in the configuration file. If the corresponding directory does not exist, it is not considered an error. Instead, dirnanny reports to stdout:

dirnanny: Warning: /tmp/othercache does not currently exist.
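The existence check is a single stat() call. A minimal sketch with a hypothetical check_dir_exists(); it treats a path that exists but is not a directory the same as a missing one, which is a design choice. Flushing stdout matters here: if buffered output is still pending when fork() is called, each child inherits a copy and may print it again.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if path currently names a directory. Otherwise prints the
 * required warning to stdout and returns 0. */
int check_dir_exists(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) == 0 && S_ISDIR(st.st_mode))
        return 1;
    printf("dirnanny: Warning: %s does not currently exist.\n", path);
    fflush(stdout);  /* avoid duplicated output in children after fork() */
    return 0;
}
```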

Then dirnanny will fork off a child process to continue overseeing the (possibly non-existent) monitored directory. There should be one child process per monitored directory. It does not matter whether the directory exists (it may be created by the user later); the child process is started anyway. The only difference is that a warning like the one above is printed for directories that do not exist yet.

The child process executes the exact same program as dirnanny; that is, the children should not invoke exec(). Each child process is then entirely focused on a single monitored directory. The parent communicates the monitoring frequency and the name of the monitored directory to the child using a variable in memory (recall how fork() works). The child puts itself to sleep (see sleep()) for the specified monitoring period, wakes up, applies the file removal policy, and sleeps again; this cycle continues until the termination time specified in the first line of the configuration file. If the monitored directory has been removed, the child process continues checking periodically regardless (the directory may be created and erased an arbitrary number of times).
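The fork-without-exec structure can be sketched as follows. This is a minimal illustration, not the required design: struct watch, apply_policy() (a placeholder standing in for the real policy code), monitor(), and run_children() are hypothetical names. Each child works on its inherited copy of the watch entry the parent filled in before fork(), and leaves with _exit() so it never falls back into the parent's loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Per-directory settings, filled in by the parent before fork(). */
struct watch {
    const char *dir;  /* monitored directory */
    unsigned period;  /* seconds between checks; 0 = check once */
};

/* Placeholder for the real policy code; returns the number removed. */
int apply_policy(const struct watch *w)
{
    (void)w;
    return 0;
}

/* Child body: sleep, check, repeat while another full period still fits
 * before the deadline. Never returns. */
void monitor(const struct watch *w, time_t deadline)
{
    if (w->period == 0) {  /* 00:00:00: check exactly once */
        apply_policy(w);
        _exit(EXIT_SUCCESS);
    }
    while (time(NULL) + (time_t)w->period <= deadline) {
        sleep(w->period);
        apply_policy(w);
    }
    _exit(EXIT_SUCCESS);
}

/* Forks one child per entry (no exec()) and waits for all of them.
 * Returns the number of children that exited normally with status 0. */
int run_children(const struct watch *w, int n, time_t deadline)
{
    int i, ok = 0, status;
    pid_t pid;

    assert(w != NULL && n >= 0);
    fflush(NULL);  /* children must not inherit unflushed stdio buffers */
    for (i = 0; i < n; i++) {
        pid = fork();
        if (pid < 0) {
            perror("dirnanny: fork");
            break;
        }
        if (pid == 0)
            monitor(&w[i], deadline);  /* child: same program, no exec() */
    }
    while ((pid = wait(&status)) > 0)
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    return ok;
}
```

An exit status can only carry a value from 0 to 255, so reporting the total number of removed files back to the parent needs some other channel (a pipe, for instance); that choice is left to your design.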

You must write a Makefile for your program. When someone types make, your Makefile should build the executable program dirnanny. When someone types make clean, your Makefile should remove the executable dirnanny (if any), all .o files (if any), and all core files (if any).

When developing and testing your program, make sure you clean up all of your processes (including dirnanny) before you log out of a workstation. Marks will be deducted for processes left on workstations.

Other Implementation Details:

As appropriate, you must use C memory allocation (e.g., malloc(), free()) and C file I/O functions (e.g., fopen(), fscanf(), fclose()). You cannot use streams or the STL/the C++ stdlib (e.g., cannot use type/class string). Also, your particular TA may not have any expertise in C++ and therefore we cannot guarantee lab support for languages other than C.

It is IMPERATIVE that your program properly deallocates ALL dynamic memory in a correct fashion (i.e., using free()) before your program terminates, or else your assignment will LOSE marks. To check that your program properly allocates and de-allocates ALL dynamic memory it uses, you must use the MEMWATCH package, which is simple to do.
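A typical source of leaks is a dynamically built structure, such as a list of configuration entries, that is never torn down. A minimal sketch of the allocate/free pairing; struct dir_node, push_dir(), and free_dirs() are hypothetical names. After fork(), each process has its own copy of such a list, so decide explicitly where each copy is freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Singly linked list of monitored directories. */
struct dir_node {
    char *dir;              /* malloc'ed copy of the path */
    struct dir_node *next;
};

/* Prepends a copy of dir to list. Returns the new head, or NULL on
 * allocation failure (in which case list is left unchanged). */
struct dir_node *push_dir(struct dir_node *list, const char *dir)
{
    struct dir_node *node = malloc(sizeof *node);

    assert(dir != NULL);
    if (node == NULL)
        return NULL;
    node->dir = malloc(strlen(dir) + 1);
    if (node->dir == NULL) {
        free(node);  /* undo the partial allocation */
        return NULL;
    }
    strcpy(node->dir, dir);
    node->next = list;
    return node;
}

/* Frees every node and the string each node owns. */
void free_dirs(struct dir_node *list)
{
    while (list != NULL) {
        struct dir_node *next = list->next;  /* save before freeing */
        free(list->dir);
        free(list);
        list = next;
    }
}
```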

Using MEMWATCH:

This term, we will use Version 2.71 (stable) of MEMWATCH. You can download the package yourself.

The TAs will expect that your files have been compiled with the header file memwatch.h and file memwatch.c in your working directory. MEMWATCH also has a README and FAQ.

In all of your source files (either directly or indirectly), you must add the directive

#include "memwatch.h"

and when you compile, you must compile memwatch.c along with your source files, with the preprocessor macros MEMWATCH and MW_STDIO defined. For example:

gcc -Wall -DMEMWATCH -DMW_STDIO main.c memwatch.c

When you run your program, if you get a message in your output that reads something like:

MEMWATCH detected 5 anomalies

it means you have not de-allocated dynamic memory properly.  In particular, this message indicates that 5 allocated structures have not been de-allocated. You should also check the MEMWATCH log file for any reports. If your assignment is not properly compiled with MEMWATCH enabled or if MEMWATCH reports that your memory allocation/deallocation was incorrect, then you will lose marks.

What to hand in:

All elements are to be handed in on-line via the astep program. (Jan. 19/05: astep hyperlink now active.) Use the command:

unix-prompt% astep -c c379 -p as1 submit.tar

All of the following must be packaged into a tar file with the name submit.tar. Information about tar is available from the manual page (see man tar). For example, tar cvf submit.tar Makefile main.c my.h is an archetypal command; be very, very careful of the tar cvf submit.tar part. Before you submit, make sure your tar file works from within a fresh directory.

  1. A README file (ASCII text is fine) for your assignment with: (1) your name, (2) student number, (3) Unix id, (4) lecture section, (5) instructor's name, (6) lab section, and (7) TA's name clearly labelled. Marks will be deducted if any of these items are missing. The README file must also include a short description of your program, as well as a description of the relevant commands to build (e.g. make all) and how to execute your programs including command line parameters.

  2. A report in HTML file format, in a file called report.html, describing the design, implementation, and testing of your assignment. The report should contain no more than 750 words. (If lynx -dump -force_html report.html | wc -w is greater than 800 (i.e., 750 + a small margin; 751 is too many words), then marks will be deducted.) You do not need to repeat any information contained in this assignment description. I recommend you spend 25% of your report on an overview of your assignment, 50% on your design and implementation, and 25% on how you tested your program, along with some concluding remarks. Note the emphasis on testing your program.

  3. Your source code file(s) for dirnanny, including all header files. Do NOT submit any MEMWATCH files, as the TA will use his/her own fresh copy of that code, but the use of MEMWATCH should be enabled in your code and Makefile.

  4. Your Makefile.

NOTE: Do not submit files or test data not described above. Only submit what is requested and what is required to compile your program (except, of course, the MEMWATCH files).

Also, make sure that your program does not produce any debugging or extraneous output during normal execution. Only the requested output should be generated. Marks will be deducted for incorrect and other unrequested output. It is acceptable to have output to report an actual error.

Note:  All files in your submission must contain the identification information labelled (1) to (7) in point 1 above (e.g., as a C or Makefile comment).

Marking:

The assignment is worth 12% of your final mark in the course. This is an individual assignment. Do not work in groups. Review the Course Outline (extracted below) on this matter.

The assignment itself will be marked as follows: 20% for your report (clarity, technical accuracy, completeness, thoroughness of the testing, etc.), 60% for the correctness of the program when we test it on the CSC 225 workstations using gcc, and 20% for the quality of the implementation (design, modularity, good software engineering, coding style, useful and appropriate comments, etc.).

If your source code, as submitted, does not compile and run (using the submitted Makefile) on the CSC 225 workstations using gcc, you will receive a mark of zero for correctness. Review the Course Outline (extracted below) on this matter.

All that you have learned about good technical communication (e.g., for your report) will apply. All that you have learned about good programming style and comments in your code will apply. Having correct code is important, but good style, design, and documentation are also important. We cannot provide an exhaustive list of what we will look for, but an incomplete list includes: a comment for each source code file, a comment for each procedure/function, a comment for each significant (global or local) variable, good choice of names/identifiers, proper modularity (e.g., do NOT put all/most of the code in main()) etc.

NOTE: There are a number of programs that you can download off the Internet that provide similar functionality to what you are asked to implement for this assignment. We are familiar with them. Therefore, do not download these programs; write your own solution to this problem. Modifying someone else's program (including programs that you can download) is against the requirements of this assignment and is an Academic Offense. If you have any doubts about whether your actions are permissible or not, you should ask a professor before proceeding.

Hints:

You may also want to learn about the following Unix programs: ps, grep, kill.

Further hints may be given later on in the newsgroup, if warranted. Be sure to read the newsgroup on a regular basis.

Cleaning up runaway processes is good etiquette when using a shared computer. For this course, it is a necessity. Make sure you know how to use the ps and kill commands.


Important Extracts from the Course Outline:

The University of Alberta is committed to the highest standards of academic integrity and honesty. Students are expected to be familiar with these standards regarding academic honesty and to uphold the policies of the University in this respect. Students are particularly urged to familiarize themselves with the provisions of the Code of Student Behaviour and avoid any behaviour which could potentially result in suspicions of cheating, plagiarism, misrepresentation of facts and/or participation in an offence. Academic dishonesty is a serious offence and can result in suspension or expulsion from the University. (GFC 29 SEP 2003)

NOTE: All assignments must be completed individually. Some high-level discussion of concepts between students is allowed. Do not work in groups. Do not share or discuss specific code in any way with other students; seek help from your TAs or instructor on these matters. Do not post code fragments longer than about 5 lines of code to the newsgroup. Note that we may use automated tools, such as MOSS, to detect potential cases of plagiarism. Note the definition of plagiarism and cheating in the Code of Student Behaviour.

VERY IMPORTANT: Your programming assignments, as submitted, must work on the department's laboratory machines in CSC 225 (Linux, uj01 to uj19) and with the gcc compiler. All testing will be done on these machines using gcc. We also recommend that you use the gdb debugger. A program that does not work in CSC 225 with gcc, even if it works on a different Unix-like machine (e.g., other versions of Linux or BSD) or compiler, will be considered incorrect. It is your responsibility to double check your tar files on the lab workstations before submitting them. Any mistakes in the above procedures, Makefiles, missing files, improper pathnames, and ``last minute changes'' to the files that prevent proper compilation will result in a mark of zero for correctness (approximately 60% of the total marks for each assignment). If you find an error in your submission, you can use the Late Policy (see below) to correct the mistake.

LATE POLICY: All programming assignments must be submitted electronically before 9 P.M. on the due date. (Note that even 1 second past 9 P.M. will be considered late.) Though not advised, it is possible to submit assignments late, with a penalty. The penalty for being late 1 day (i.e., up to 24 hours) is 10% of the maximum possible mark. Similarly, the late penalty for 2 days (i.e., more than 24 and up to 48 hours) is 20% of the maximum possible mark. No assignments will be accepted after 2 days past the deadline, except under extraordinary conditions and only with the approval of the instructor in advance.