CMPUT 681: Parallel and Distributed Systems

Paul Lu
Dept. of Computing Science
September 2009

Course Project

Due Date: Friday, December 11, 2009, 5 PM
Due Date: Thursday, December 3, 2009, 9 PM
Proposals Due: Monday, November 2, 2009, in class

CMPUT 681 Possible Project ideas:

  1. Benchmark the Phoenix (MapReduce) system using the default applications, on different systems and OSes.
    1. Could also include porting some applications from Hadoop/Java
    2. Could also include writing new applications in Phoenix/MapReduce
  2. Use Hadoop to implement a new algorithm/application
  3. Experiment with Hadoop on Amazon.com's Elastic Computing Cloud (EC2)
  4. Port the Phoenix (MapReduce) system to use Cam Macdonell's shared-memory mechanism for virtual machines.
  5. (Oct. 16/09) Find or implement the same algorithm using, say, Hadoop/Phoenix/MapReduce and MPI. Compare the performance and ease of programming.

CMPUT 497 Project:

(Nov. 2, 2009) C681 students can ignore this section. The project for C497 students has the same overall goals and themes as for C681 students, but I am specifying the actual things to do. In particular, the C497 students should:

  1. Download, install, and get running the C-based Phoenix system for MapReduce calculations. See URL: http://mapreduce.stanford.edu/
  2. You should use a Linux-based system with at least 2 cores, but 4 cores or more would be more interesting. If students want help in accessing a system with more than 2 cores, they should contact the instructor directly ASAP.
  3. Download, install, and benchmark all of the applications described in the Phoenix Rebirth: Scalable MapReduce on a Large-Scale Shared-Memory System paper, discussed in class. The goal here is to reproduce / re-run a meaningful subset of the experiments described in that paper. You can define and develop the "meaningful subset of the experiments", in consultation with the instructor.
  4. Port from Java/Hadoop to C/Phoenix the Weather application, as described by Cam Macdonell in class on Oct. 30 and as described in the book Hadoop: The Definitive Guide by Tom White (O'Reilly and Associates). The book is available in print form and on-line from a U of Alberta computer via Safari Books Online .
  5. Design and plan some experiments using the C/Phoenix version of the Weather application. In addition to the baseline calculation from the book, you should write new code to compute some other calculations. Some examples to consider include:
    1. Average maximum (and/or minimum) temperatures for a given day (e.g., day 128 of 366) across all years of the timeline of the data.
    2. Average 5-day-window maximum (and/or minimum) temperatures across the timeline of the data.
    3. Average values for any of the data points in the data set.
  6. Write and hand-in a 5 page report on all of your work with Phoenix, the existing Phoenix applications, and the newly ported Weather application with Phoenix.

    Graphs and tables do not count towards the 5 page limit. C/Phoenix source code for the Weather application should be included, but do not count towards your 5 page limit.

  7. After the project reports have been handed in, each student will be asked to do a demonstration of Phoenix running, with the applications from the Phoenix Rebirth paper, and with the Weather application (described above).

Purpose and Overview:

Unlike the assignments, the purpose of the project is for you to take the initiative and investigate a topic in depth. Therefore, the project is less structured than the assignments and it can take many forms.

For those considering a thesis in parallel and distributed systems or a closely-related area, a wise thing to do is to design a project that can be expanded into a thesis. I recommend that you (and all students) discuss your ideas with me on a regular basis and take advantage of my open-door office policy.

As noted on the course web page, the course project is supposed to be done as a group project involving 4 to 6 students. If you want to do a project with fewer than 4 students (e.g., individual project) or more than 6, then you should contact the instructor first.

NOTE: The work on the project is to be done concurrently with the assignments and take-home exam.

Types of Projects:

As with the assignments, a project should have three main parts: implementation, experimentation, and exposition.

Some possible types of projects include:

  1. Design and implement a new parallel algorithm for a problem that you find interesting. The algorithm can be from any domain, such as scientific computation, computer graphics, graph theory, etc.
  2. Implement and evaluate a known parallel algorithm from the literature that you find interesting.
  3. Port and evaluate an existing parallel program (say, from shared memory to distributed memory, or from one parallel programming system to another).

Over the next few classes, specific project ideas will be discussed. However, I strongly encourage you to consider your own project ideas.

Implementation Issues:

You may use any parallel programming system, although I highly recommend that you use something based on C, C++, or Java.

Experimentation Issues:

Once you have the basic algorithm(s) implemented, you should design some experiments to explore and evaluate the design and parameter space of the algorithms. We have read/will read a number of papers that demonstrate the kinds of quantitative evaluation that can be performed.

Exposition Issues (i.e., Your Report):

This should be a well-written, clear, and concise description of your implementations and experiments. There is no minimum or maximum number of pages, but I think 10 to 20 pages is a good target length.

There should be a concise literature review or discussion of related work. This section of your report should be only about 25% of the total length.

As with the assignments, use appropriate graphs, figures, and source code listings. For example:

  1. Speedup graphs
  2. Breakdown of execution times according to phases
  3. Effect of varying the input data size
  4. Effect of varying the work partitioning strategy
The report is not just a description of your implementation. The main purpose is to communicate the lessons learned from your hands-on experience with the implementations and experimentation. Think of this as a mini-paper where you have a thesis (i.e., a point that you would like to make) and you present data and justification for your thesis.

How to Get Started:

If you are interested in developing a new parallel algorithm, you can start by deciding what area of computing science interests you and if it can benefit from parallelism. For example, you may have encountered a problem or sequential algorithm from one of your other courses or from your readings of the literature.

If you are interested in working with existing algorithms or existing programs, you can start by reading the relevant literature. The papers that we read in class are just the starting point. You can read the papers that they reference. You can also browse the latest papers from relevant conferences and journals (see the links on the course web page).

Of course, our textbook covers a lot of different problems and algorithms. Each chapter has a list of references that you can use to guide your reading of the literature.

If you are confused or finding it hard to decide on a topic, come see me sooner rather than later.

Intermediate Milestones:

  1. Due Monday, November 2, in class: A 2 page Project Proposal. The proposal should clearly state the problem that you intend to examine and it should discuss your proposed implementation and experimental methodology at a high level. A list of your project milestones and a timeline (e.g., when you expect to have the implementation completed) should be included.

    I may ask you to meet with me and/or revise this proposal if I feel that it is necessary.

  2. I must approve this proposal before you hand in your final project.

What to hand in:

  1. A title page for your project with: (1) your name(s), (2) student number(s), (3) Unix id(s).
  2. A 10 to 20 page typed report (single spaced, 11 point font, all margins at least 1 inch). Use as many relevant figures, graphs, and diagrams as necessary.
  3. A printout of the interesting parts of your source code, Makefile, and any relevant test data on paper. Note that there is no need to send me your source code electronically. If I want the source code, I will ask for it.

Marking:

The project is worth 50% of your final mark in the course.