CMPUT 681: Parallel and Distributed Systems

Paul Lu
Department of Computing Science
Fall 2009

Assignment 1: Shared-Memory Programming: Matrix Multiplication

Due Date: Wednesday, October 28, 2009, in class

Overview:

There are three parts to this assignment: implementation, experimentation, and exposition.

  1. Implement parallel matrix multiplication in C or C++ (using the shared-memory POSIX threads (pthreads) library) or in Java (using Java threads), on any Linux or Unix-based system of your choice. Specifically, you need to implement basic, non-blocked, dense matrix multiplication.
  2. Experiment with the basic algorithm to explore different design choices (e.g., work partitioning) and different parameters (e.g., input data, granularity). For each design choice or parameter, you should evaluate how it impacts performance. I recommend that you try matrices of up to 1,024 by 1,024 integers in size (or even larger, depending on the speed and memory size of your machine).
  3. Describe your implementations and experiments in a 3-page report (excluding any graphs, diagrams, or source code listings). The goal is to present a coherent and insightful description of the lessons learned from your experimentation.

NOTE: This assignment is largely the same as the one given to my class in previous years. Since this is a graduate course, I have complete confidence that you will not seek out a solution from a previous student of the course. Please do not compromise this learning opportunity (and commit an Academic Offense) by failing to do the work yourself!

Details on Implementation:

Again, you should implement matrix multiplication using shared-memory pthreads or Java threads.

When debugging your programs, you will likely create a lot of runaway processes. Learn about ps! Be sure to clean up your threads and processes!

If you search hard enough, you will likely be able to find existing implementations of matrix multiplication on the Internet. Please do the implementation yourself in order to get the hands-on experience.

HINT: You may find the pthread_setconcurrency() function to be useful to achieve maximum speedup.

Details on Experimentation:


NOTE: When testing your code, do not use input files for test data. Instead, randomly generate the data within the program.

If you use the same seed value (i.e., pass it to srandom()) for each run, then random() will deterministically generate the exact same sequence of pseudo-random numbers. Notice that random() returns a value of type long int. NOTE: Do not use srand() and rand(). On some platforms, the range of values generated by rand() is considerably smaller than that of random().

Also, generating the data within your program is faster than reading it from input files.


Once you have the basic algorithm implemented, you should design some experiments to explore and evaluate the design and parameter space of the algorithm.

To smooth out timing variability, report the average of 5 runs. For example, run each experiment 7 times and use the last 5, discarding the first 2 as warm-up.

For example, for matrix multiplication:

  1. Try different work partitioning strategies: row-wise, blocked row-wise, blocked, etc. You should also try at least one strategy that requires you to use some kind of locking.

    Although some of these strategies will be ``obviously'' bad, the goal is to experimentally quantify the performance of the different strategies.
  2. Vary the size of the matrices. I recommend that you try matrices of up to 1,024 by 1,024 integers in size.
NOTE: You should use the Unix function gettimeofday() or something similar to measure time.

Details on Exposition (i.e., Your Report):

This should be a well-written, clear, and concise description of your implementations and experiments. Use appropriate graphs, figures, and source code listings. For example:

  1. Speedup graphs
  2. As appropriate, breakdown of execution times according to phases
  3. Effect of varying the input data size
  4. Effect of varying the work partitioning strategy
The report is not just a description of your implementation. The main purpose is to communicate the lessons learned from your hands-on experience with the implementations and experimentation. Think of this as a mini-paper where you have a thesis (i.e., a point that you would like to make) and you present data and justification for your thesis.

What to hand in:

  1. A title page for your assignment with: (1) your name, (2) student number, (3) Unix id.
  2. A 3-page typed report (single-spaced, 11-point font, all margins at least 1 inch). Figures, graphs, and diagrams do not count as part of the 3 pages.
  3. A printout of your source code, Makefile, and any relevant test data on paper. Note that there is no need to send me your source code electronically. If I want the source code, I will ask for it.

Marking:

The assignment is worth 20% of your final mark in the course. This is an individual assignment. Do not work in groups. Do not share source code. However, you may freely discuss this assignment at a ``high level'' with your fellow graduate students.

Other comments (based on student assignments from previous years):

  1. Be careful about speculating and giving theories as to why performance is poor (or good) without some evidence. Wild, unsupported (and wrong) theories undermine the writer's credibility.
  2. Be careful of poor implementations of sequential matrix multiplication; they tend to make your speedups look overly high. Complicated array indexing and dereferencing can really slow down the code.
  3. How did you validate the correctness of your sequential and parallel codes?
  4. Use at least -O compiler optimization when benchmarking. This reduces the chance that a silly bit of code (which a compiler could optimize out) can distort the timings. You should never publish (or put in your thesis) numbers generated with -g specified to the compiler.
  5. Give me speedup graphs!!! (In addition to other kinds of graphs.)
  6. Include listings of your source code!!!

Useful Resources

  1. Your textbook.
  2. See the various links at the bottom of the course outline page.