CMPUT 497: Cluster Challenge and Computational Science

Paul Lu
Department of Computing Science
January 2009

Course Project
Due Date: Tuesday, April 7, 2009, in class

Description:

This project can be done either individually or as a group of no more than 2 students (if approved in advance by the instructor), and is worth 50% of your mark in the course.

Requests for approval of group work must be made in writing (e.g., email) before March 13, 2009. The substance required of a group project will be larger than for individual projects and will be proportional to the number of members in the group.

It is strongly recommended that you discuss your project plans with the instructor before March 13, 2009 as well.

The purpose of the project is for you to learn, in greater depth, about an aspect of computational science or cluster computing. A project should include some aspects of:

  1. Installing, configuring, running, and evaluating some application or tool related to computational science or clusters.

  and/or

  2. Writing or customizing some code related to computational science or clusters.

There are three parts to this project: reading, hands-on work, and reporting.

  1. Reading: Choose any computational science application, or any significant tool or systems software used for clusters. You may choose to follow up on a choice that you made for an earlier assignment, or you may choose a new topic.

    Find between 1 and 3 new (i.e., not used in your previous assignment(s)) and substantial articles on your chosen topic. Of course, you may refer to articles previously used in an assignment, but you must find new articles as well. The best articles include research papers (from academic conferences and journals; see list at end of Course Outline) and articles from science-oriented magazines such as Scientific American and IEEE Computer.

    Read and understand your articles. NOTE: You will have to hand in copies of your articles to the instructor.

  2. Hands-On Work: The purpose of hands-on work is to learn about things that cannot be learned by just reading. Just how difficult, or frustrating, or exciting it is to actually see an application generate scientifically relevant output, or systems software provide functionality that did not exist before, cannot be conveyed easily on paper.

    A well-designed hands-on component goes beyond the simple usage described in the application documentation, and begins to explore computational science and how well (or not) it is suited for cluster computing.

    Some high-level ideas for possible hands-on work include (you would do one of the following):

    1. Computational Science: Choose a well-known scientific application. Compile, install, and learn how to run it. For some applications, this is relatively straightforward; for others, it can be very difficult (see Hint below). Try the application with new input data sets (e.g., learn how to create your own input, or find input data that did not come with the application itself).

      Understand (and explain in your report) the science behind the application and how to interpret the output.

    2. Visualization: Choose a well-known scientific application. Compile, install, and learn how to run it. For some tools, this is relatively straightforward; for others, it can be very difficult (see Hint below). Learn about tools that help visualize the output.

      Understand (and explain in your report) the visualization and how it helps in understanding the nature of the science and the application.

    3. Using tools: There are a variety of tools for clusters that you can download and use (e.g., programming tools, batch schedulers, management systems, monitoring systems).

      Choose a well-known tool. Compile, install, and learn how to run it. For some tools, this is relatively straightforward; for others, it can be very difficult (see Hint below). Design and explore some use-case scenarios for the tool, implement them, and learn about how the tool helps in those situations. Depending on the tool, you might want to write some new code or a new module.

      Understand (and explain in your report) the purpose, design, and (subjective) usefulness of the tool.

    4. New/modified tools: Do you think there is a need for a new tool (or a modified version of an existing tool)? Design, implement, and evaluate your new tool. It is strongly recommended that you discuss your ideas with the instructor at several points during your work. The instructor might be able to point out related work to your idea, help you refine your ideas, and help avoid potential pitfalls.

      Understand (and explain in your report) the purpose, design, and (subjective) usefulness of the new tool. What are the advantages and disadvantages of your new tool, as compared to existing tools?

  3. Reporting: Write a report on the topic and what you did for your hands-on work. The report should be 5 pages (if text heavy) to 10 pages (if you have many figures or graphical elements), with 1 inch margins, at least 12 point font, and single spacing. Include, as appropriate, screenshots of the application, sample output, visualizations, etc. generated from your hands-on work.

    The most important aspect of the report is to convey what you learned via hands-on work that is likely not as obvious from just reading about the application or tool. Of course, you must clearly and effectively summarize relevant information from the article(s) (and program documentation), but the report is primarily about your hands-on work.

    Be sure to use proper citation and referencing techniques (any academic style of citation is acceptable). Be aware of the Code of Student Behaviour and how it applies to referencing source material.

What to hand in:

On the due date, hand in your report both as a paper copy and by email (electronic files).

Also, hand in copies of the articles that you used.

Marking:

The project is worth 50% of your final mark in the course. Unless you have been approved in advance to work in groups, this is an individual project. You may discuss the project with other students, but individual projects must be all your own work.

70% of the marks for the project will be for the report itself (how well written, how well it conveys the lessons of the hands-on work, how well it explains the science and design behind the application or tool).

30% of the marks for the project will be for the hands-on work in terms of how well it is designed and executed. A well-designed hands-on component goes beyond the simple usage described in the application documentation, and begins to explore computational science and how well (or not) it is suited for cluster computing.

Suggestions and Hints:

  1. Many applications (and tools) compile and build easily under Linux (because someone has already made the effort to port them to Linux) and may take more effort when using other flavours of Unix, or non-Unix operating systems. If you run into any difficulties, I strongly recommend you switch to a Linux environment. You can use the ISG lab environment, or get a Linux-based VMware image from me.
  2. Web pages found via Google, and Wikipedia, are good places to start searching for information, but they are not reliable enough by themselves, so seek out actual papers or articles.
  3. Suggestions for topics: See the assignments, but also...
    1. A variety of tools exist to visualize protein data files (e.g., from the Protein Data Bank (PDB)). Explore the visualization of proteins.
    2. GAMESS is a well-known ab initio chemistry code.
    3. GROMACS is a well-known molecular dynamics code.
    4. QuantLib is a free/open-source library for quantitative finance, and may be used in the Cluster Challenge 2009.
    5. Sun Grid Engine (SGE) is a well-known batch scheduler for clusters.
    6. OpenPBS is another well-known batch scheduler for clusters.