CMPUT605

Programming Multi-core Architectures

 

Professor

Pierre Boulanger

Class Time

Tuesday, 14:00-17:00

Room Number

CSC 363

Office Hours

By appointment

Course Number

CMPUT605

 

 

Course Description:

 

For two decades, software developers enjoyed steady gains in application performance as the number of transistors in processors doubled about every two years, in accordance with Moore's Law. Much of this gain came from increasing clock speeds. However, because of heat concerns, clock speeds can no longer keep rising. The industry now agrees that the future of architecture design lies in multi-core processors, i.e., processors with several simpler cores running at lower frequencies. As a consequence, all computer systems today, from embedded devices to high-end servers, are being built with multi-core processors. Software developers can therefore no longer sit idly and wait for application performance to improve. In fact, application performance is likely to degrade on future generations of multi-core processors with ever simpler cores.

Although researchers in industry and academia are exploring many different multi-core hardware design choices, most agree that developing software for multi-core processors is the major unsolved problem. Unlike earlier generations of hardware evolution, this shift will have a major impact on how software is designed and developed. Developers will have to learn how to design their applications to exploit multi-core parallelism. Opportunities to address the problem span multiple levels of the software stack. This course will cover the entire software stack as it applies to multi-core architectures, including libraries, tools, programming languages, compilers, runtime systems, and operating systems.

The first part of the course will consist of discussions about current multi-core architectures and parallel programming models. The second part will involve discussing several research papers (chosen by students) pertaining to multi-core systems. Each student will be asked to present one or two papers to the class. The third and last part will consist of student presentations of their programming projects. Students will have a variety of multi-core architectures available for their class projects, including an SGI Altix 4700 and Intel quad-core systems accelerated with Tesla GPUs.

 

Lectures

Slides

Extra Reading

September 15: The Multi-core Revolution 

Slides (PPT)

Read this paper or watch the Single-Threaded vs. Multi-Threaded video.

New: Take a look at the FERMI GPU

 

September 22: A Basic Element of Parallelism

Slides (PPT)

Read Chaps. 1-3 and 5-7 from this paper.

 

September 29: Some Theory for Parallel Programming

Slides (PPT)

Read Chaps. 1-3 from this paper and all of this paper.

Read this paper.

 

October 6: Nvidia CUDA Programming Basics I

Slides (PPT)

 N-body Problem in CUDA paper

CUDA Language Definition

CUDA Compiler Documentation

Matrix Multiplication Source Code

CUDA Zone

CUDA Programming Model

 

October 13: Nvidia CUDA Programming Basics II

Slides (PPT)

Paper on Sunviz

Programming GPU Book

New: Info on Intel Web Cast

CUDA Threads Reference

 

October 20: Nvidia CUDA Programming Basics III

Slides (PPT)

CUDA Memory Model

CUDA Performance Consideration

GPU Technology Conference 2009

 

November 5: OpenCL

Slides (PPT)

OpenCL Website

Intro to OpenCL

Recent (2009) OpenCL Talk

How to Compile CUDA in Visual Studio?

Using Nexus with CUDA

Nexus Developer Site

 

November 13: OpenCL

Slides (PPT)

OpenCL Specification 1.0.48

OpenCL API

 

November 26:

GPU Implementation of Smoothed Particle Hydrodynamics (SPH)

Slides (PPT)

 Presented By: Sapphire

Smoothed Particle Hydrodynamics on GPUs

 

December 1:

MPI and Intel Parallel Studio

Slides (PPT)

Talk on Parallel Studio

Intel Presentation:

· Image Processing Notes
· Image Processing: Stop Developing Code From Scratch
· The Key to Scaling Applications for Multicore
· Parallel Studio Installation Guide
· Parallel Studio Website
· Parallel Studio Tutorial

 

December 8:

Project Reports

Slides (PPT)

Parallel Implementation of SPH

 

Slides (PDF)

 Final Project Report

 

 

References:

 

  1. Grama, Gupta, Karypis, and Kumar, Introduction to Parallel Computing
  2. Mattson, Sanders, and Massingill, Patterns for Parallel Programming

 

Exams:

 

          No Midterm/Final.

 

Grading:

 

  1. Paper presentation (25%)
  2. Class participation (15%)
  3. Projects (60%)

 

Webpage:

 

                  http://www.cs.ualberta.ca/~pierreb/Multicore-2009

 

 

Background:

 

A background in parallel programming and computer architecture is helpful but not required. Reasonable familiarity with the C programming language is necessary. You might also need to know some C++, depending on the programs you choose for your project.

 

Objectives of the course:

 

This course is intended to give students an understanding of multi-core architectures and parallel programming models. Students will gain an appreciation of the problems and solutions researchers have identified in the multi-core field. Students will also gain experience in writing critical paper reviews and in presenting research. Finally, students will gain a thorough understanding of how to write parallel programs for current multi-core architectures.

 

Projects:

Two projects together account for 60% of your grade. Both projects are mandatory. Plan on checkpointing each project with the instructor (showing your progress) at various stages. You must checkpoint at least once for the first project and at least twice for the second. These checkpoints will be part of your grade. You are not required to stay with the same group for both projects. Check the syllabus for more details.

 

 

Class Resources:

Multicore/Parallel

MPI Specification
LLNL pthreads tutorial
An Introduction to Programming with Threads
Thinking Parallel Blog

Cell Specific

Cell Full-Day Workshop (Slides and Video)
IBM Journal Issue on Cell
Cell Developer's Corner
Cell Articles and Useful Links
CorePy: Python package for Cell
Supercomputer at LANL using Cell

GPU Specific

General-Purpose Computation Using Graphics Hardware
AMD/ATI Stream Computing Resources
NVIDIA Developer Zone
BrookGPU