UofA, Computing Science, Semester 2004-1

Cross Industry Standard Process for Data Mining
(Independent Study)
Instructor: Osmar R. Zaïane

OBJECTIVE/DESCRIPTION:

Some industry leaders in data analysis have developed their own workflow models for the development and integration of data mining projects. A consortium of industry partners took the initiative to develop an industry- and tool-neutral data mining process model called CRISP-DM. This model, currently in version 1.0, is intended to help validate the data mining process and to accelerate the development of large-scale data mining investigations.

This course will provide students with (1) an overview of the CRISP-DM data mining process model, (2) an opportunity to thoroughly study existing tools that support this model and compare them with existing tools that provide no process model or process integration, and (3) an opportunity to design a data mining process integration tool based on existing data mining building blocks developed at the University of Alberta. Students will be given enough background to develop a term project (prototype).

The course will mainly consist of a series of discussions on the topics listed below, which serve as a general guideline. Throughout the course, recent relevant research papers will also be read and discussed.

TOPICS:

The course will cover the following topics:

GRADING:

Annotated Bibliography (20%)
Discussions (20%)
Implementation and Testing (20%)
Final Term Paper (40%)

TEXTBOOK and REFERENCES:


Distributed: January, 2004