UofA | Computing Science | Semester 2004-1
Cross Industry Standard Process for Data Mining
(Independent Study)
Instructor: Osmar R. Zaïane
OBJECTIVE/DESCRIPTION:
Some industry leaders in data analysis have developed their own
work-flow models for the development and the integration of data
mining projects. A consortium of industry partners took the initiative
to develop an industry- and tool-neutral Data Mining process model
called CRISP-DM. This model, currently in version 1.0, is intended to
help validate the data mining process and accelerate the development of
large-scale data mining investigations.
This course will provide the students with (1) an overview of the data
mining process model CRISP-DM, (2) an opportunity to thoroughly study
existing tools that support this model and compare them with existing
tools that do not provide any process model or process integration,
and (3) an opportunity to design a data mining process integration tool
based on existing data mining building blocks developed at the
University of Alberta. The students will be provided with enough background so
that a term project (prototype) can be developed.
The course will mainly consist of a series of discussions on the topics
listed below, which serve as a general guideline. Throughout the course,
recent relevant research papers will also be read and discussed.
TOPICS:
The course will cover the following topics:
- Process for Data Mining
- CRISP-DM methodology
- Industry Standards for Data Mining
- Work-flow models for data mining
- Integrated software/tools for data mining
- Issues with small- to large-scale data mining integration investigations
- Implementation issues for integrated data mining tools
GRADING:
- Annotated bibliography (20%)
- Discussions (20%)
- Implementation and testing (20%)
- Final term paper (40%)
TEXTBOOK AND REFERENCES:
- A minimum of 10 research papers will be selected from a variety of
journals, conference proceedings and other sources.
Distributed: January 2004