CMPUT 695: Principles of KDD (Assignment 4)

Assignment 4

CMPUT 695 (Fall 2004)

Due Date: See table below (one day before the presentation date, at 17:00, by e-mail)
Percentage overall grade: 5%
Penalties: 20% off per day for late assignments
Maximum Marks: 10

One of the major activities in this course is that each student reads and presents one paper from the provided research literature. The other students must still read the paper so that they can follow the presentation and, ideally, take part in a discussion afterwards. Only the designated presenter, however, prepares slides and a report reviewing the paper.
As a fourth assignment, students are required to prepare a report reviewing an additional paper, one that they are not assigned to present in class. The review is to be handed in the day before the presentation of the paper in question.
The review should be about 2 pages (maximum 5) and should be written as if you were reviewing a journal article or a paper submitted to a conference program committee.
The review should contain at least these sections:
1-Brief summary of the main contributions of the paper
2-Elaboration on the positive aspects presented in the paper
3-Elaboration on the negative aspects presented in the paper
4-Comments on how to improve the ideas/issues/experiments presented in the paper.
The list of papers, the assigned students and the deadlines for the assignment are as follows:

Deadline Paper Student

October 27, 2004 Closet+: Searching for the best strategies for mining frequent closed itemsets, J. Wang, J. Han, and J. Pei, Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), Washington, DC, USA, 2003. Haobin Li

October 27, 2004 CHARM: An Efficient Algorithm for Closed Itemset Mining, M. Zaki and C-J Hsiao, SIAM SDM 2002, Arlington, VA, April 2002. Jon Klippenstein

November 8, 2004 Efficiently mining long patterns from databases, R. J. Bayardo, ACM SIGMOD international conference on Management of data, 1998. Leila Homaeian

November 8, 2004 Mafia: A maximal frequent itemset algorithm for transactional databases, D. Burdick, M. Calimlim, and J. Gehrke, 17th International Conference on Data Engineering (ICDE), April 2001.
See also:
MAFIA: A Performance Study of Mining Maximal Frequent Itemsets, Doug Burdick, Manuel Calimlim, Jason Flannick, Johannes Gehrke, and Tomi Yiu, Workshop on Frequent Itemset Mining Implementations (FIMI'03). Melbourne, Florida, November 2003.
Hongqin Fan

November 15, 2004 Dualminer: A dual-pruning algorithm for itemsets with constraints, C. Bucila, J. Gehrke, D. Kifer, and W. White, Data Mining and Knowledge Discovery, Vol. 7, Issue 4, July 2003, pages 241-272. Jessica Enright

November 15, 2004 Constrained Frequent Pattern Mining: A Pattern-Growth View, J. Han and J. Pei, ACM SIGKDD Explorations (Special Issue on Constraints in Data Mining), June 2002. John Sheldon

November 17, 2004 Mining Sequential Patterns, R. Agrawal and R. Srikant, International Conference on Data Engineering (ICDE), 1995. Paul Nalos

November 17, 2004 PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth, J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, M. Hsu, 17th International Conference on Data Engineering (ICDE), April 2001. Rafal Rak

November 22, 2004 A Robust Outlier Detection Scheme in Large Data Sets, J. Tang, Z. Chen, A. Fu, D. Cheung, the Sixth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Taipei, 6-8 May, 2002. Wojciech Stach

November 22, 2004 LOF: Identifying Density-Based Local Outliers, M. Breunig, H.-P. Kriegel, R. Ng, and J. Sander, ACM SIGMOD Int. Conf. on Management of Data, 2000. Yunping Wang

November 24, 2004 Mining Top-n Local Outliers in Large Databases, W. Jin, K.H. Tung and J. Han, ACM SIGKDD 2001, San Jose, California, Aug. 2001. Ben Chu

November 25, 2004 Rainforest - a framework for fast decision tree construction of large datasets, J. Gehrke, R. Ramakrishnan and V. Ganti, Proc. Very Large DataBases (VLDB), 1998. Dean Cheng

November 29, 2004 Data Bubbles for Non-Vector Data: Speeding-up Hierarchical Clustering in Arbitrary Metric Spaces, J. Zhou and J. Sander, Conf. on Very Large DataBases (VLDB), 2003. Junfeng Wu

November 29, 2004 Privacy-Preserving Data Mining, R. Agrawal and R. Srikant, ACM SIGMOD 2000, Dallas, May 2000. Nasimeh Asgarian

Deliverables:
This assignment is to be submitted via e-mail. Send one PDF file containing your report. A PostScript or MS Word file is also acceptable, but PDF is preferred. The report is to be about 2 pages long (maximum 5).

Posted on Oct 19. Last updated (Oct. 19, 2004 - 22:00)