School of Information Technologies


 

COMP5318 KNOWLEDGE DISCOVERY AND DATA MINING
Semester 1, 2011


News

28/2/2011

Watch out for news here

28/2/2011

Welcome to COMP5318!

28/4/2011

The marks for assignment 1 can be found here.

30/5/2011

Assignment 2 will be available on Wednesday June 1st for collection at the reception.

20/6/2011

The marks for assignment 2 can be found here.



Course outline

This course will offer a comprehensive coverage of well known Data Mining topics including classification, clustering, and association rules. A number of specific algorithms and techniques under each category will be discussed. Methods for feature selection, dimensionality reduction and performance evaluation will also be covered. Students will learn and work with appropriate software tools and packages in the laboratory. They will be exposed to relevant Data Mining research.

Teaching staff

 


Sanjay Chawla and Fabio Ramos - lecturers
Email: {sanjay.chawla|fabio.ramos} AT sydney.edu.au; School of IT Building
Consultation time: Monday 4-5pm

 

Khoa Nguyen - tutor
Email: khoa at it.usyd.edu.au

Lionel Ott - tutor
Email: lott4241 at usyd.edu.au


Timetable

| Activity | Day | Time | Venue |
|---|---|---|---|
| Lectures | Monday | 6-8pm | Carslaw lecture theatre 175 |
| Laboratory/Tutorial (start in Week 2) | Monday | 8-9pm | Carslaw labs 202A, 202B and 202C |

 

Assessment overview

| Assignment | % | Marks | Due | Individual/Group | Notes | Late submission policy |
|---|---|---|---|---|---|---|
| Ass1: Test | 15 | Marks | w6, in class | Individual | Held in the first hour of the lecture (6-7pm). Semi-open, like the exam: students are allowed 1 sheet of their own notes (A4-size, double-sided, handwritten or typed). The test will cover the material up to week 5. | Not possible to re-sit the test. |
| Ass2: Data Analysis | 20 | | w9, Friday 5pm | Individual or in pairs (groups of more than 2 people are not allowed) | Submission: 1) hard copy in the locker labelled COMP5318 located in the School of IT Building, level 1, in the postgraduate labs wing and 2) electronically via WebCT | A penalty of 1 mark per day after the deadline will apply. Assignments will not be accepted if the delay is more than 7 days. |
| Ass3: Research paper presentation (final schedule) | 15 | | w12 and 13, in class | Group | | No late presentations are allowed; a student who is unable to present on the specified date will receive 0 marks for this assessment. |
| Written exam | 50 | | examination period | Individual | The exam will be semi-open. You are allowed 1 sheet of your own notes (handwritten or typed, double-sided, A4-size) and a non-programmable calculator (you don't need a calculator). No other material is allowed (no book, no additional notes). The exam will cover all material in the review slides except Clustering. | |
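
The late submission policy for Ass2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an official marking tool: the function name is hypothetical, lateness is counted in whole days, and the penalised mark is assumed not to drop below 0.

```python
def apply_late_penalty(raw_mark, days_late):
    """Return the Ass2 mark after the late penalty, or None if not accepted.

    Assumptions (not stated on this page): days_late is a whole number of
    days, and the penalised mark is floored at 0.
    """
    if days_late > 7:
        # More than 7 days late: the assignment is not accepted.
        return None
    # 1 mark is deducted for each day after the deadline.
    return max(0, raw_mark - max(0, days_late))


if __name__ == "__main__":
    for days in (0, 3, 7, 8):
        print(days, apply_late_penalty(16, days))
```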


In order to pass the course, the School requires at least 40% in the written exam, at least 40% in the other assessment components together and an overall final mark of 50 or more. This means that students who score less than 40% in the exam will fail the course regardless of their marks during the semester.
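
As a rough illustration of the pass rule above, the following Python sketch checks the three conditions together. It assumes that the % column of the assessment overview gives each component's weight in marks out of 100, and that all marks are already expressed on that scale; the function and constant names are hypothetical.

```python
# Weights taken from the assessment overview (marks out of 100 in total).
EXAM_WEIGHT = 50
COURSEWORK_WEIGHTS = {"ass1": 15, "ass2": 20, "ass3": 15}


def passes_course(exam_marks, coursework_marks):
    """Check the three pass conditions.

    exam_marks: marks earned in the written exam (out of 50).
    coursework_marks: dict with marks for ass1, ass2 and ass3
    (out of 15, 20 and 15 respectively).
    """
    coursework_total = sum(coursework_marks.values())
    coursework_max = sum(COURSEWORK_WEIGHTS.values())

    # At least 40% in the written exam.
    exam_ok = exam_marks * 100 >= 40 * EXAM_WEIGHT
    # At least 40% in the other assessment components together.
    coursework_ok = coursework_total * 100 >= 40 * coursework_max
    # Overall final mark of 50 or more.
    overall_ok = exam_marks + coursework_total >= 50

    return exam_ok and coursework_ok and overall_ok


if __name__ == "__main__":
    # Full coursework marks but only 19/50 in the exam: fails the course.
    print(passes_course(19, {"ass1": 15, "ass2": 20, "ass3": 15}))
```

The first condition is the one the paragraph above highlights: a strong semester cannot compensate for an exam mark below 40%.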

About plagiarism: please do not confuse legitimate co-operation with cheating.
Plagiarism will be dealt with according to the University procedures. The following documents must be used as cover sheets on all work (assignments, presentations, etc.) submitted for individual or group assessment:

 

 

Syllabus

  The teaching materials (lecture notes, lab notes, lab solutions and assignment specifications) will be available on the eLearning site.

| Week | Date | Topic |
|---|---|---|
| 1 | 28 Feb | Admin matters. Assumed Knowledge Check. Introduction to Data Mining (DM); challenges, origins, DM vs Machine Learning and Knowledge Discovery in Databases; DM tasks. Data: types, cleaning (noise, missing values), pre-processing (aggregation, feature selection, discretization and binarization, normalization), similarity measures. |
| 2 | 7 March | Data. An overview of datatypes and data transformations. Tutorial |
| 3 | 14 March | Association rules 1: Introduction, Mining frequent items, Apriori algorithm. Tutorial |
| 4 | 21 March | Association Rules Continued and Basics of Probability. Tutorial |
| 5 | 28 March | Clustering: K-means and hierarchical clustering. Clustering; Tutorial |
| 6 | 4 April | Ass1: Test. Soft K-means and Expectation Maximisation. EM; Tutorial |
| 7 | 11 April | EM and Dimensionality Reduction: Lecture, Tutorial |
| 8 | 18 April | Classification 1: k-nearest neighbours, naïve Bayes, evaluating a classifier. Lecture, Tutorial. Assignment 2 available here. |
| - | 25 April | Easter Holiday |
| 9 | 2 May | Classification 2: Discriminative classifiers: Logistic regression, Support vector machines (SVM); maximum margin hyperplane, finding a maximum margin hyperplane as an optimisation problem, linear SVM with hard and soft margin, nonlinear SVM (kernel trick and Mercer’s theorem). Lecture, Tutorial. Ass2: Data analysis due Friday, 6 May, 5pm |
| 10 | 9 May | Outlier detection. Lecture, Tutorial |
| 11 | 16 May | Covariance Matrix Applications; Example of Pruning Technique; Tutorial |
| 12 | 23 May | Ass3: Student presentations of research papers. |
| 13 | 30 May | Ass3: Student presentations of research papers. Review of course material. The report is due on Friday, 3 June, by 5pm, submitted via the Blackboard website. |

 

 

Resources

Textbook

Introduction to Data Mining
Pang-Ning Tan, Michael Steinbach, Vipin Kumar,
Pearson Education (Addison Wesley), 0-321-32136-7, 2006

Chapters 4, 6 and 8 are freely available here and from the publisher.

Recommended books

Data mining - practical machine learning tools and techniques with Java implementations, 2nd edition
Ian H. Witten, Eibe Frank
Morgan Kaufmann. 0-12-088407-0, 2005

Machine Learning view of Data Mining. Very readable. The book of the WEKA library.

Data Mining: Introductory and Advanced Topics
Margaret Dunham, Prentice Hall, 0-13-088892-3, 2003

Good coverage of the topics included in the course. Very readable. Pseudocode and computational complexity are covered.

Data Mining Concepts and Techniques
 J. Han and M. Kamber
Morgan Kaufmann, 2006, ISBN 1-55860-901-6 

Database view of Data Mining.

Principles of Data Mining
D. Hand, H. Mannila, P. Smyth
MIT Press, 2001, ISBN: 0-262-08290-X

Statistical view of Data Mining. Advanced, requires good statistical knowledge.

Tan and Witten have been placed on special reserve at the library and are also available in the Co-op Bookshop.

More Resources - links to related conferences and journals, research groups and software

 


Last modified: 9 June 2010