School of Information Technologies
COMP5318 KNOWLEDGE DISCOVERY AND DATA MINING
Semester 1, 2012
News
3/3/2012: Welcome to COMP5318!
| Activity | Day | Time | Venue |
|---|---|---|---|
| Lectures | Monday | 6-8pm | Architecture LT 1 |
| Laboratory/Tutorial (start in Week 2) | Monday | 8-9pm | SIT labs 115, 116 and 117 |
Assessment overview
The assignment specifications will be available on eLearning.
| Assignment | % | Out | Due | Individual/Group | Notes | Late submission policy |
|---|---|---|---|---|---|---|
| Ass1: Test | 15 | | Week 6, in class | Individual | In the first hour of the lecture (6-7pm). Semi-open, like the exam: students are allowed 1 sheet of their own notes (A4-size, double-sided, handwritten or typed). The test will cover the material on Clustering. | It is not possible to re-sit the test. |
| Ass2: Data analysis | 20 | | Week 10, Friday, 5pm | Individual or in pairs (groups of more than 2 people are not allowed) | Submission: 1) a hard copy in the locker labelled COMP5318, located in the School of IT Building, level 1, in the postgraduate labs wing, and 2) electronically via eLearning. | A penalty of 1 mark for each day after the deadline. The maximum delay is 7 days, after which assignments will not be accepted. |
| Ass3: Research paper presentation (final schedule) | 15 | | Weeks 12 and 13, in class | Group | | No late presentations are allowed. A student who is unable to present on the specified date will receive 0 marks for this assessment. |
| Written exam | 50 | | Examination period | Individual | The exam will be semi-open. You are allowed 1 sheet of your own notes (handwritten or typed, double-sided, A4-size) and a non-programmable calculator (although you will not need a calculator). No other material is allowed (no books, no additional notes). The exam will cover all material except Clustering. | |
Academic honesty: Please read the University Policy on Academic Honesty and submit the appropriate signed cover sheet with each of your assignments. The cover sheets are available from the link above.
| Week | Date | Topic |
|---|---|---|
| 1 | 5 March | Admin matters. Introduction to Data Mining (DM): challenges, origins, DM vs Machine Learning and Knowledge Discovery in Databases; DM tasks. Data: types, cleaning (noise, missing values), pre-processing (aggregation, feature selection, discretization and binarization, normalization), similarity measures. |
| 2 | 12 March | Clustering 1: Introduction to clustering. Partitional algorithms: k-means, bisecting k-means. Hierarchical algorithms: single, complete and average link; Ward’s method. |
| 3 | 19 March | Clustering 2: Fuzzy clustering (c-means algorithm). Self-organising maps (SOM). |
| 4 | 26 March | Clustering 3: Density-based clustering (DBSCAN algorithm). Evaluating clustering results: unsupervised and supervised measures; determining the number of clusters; evaluating the clustering tendency. |
| 5 | 2 April | Classification 1: Introduction. Nearest-neighbour algorithm. Rule-based classifiers: 1R and PRISM. |
| | 9 April | Mid-semester break |
| 6 | 16 April | Ass1: Test. Classification 2: Evaluating classifiers: performance measures; evaluation procedures: single holdout, cross-validation, bootstrapping. Comparing two classifiers: statistical significance testing. |
| 7 | 23 April | Classification 3: Bayesian classifiers: Naïve Bayes and Bayesian networks. |
| 8 | 30 April | Decision trees: building decision trees, information gain, decision boundary, overfitting and pruning. |
| 9 | 7 May | Feature subset selection: CFS, Relief, wrapper-based approaches. Dimensionality reduction: PCA and SVD. |
| 10 | 14 May | Classification 5: Linear regression. Support vector machines (SVM): maximum margin hyperplane; finding a maximum margin hyperplane as an optimisation problem; linear SVM with hard and soft margin; nonlinear SVM (kernel trick and Mercer’s theorem). |
| 11 | 21 May | Association rules 1: Introduction. Mining frequent itemsets. Apriori algorithm. Association rules 2. Ass2: Data analysis due Wednesday, 5pm. |
| 12 | 28 May | Ass3: Student presentations of research papers. |
| 13 | 4 June | Ass3: Student presentations of research papers. |
Textbook
Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Pearson Education (Addison Wesley), 2006, ISBN 0-321-32136-7. Chapters 4, 6 and 8 are freely available here and from the publisher.
Recommended book
Data Mining: Practical Machine Learning Tools and Techniques, 3rd edition, Ian H. Witten, Eibe Frank and M. Hall, Morgan Kaufmann, 2011, ISBN 978-0-12-374856-0. Machine Learning view of Data Mining. Very readable. This is the companion book to the WEKA software. You can also use the previous (2nd) edition of the book.
Other recommended books
Data Mining: Introductory and Advanced Topics, Margaret Dunham, Prentice Hall, 2003, ISBN 0-13-088892-3. Good coverage of the topics included in the course. Very readable. Covers pseudocode and computational complexity.
Data Mining: Concepts and Techniques, J. Han and M. Kamber, Morgan Kaufmann, 2006, ISBN 1-55860-901-6. Database view of Data Mining.
Principles of Data Mining, D. Hand, H. Mannila and P. Smyth, MIT Press, 2001, ISBN 0-262-08290-X. Statistical view of Data Mining. Advanced; requires good statistical knowledge.
The Tan and Witten books are in the library's Reserve collection (2 Hour Loan) and are also available at the Co-op Bookshop.
Last modified: 12 May 2012