I am generally interested in the application of data mining to text and
Traditional Chinese Medicine, and also in educational tools in computer science.
However, I am more than happy to discuss any interesting project you have.
Sentiment Analysis from a Class Survey (Hons or MIT 18cps)
An online mid-semester survey is carried out for every undergraduate subject
through WebCT. However, WebCT does not provide a good interface for analyzing and
querying the survey results. In particular, the generated report does not help
with the analysis of the written comments. This project is the first enabling
step towards “opinion mining” on these mid-semester surveys. In this work, a
web-based application will plot graphs from the numerical responses, highlight
important aspects of the text at the user’s request, and show the correlation
between the numerical and textual feedback.
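One way to picture the correlation between numerical and textual feedback is the minimal Python sketch below. It assumes each survey response pairs a 1-5 rating with a free-text comment, scores each comment by counting words from a tiny hypothetical lexicon (a stand-in for a real opinion-mining method), and computes the Pearson correlation between ratings and comment scores. The lexicon, the responses and the function names are illustrative assumptions, not part of WebCT or of the project specification.

```python
import math
import re

# Hypothetical toy lexicon: a stand-in for a real sentiment lexicon or classifier.
POSITIVE = {"good", "great", "clear", "helpful", "interesting"}
NEGATIVE = {"bad", "boring", "confusing", "slow", "unclear"}


def comment_score(comment: str) -> int:
    """Number of positive lexicon words minus number of negative ones."""
    words = re.findall(r"[a-z']+", comment.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Made-up responses: (rating on a 1-5 scale, written comment).
responses = [
    (5, "Great lectures, very clear and helpful"),
    (4, "Good unit, tutorials are interesting"),
    (2, "Lectures are slow and a bit confusing"),
    (1, "Boring and unclear assignments"),
]

ratings = [rating for rating, _ in responses]
scores = [comment_score(comment) for _, comment in responses]
print("comment scores:", scores)
print("correlation:", round(pearson(ratings, scores), 3))
```

A strongly positive coefficient would suggest that the comments agree with the ratings; a real system would replace the word counts with a proper sentiment classifier.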
Pre-requisite knowledge. The student is expected to have completed, or to be
taking, COMP5318 Knowledge Discovery and Data Mining.
Incremental Time Series Learning with Symbolic Representation (Hons or MIT 18cps)
We usually associate time series prediction with numbers and formulas; however,
we can also transform these numbers into symbols so that many existing data
mining algorithms can be used to detect interesting patterns and/or anomalies.
This project is to review one such representation, called SAX (Symbolic
Aggregate Approximation). A suitable representation will then be applied to
incremental learning from a data stream.
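As an illustration of the SAX idea, the Python sketch below follows the usual three steps: z-normalize the series, reduce it with Piecewise Aggregate Approximation (PAA) to w segment means, then map each mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions. For brevity it assumes the series length is a multiple of w. The sliding-window generator is only a hypothetical starting point for the data-stream part of the project, not an incremental learner.

```python
from bisect import bisect_right
from collections import deque
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)  # a flat series maps entirely to the mean
    return [(x - mu) / sigma for x in series]


def paa(series, w):
    """Piecewise Aggregate Approximation: the mean of each of w equal segments.
    Assumes len(series) is a multiple of w to keep the sketch short."""
    if len(series) % w != 0:
        raise ValueError("this sketch needs len(series) to be a multiple of w")
    size = len(series) // w
    return [mean(series[i * size:(i + 1) * size]) for i in range(w)]


def sax(series, w, alphabet_size):
    """Turn a numeric series into a SAX word of length w."""
    # alphabet_size - 1 breakpoints cut N(0, 1) into equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]
    letters = "abcdefghijklmnopqrstuvwxyz"[:alphabet_size]
    return "".join(letters[bisect_right(breakpoints, v)]
                   for v in paa(znormalize(series), w))


def sliding_sax(stream, window, w, alphabet_size):
    """Yield one SAX word for every full sliding window over an iterable."""
    buf = deque(maxlen=window)  # keeps only the most recent `window` points
    for x in stream:
        buf.append(x)
        if len(buf) == window:
            yield sax(list(buf), w, alphabet_size)
```

For example, `sax([1, 2, 3, 4, 5, 6, 7, 8], 4, 4)` gives `"abcd"`: the rising series becomes four segment means that fall into the four regions in order. Because each window is z-normalized, two windows with the same shape but different offsets produce the same word.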
Pre-requisite knowledge. The student is expected to have completed, or to be
taking, COMP5318 Knowledge Discovery and Data Mining.
Reference. Experiencing SAX: A Novel Symbolic Representation of Time Series
A Bayesian Approach to Finding Symptom-Herb Relationships (Hons or MIT 18cps)
Traditional Chinese Medicine (TCM) is a holistic approach to treating diseases,
and prescribing herbal medicine is one of its major treatment methods. TCM
generally uses a set of herbs to address a set of symptoms. These medical
formulas seem to be effective, but they lack the scientific proof that people
nowadays demand. Information technology has been successful in bioinformatics,
helping scientists in biology and genetics; IT should likewise be a good tool to
help us understand the underlying principles of this ancient medicine, a field
also known as TCM Informatics. This project aims to explore symptom-herb
relationships using a Bayesian approach, as this learning method is grounded in
a strong theoretical background in statistics. Promising results have been
demonstrated in prior work by other researchers.
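To make the Bayesian idea concrete, here is a minimal Python sketch of one simple Bayesian model, naive Bayes with Laplace smoothing, which estimates the probability that a given herb is prescribed from the symptoms present. This is a swapped-in textbook technique, not the latent tree models of the reference below; the record format, the toy records and the placeholder names (`herb_A`, `herb_B`) are hypothetical and carry no medical meaning.

```python
def herb_posterior(records, herb, observed, symptoms, alpha=1.0):
    """Naive Bayes estimate of P(herb is prescribed | symptom pattern).

    records:  list of (set_of_symptoms, set_of_herbs) pairs.
    observed: symptoms present in the new case; every other symptom in
              `symptoms` is treated as absent.
    alpha:    Laplace smoothing constant, so unseen combinations never get 0.
    """
    joint = {}
    for used in (True, False):
        cases = [s for s, herbs in records if (herb in herbs) == used]
        # Smoothed prior P(used) over the two outcomes.
        prob = (len(cases) + alpha) / (len(records) + 2 * alpha)
        for sym in symptoms:
            # Smoothed likelihood P(symptom present | used).
            p = (sum(sym in s for s in cases) + alpha) / (len(cases) + 2 * alpha)
            prob *= p if sym in observed else 1 - p
        joint[used] = prob
    # Bayes' rule: normalize the two joint probabilities.
    return joint[True] / (joint[True] + joint[False])


# Hypothetical toy records with placeholder herb names (no medical meaning).
records = [
    ({"fever", "cough"}, {"herb_A"}),
    ({"fever"}, {"herb_A"}),
    ({"cough"}, {"herb_B"}),
    ({"headache"}, {"herb_B"}),
]
symptoms = sorted({sym for s, _ in records for sym in s})

for herb in ("herb_A", "herb_B"):
    print(herb, round(herb_posterior(records, herb, {"fever"}, symptoms), 3))
```

Naive Bayes assumes the symptoms are conditionally independent given whether the herb is prescribed, which is a strong simplification; it is shown only because it is the shortest correct Bayesian model to write down.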
Pre-requisite knowledge. It is desirable for the student to have completed
COMP5318 Knowledge Discovery and Data Mining.
Reference. Latent tree models and diagnosis in traditional Chinese medicine
Web-based Application for Student Survey Analysis (MIT 6 or 12cps)
An online mid-semester survey is carried out for every undergraduate subject
through WebCT. However, WebCT does not provide a good interface for analyzing and
querying the survey results. In particular, the generated report does not help
with the analysis of the written comments. This project is the first enabling
step towards “opinion mining” on these mid-semester surveys. In this work, a
web-based application will plot graphs from the numerical responses, highlight
important aspects of the text at the user’s request, and show the correlation
between the numerical and textual feedback.
Pre-requisite knowledge. The student who takes up this project is expected to be
good at web-based programming.
Implementation of New Measures for Class-based Association Rules Mining (MIT 6 or 12cps)
This project is to implement the "trivialness" and "relevance" measures from
the set-theoretic approach to Class-based Association Rules (CAR) mining, in a
CAR mining algorithm and/or one of our tools. The project includes modifying the
program(s) by adding these two measures to the algorithm, then testing the
modified program(s) and analyzing the results on different datasets.
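The change described above amounts to adding new interestingness measures to rule evaluation. The minimal Python sketch below assumes records are (item set, class label) pairs, computes the standard support and confidence of a class association rule, and keeps the measures in a registry so that a new one can be plugged in. The definitions of "trivialness" and "relevance" are not given here, so lift (a standard measure) stands in as the example of an added measure; the registry design and all names are hypothetical and not taken from our existing tools.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Record = Tuple[FrozenSet[str], str]  # (items in the record, class label)
Measure = Callable[[List[Record], FrozenSet[str], str], float]


def support(data: List[Record], antecedent: FrozenSet[str], label: str) -> float:
    """Fraction of records that contain the antecedent and carry the label."""
    return sum(antecedent <= items and cls == label for items, cls in data) / len(data)


def confidence(data: List[Record], antecedent: FrozenSet[str], label: str) -> float:
    """Among records containing the antecedent, the fraction with the label."""
    covered = [cls for items, cls in data if antecedent <= items]
    return sum(cls == label for cls in covered) / len(covered) if covered else 0.0


def lift(data: List[Record], antecedent: FrozenSet[str], label: str) -> float:
    """Confidence divided by the overall frequency of the label."""
    p_label = sum(cls == label for _, cls in data) / len(data)
    return confidence(data, antecedent, label) / p_label if p_label else 0.0


# Hypothetical registry: adding a measure means adding one entry here.
MEASURES: Dict[str, Measure] = {
    "support": support,
    "confidence": confidence,
    "lift": lift,  # stand-in for a newly added measure
}


def evaluate(data: List[Record], antecedent: Iterable[str], label: str) -> Dict[str, float]:
    """Score the rule antecedent -> label with every registered measure."""
    ante = frozenset(antecedent)
    return {name: measure(data, ante, label) for name, measure in MEASURES.items()}


# Made-up toy dataset.
data: List[Record] = [
    (frozenset({"a", "b"}), "yes"),
    (frozenset({"a"}), "yes"),
    (frozenset({"a", "c"}), "no"),
    (frozenset({"b", "c"}), "no"),
]
print(evaluate(data, {"a"}, "yes"))
```

Keeping every measure behind the same signature means the mining loop does not change when a measure is added; only the registry and, if needed, the ranking or pruning step that reads the scores.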
Pre-requisite knowledge. The student who wants to take this project should have
completed the COMP5318 Knowledge Discovery and Data Mining unit of study,
preferably with a Distinction or above.