Hons/MIT/Engineering Projects (2011 Sem1)

 

I am generally interested in applying data mining to text and to Traditional Chinese Medicine, and in educational tools for computer science. I am also more than happy to discuss any interesting project you have in mind.

 

Research Projects

 

Sentiment Analysis from a Class Survey (Hons or MIT 18cps)

An online mid-semester survey is carried out for every undergraduate subject through WebCT. However, WebCT does not provide a good interface for analyzing and querying the survey results. In particular, the generated report does not help with the analysis of the written comments. This project is the first enabling step towards “opinion mining” from these mid-semester surveys. In this work, a web-based tool will plot graphs from the numerical responses, highlight important aspects of the text upon the user’s request, and show the correlation between the numerical and textual feedback.
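As a rough illustration of correlating numerical and textual feedback, the sketch below scores each comment with a tiny word list and computes the Pearson correlation against the numerical ratings. The word lists, the function names and the sample responses are all made up for illustration; a real project would use a proper sentiment method and the actual survey export.

```python
import re
from math import sqrt

# Hypothetical mini-lexicons; a real project would use a proper sentiment resource.
POSITIVE = {"good", "clear", "helpful", "great"}
NEGATIVE = {"bad", "confusing", "boring", "hard"}


def comment_score(comment):
    """Count positive words minus negative words in one written comment."""
    words = re.findall(r"[a-z']+", comment.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant list; report 0
    return cov / sqrt(var_x * var_y)


# Made-up survey responses: (numerical rating, written comment).
responses = [
    (5, "Lectures were clear and helpful"),
    (2, "Confusing and boring tutorials"),
    (4, "Good labs"),
    (1, "Too hard, bad pacing"),
]
ratings = [rating for rating, _ in responses]
scores = [comment_score(text) for _, text in responses]
correlation = pearson(ratings, scores)
```

A correlation close to 1 would suggest that the written comments broadly agree with the numerical ratings; a low value could flag subjects where the two disagree and the comments deserve a closer look.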

Pre-requisite knowledge. The student is expected to have completed, or to be taking, COMP5318 Knowledge Discovery and Data Mining.

 

 

Incremental Time Series Learning with Symbolic Representation (Hons or MIT 18cps)

We usually associate time series prediction with numbers and formulas. However, we can also transform these numbers into symbols, so that many existing data mining algorithms can be used to detect interesting patterns and/or anomalies. This project will review one such representation, called SAX (Symbolic Aggregate Approximation). A suitable representation will then be applied to incremental learning from a data stream.
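To make the idea of turning numbers into symbols concrete, here is a minimal sketch of the usual SAX steps: z-normalize the series, average it over equal-width segments (piecewise aggregate approximation), then map each segment mean to a letter using breakpoints that split the standard normal distribution into equally probable regions. The function names and the sample series are illustrative assumptions, and the sketch assumes the series length is a multiple of the number of segments.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale a series to zero mean and unit standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]  # constant series: no variation to encode
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each equal-width segment."""
    if len(series) % n_segments != 0:
        raise ValueError("this sketch needs len(series) divisible by n_segments")
    size = len(series) // n_segments
    return [mean(series[i * size:(i + 1) * size]) for i in range(n_segments)]


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word over the given alphabet."""
    k = len(alphabet)
    # Breakpoints cut the standard normal curve into k equally probable regions.
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    segment_means = paa(znormalize(series), n_segments)
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in segment_means)


word = sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4)  # a steadily rising series
```

For an incremental setting, the same conversion could be applied to each new window of a data stream, so that downstream algorithms only ever see short symbolic words.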

Pre-requisite knowledge. The student is expected to have completed, or to be taking, COMP5318 Knowledge Discovery and Data Mining.

Reference. Experiencing SAX: A Novel Symbolic Representation of Time Series

 

 

A Bayesian Approach to Finding Symptoms-Herbs Relationships (Hons or MIT 18cps)

Traditional Chinese Medicine (TCM) is a holistic approach to treating diseases, and the prescribing of herbal medicine is one of its major treatment methods. TCM generally uses a set of herbs to address a set of symptoms. These medical formulas seem to be effective, but they lack the scientific proof that people nowadays demand. Information technology has been successful in bioinformatics, helping scientists in the biological and genetic fields; IT should also be a good tool to help us understand the underlying principles of this ancient medicine, an area known as TCM Informatics. This project aims to explore the symptoms-herbs relationship using a Bayesian approach, as this learning method has a strong theoretical background in statistics. Promising results have been demonstrated in prior work by other researchers.
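As one very simple example of what a Bayesian treatment of the symptoms-herbs relationship could look like, the sketch below estimates the probability that a herb is prescribed when a symptom is present, using a Beta prior updated with counts from prescription records. This is not the latent tree model from the reference, only a basic Beta-Binomial estimate; the record format, the function name and the placeholder labels are assumptions made for illustration, and the records are not real clinical data.

```python
def posterior_herb_given_symptom(records, symptom, herb, alpha=1.0, beta=1.0):
    """Posterior mean of P(herb prescribed | symptom present).

    Uses a Beta(alpha, beta) prior and treats every record that contains the
    symptom as one Bernoulli trial: success if the herb was prescribed.
    """
    herb_sets = [herbs for symptoms, herbs in records if symptom in symptoms]
    n = len(herb_sets)                               # records with the symptom
    k = sum(1 for herbs in herb_sets if herb in herbs)  # ...that also use the herb
    return (k + alpha) / (n + alpha + beta)


# Made-up records with placeholder labels: (symptoms, prescribed herbs).
records = [
    ({"symptom_1", "symptom_2"}, {"herb_1", "herb_2"}),
    ({"symptom_1"}, {"herb_1"}),
    ({"symptom_2"}, {"herb_2"}),
    ({"symptom_1", "symptom_3"}, {"herb_1", "herb_3"}),
]

p_strong = posterior_herb_given_symptom(records, "symptom_1", "herb_1")
p_weak = posterior_herb_given_symptom(records, "symptom_1", "herb_2")
p_unseen = posterior_herb_given_symptom(records, "symptom_9", "herb_1")
```

With the uniform Beta(1, 1) prior, a symptom that never appears in the records falls back to the prior mean of 0.5, while frequently co-occurring pairs move towards 1 as evidence accumulates.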

Pre-requisite knowledge. It is desirable for the student to have completed COMP5318 Knowledge Discovery and Data Mining.

Reference. Latent tree models and diagnosis in traditional Chinese medicine

 

 

Software Development Projects

 

Web-based Application for Students Survey Analysis (MIT 6 or 12cps)

An online mid-semester survey is carried out for every undergraduate subject through WebCT. However, WebCT does not provide a good interface for analyzing and querying the survey results. In particular, the generated report does not help with the analysis of the written comments. This project is the first enabling step towards “opinion mining” from these mid-semester surveys. In this work, a web-based application will plot graphs from the numerical responses, highlight important aspects of the text upon the user’s request, and show the correlation between the numerical and textual feedback.

Pre-requisite knowledge. The student who takes up this project is expected to be good at web-based programming.

 

 

Implementation of New Measures for Class-based Association Rules Mining (MIT 6 or 12cps)

This project is to implement the "trivialness" and "relevance" measures from the set-theoretic approach to Class-based Association Rules (CAR) mining and/or one of our tools. The project includes modifying the program(s) to add these two measures to the algorithm, then testing and analyzing the results on different datasets.
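For orientation, the sketch below computes the two standard measures that class-based association rule miners typically report, support and confidence, for a rule of the form "itemset implies class". It does not implement "trivialness" or "relevance", whose definitions are not given here; any new measure would presumably be computed alongside these. The row format, the function name and the sample data are illustrative assumptions.

```python
def rule_support_confidence(rows, antecedent, label):
    """Support and confidence of the class association rule antecedent -> label.

    rows is a list of (items, class_label) pairs, where items is a set.
    """
    antecedent = frozenset(antecedent)
    # Class labels of all rows whose items contain the whole antecedent.
    covered = [cls for items, cls in rows if antecedent <= items]
    hits = sum(1 for cls in covered if cls == label)
    support = hits / len(rows) if rows else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


# Made-up dataset: (items, class label).
rows = [
    ({"a", "b"}, "yes"),
    ({"a"}, "no"),
    ({"a", "b", "c"}, "yes"),
    ({"b"}, "no"),
]

support_ab, confidence_ab = rule_support_confidence(rows, {"a", "b"}, "yes")
support_a, confidence_a = rule_support_confidence(rows, {"a"}, "yes")
```

Keeping each measure as a small function over the same row format makes it straightforward to compare results across different datasets during testing.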

Pre-requisite knowledge. The student who wants to take this project should have completed the COMP5318 Knowledge Discovery and Data Mining unit of study, preferably with a Distinction or above.