Honours projects 2008

Projects supervised by Josiah Poon

Send an email to josiah AT it DOT usyd DOT edu DOT au if you are interested in any of these projects or would like to propose a project of your own.

Project Title: Data Mining of Complex Interactions
It is widely acknowledged that a good understanding of the diverse interactions and interrelatedness among sets of organisational factors, and of how they affect performance, is important. Changing one factor may have little effect if other factors remain unchanged. Understanding these complex relationships is central to a range of academic disciplines. The main challenge is to develop an appropriate research strategy for understanding relationships among interacting factors, and to derive a suitable analytical model for assessing the strength of those interactions. The focus of this research is model construction; the aim is to develop a new data mining algorithm to discover interaction patterns from data.

Project Title: Image Classification using Bridging
A classification task in machine learning implicitly assumes that the dataset has a balanced class distribution. However, this is not necessarily true in real life; in fact, most of the things we want to find are rare cases, e.g. fraudulent credit card transactions or breast cancer cells in x-ray images. “Bridging” is an algorithm that was initially proposed by Zelikovitz and Hirsh (2003) for text classification. Weng and Poon (2006) demonstrated that it can also serve as a good rebalancing strategy when the dataset is imbalanced. An honours student last year demonstrated that this approach can be adapted to the image domain. This was encouraging news, but the limitations of this technique and the conditions under which it works need further exploration.
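To see why imbalance is a problem, the following minimal Python sketch may help. It does not implement Bridging; it only illustrates, on made-up labels (990 common cases and 10 rare ones, chosen purely for illustration), how a classifier that ignores the rare class can still report high accuracy. The function names `accuracy` and `recall` are defined here and are not taken from any of the cited work.

```python
# Toy data (an assumption for illustration): 990 common cases, 10 rare cases.
labels = [0] * 990 + [1] * 10

# A trivial "classifier" that always predicts the majority class.
predictions = [0] * len(labels)


def accuracy(y_true, y_pred):
    """Fraction of examples labelled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of the true positive-class examples that were found."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total if total else 0.0


print(f"accuracy          = {accuracy(labels, predictions):.3f}")  # 0.990
print(f"rare-class recall = {recall(labels, predictions):.3f}")    # 0.000
```

The accuracy looks excellent even though not a single rare case is detected, which is why a rebalancing strategy is of interest when the rare class is the one that matters.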

Project Title: Mining from Text and Image
Most existing information extraction (IE) systems are limited to extracting information from text only. Recently there has been great interest in mining from both text and images. Kou et al. (2007) built a system to extract information from both. In such a system, associating the information from the text and the image requires matching sub-figures in a figure with sentences in the text. They used a stacked graphical model to match the labels of sub-figures with the labels of sentences, and showed that the stacked graphical model can take advantage of context information to achieve a significant improvement in matching accuracy. However, the system only looked at one particular aspect of biology (protein subcellular locations). This project aims to explore how the technique can be generalised to cover other aspects.

Reference. Kou, Z., Cohen, W. and Murphy, R. (2007). A Stacked Graphical Model for Associating Sub-Images with Sub-Captions. Pacific Symposium on Biocomputing 12:257-268.

Project Title: Concept Mapping
It appears that the current problem for TCM (Traditional Chinese Medicine) informatics is not at the software or hardware level; it is at the semantic/conceptual level. In other words, the real difficulty is how to convert terms/concepts and their relationships in TCM, expressed in Chinese, into the corresponding terms/concepts and relationships in Western medicine (WM). The mapping is not a straightforward process, and it is definitely not always one-to-one. This project explores how machine learning can assist the mapping process, which represents a small step towards bringing TCM and WM a bit closer.