Honours projects 2009
Projects supervised by Sanjay Chawla
Discovering Outliers on Manifolds (18cp)
While a single image is often represented as a matrix of pixels, a collection of images related to an event is best represented as a lower-dimensional manifold. A manifold is a generalization of a surface. For example, a sphere sits in three-dimensional space, but its surface is a two-dimensional manifold. The objective of the project is to design and implement efficient algorithms to find outliers on manifolds. This has immediate applications in diverse areas ranging from video surveillance to genomics, wherever high-dimensional data is abundant. For example, a traffic accident at a busy intersection will show up as an outlier.
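As a rough illustration of the idea (not the method the project would develop), the sketch below samples points on the surface of a unit sphere, adds one point that lies off the surface, and scores every point by the distance to its k-th nearest neighbour. The function names sphere_points and knn_scores, the sample size of 200 and the choice k=5 are all assumptions made for this example, and the brute-force search is deliberately simple rather than efficient.

```python
import math

def sphere_points(n):
    """Roughly even samples on the unit sphere (Fibonacci lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n      # height, strictly inside (-1, 1)
        r = math.sqrt(1.0 - z * z)         # radius of the circle at that height
        angle = golden_angle * i
        pts.append((r * math.cos(angle), r * math.sin(angle), z))
    return pts

def knn_scores(points, k=5):
    """Outlier score of each point: distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# 200 points on the two-dimensional manifold, plus one point off it.
points = sphere_points(200)
points.append((0.0, 0.0, 0.0))             # hypothetical outlier: the sphere's centre

scores = knn_scores(points, k=5)
suspect = max(range(len(points)), key=scores.__getitem__)
print("most anomalous index:", suspect)
```

The centre of the sphere is exactly distance 1 from every sample, while each sample has several much closer neighbours on the surface, so the added point receives the largest score. Real image collections would need a learned manifold and a far more scalable neighbour search, which is where the project's actual work would lie.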
The logic, philosophy and implications of Data Mining (12 or 18cp)
The two most important forms of inference are deduction and induction. For example, if we say that all apples are green and therefore the next apple will be green, then that is a form of deduction. On the other hand, if we say that all the apples I have seen so far are green and therefore the next apple I see will be green, then that is an example of induction. However, neither of these matches what is typically carried out in data mining. It appears that a weaker form of induction, known as abduction or inference to the best explanation, explains data mining techniques. For example, a common situation in data mining is as follows. An object O is discovered as an anomaly. However, if the hypothesis H were true, then O could be explained and would therefore no longer be an anomaly. Therefore, we infer that H must be true. The objective of this project is to fully understand abduction, its implications, and how different data mining techniques use abduction to infer from data. A corollary will be to use abduction to differentiate between the disciplines of "data mining", "machine learning" and "statistics."