This refers to research efforts that devise and apply informatics methods (computer science, mathematics, etc.) to biological problems. It is a relatively new field but has already had a major impact: recent high-profile projects, such as sequencing the human genome, were only possible with bioinformatics.
My interests mainly lie in applying graph algorithms and visualization methods to understand the complex interactions among genes and the overall functions of gene groups, the so-called systems biology approach.
eScience is a fairly broad area. It usually covers work that applies information technology methods to research in other scientific fields. There are two sub-areas that I am particularly interested in:
I am mainly interested in improving query performance on large datasets. Currently I focus on queries related to graph databases, whose topological structure does not map well onto relational databases. Graph databases usually require computationally expensive operations such as graph layout and network analysis.
This is mainly a bioinformatics eScience project. It aims to build an ontology database for phenotype data. Being an ontology database means that its data model is based on a common, standard ontology (we are considering the Ontology for Biomedical Investigations (OBI)) and that its data are annotated with ontology terms (such as those from the Mammalian Phenotype Ontology). The database will accommodate a wide variety of phenotype data, including blood analysis results from flow cytometry and images from histopathology.
This is also a bioinformatics eScience project. It aims to build up the hardware, software, and human capabilities for managing and analyzing the data produced by next-generation (high-throughput) sequencing technologies. The project focuses on the Australian National University and other universities and research institutes in the ACT. The John Curtin School of Medical Research currently has one Illumina and one 454 sequencer. The infrastructure for storing, retrieving, and querying sequence data needs to catch up, and building the capability to analyze these data has only just started.
The Gene Ontology is designed to describe gene functions. It is large, with tens of thousands of terms, and is updated frequently. If you are interested in a single gene, you can search the Gene Ontology for the functions that previous studies have assigned to it. In this study, we are interested in the functions of a group of genes, not just one. To achieve this, we first build a network connecting genes to their functions. We then apply various analyses (such as clustering and centrality measures) to find important functions -reference here-. We developed several layout algorithms to visualize the results -reference here-, and we are currently implementing them in Cytoscape -reference here- so they can be used together with other existing tools. We are also looking at how interaction techniques can help users navigate a large ontology such as this one.
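As a minimal sketch of the first two steps (not the project's actual implementation), a gene group and its annotations can be held as a bipartite gene-function network, and function nodes can then be ranked by degree centrality, one of the simplest centrality measures. The gene names, function labels, and helper names below are hypothetical and stand in for real Gene Ontology annotations.

```python
from collections import defaultdict

# Hypothetical gene -> function annotations (illustrative labels, not real GO data).
annotations = {
    "geneA": {"apoptosis", "cell_cycle"},
    "geneB": {"apoptosis", "DNA_repair"},
    "geneC": {"apoptosis"},
    "geneD": {"cell_cycle"},
}

def build_gene_function_network(annotations):
    """Build a bipartite adjacency map: each gene links to its functions and vice versa."""
    graph = defaultdict(set)
    for gene, functions in annotations.items():
        for function in functions:
            graph[gene].add(function)
            graph[function].add(gene)
    return graph

def rank_functions_by_degree(graph, genes):
    """Score each function by the fraction of the gene group annotated with it."""
    functions = [node for node in graph if node not in genes]
    scores = {f: len(graph[f]) / len(genes) for f in functions}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

network = build_gene_function_network(annotations)
ranking = rank_functions_by_degree(network, set(annotations))
print(ranking)
```

Degree centrality only counts how many genes in the group share a function; measures such as betweenness, or clustering of the network, capture structure that simple counting misses.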
Most graphs have rich information associated with their nodes and edges. For instance, in a social network each person is a node with attributes such as gender, age, and interests, and each relationship is an edge with attributes such as its type and when it was formed. Multivariate graph visualization aims to show both the graph structure and this rich information at the same time, to reveal possible relationships between the two. We designed a new visual metaphor, GraphScape -reference here-, for this purpose and have just finished its user study.
Complex real-world relations usually involve multiple networks. For instance, social life includes a friendship network, an email network, a family tree, and an organization chart. These networks are related to each other and usually share nodes. This project aims to show the commonalities and differences among multiple networks. Showing multiple networks increases visual complexity, and we devised several 3D approaches to address this -reference here-. We started with two networks -reference here- and are currently moving to three or more.
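To make "commonalities and differences" concrete, one simple reading is set operations on the edge sets of networks that share node labels: edges present in every network versus edges unique to one. The sketch below is a hypothetical illustration of that computation only; it is not the 3D visualization approach itself, and the network contents and function names are invented.

```python
# Each network is a set of undirected edges over shared node labels (hypothetical data).
def normalise(edges):
    """Store each undirected edge as a frozenset so (a, b) and (b, a) compare equal."""
    return {frozenset(edge) for edge in edges}

networks = {
    "friendship": normalise([("alice", "bob"), ("bob", "carol"), ("carol", "dave")]),
    "email": normalise([("bob", "alice"), ("carol", "erin"), ("bob", "carol")]),
    "org_chart": normalise([("alice", "bob"), ("erin", "dave")]),
}

def compare_networks(networks):
    """Return the edges common to all networks and the edges unique to each one."""
    common = set.intersection(*networks.values())
    unique = {}
    for name, edges in networks.items():
        others = [e for other, e in networks.items() if other != name]
        unique[name] = edges - set().union(*others)
    return common, unique

common, unique = compare_networks(networks)
print(common)
print(unique)
```

A visualization would then need to encode these categories, for example by placing common edges on a shared plane and network-specific edges on separate layers.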
This is a bioinformatics project I worked on at CSIRO. The project aims to build a service-based semantic web system to support bioinformatics workflows. All the data sources and analysis tools are provided as Web Services, so they can be accessed without having to understand their internal implementation. This also allows easy integration with existing services and makes it easy to add services to or remove them from the system.
All data in the system are annotated with ontology terms to provide semantics. Different ontologies are used to describe the various components, such as the services, the workflows, and their biological purposes.
Another important aspect of the project is provenance, which is the metadata of an analysis. This includes the tools and methods used, their settings, intermediate and final results, and the execution log. These data make it possible to reproduce an analysis and make it easier to share the analysis workflow.
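A provenance record of the kind described above could be sketched as follows. This is a simplified, hypothetical illustration: it stores the tool, settings, inputs/outputs (intermediate and final results), and log for each step as plain JSON, whereas the system itself annotated its data with ontologies. All class, field, and tool names here are assumptions.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ProvenanceRecord:
    """Metadata captured for one analysis step (field names are illustrative)."""
    tool: str
    settings: dict
    inputs: list
    outputs: list  # intermediate or final results produced by this step
    log: list = field(default_factory=list)

@dataclass
class WorkflowProvenance:
    """Ordered provenance history for a whole workflow run."""
    workflow_name: str
    steps: list = field(default_factory=list)

    def record(self, step: ProvenanceRecord) -> None:
        self.steps.append(step)

    def to_json(self) -> str:
        """Serialise the full history so the analysis can be shared or re-run."""
        return json.dumps(asdict(self), indent=2)

prov = WorkflowProvenance("example-workflow")
prov.record(ProvenanceRecord(
    tool="sequence_aligner",  # hypothetical tool name
    settings={"gap_penalty": -2},
    inputs=["reads.fasta"],
    outputs=["alignment.out"],
    log=["started", "finished"],
))
restored = json.loads(prov.to_json())
print(restored["steps"][0]["tool"])
```

Keeping settings alongside inputs and outputs is what makes re-execution possible: rerunning each step in order with the stored settings should regenerate the recorded results.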
The main application domains of Genome Tracker are colorectal cancer and Alzheimer disease.
The project is part of the HxI Initiative, a collaboration among NICTA, CSIRO, and DSTO. I was part of the NICTA team. The project mainly investigated how next-generation tabletop displays can be used to facilitate remote collaboration. It studied how multiple people interact with objects displayed on a tabletop using their hands, pointing, gestures, and other natural interaction methods. It also examined how people work with physically co-located collaborators (around the same table) versus remote collaborators, and how to reduce the difference between the two.
As the name indicates, the VALACON project is about graph visualization and analysis. It covers graph theory algorithms, graph layout algorithms, and visualization and interaction techniques for graphs. Most of the research outcomes are implemented in GEOMI, a 3D visualization package that is now open source. An important part of my work on the project is applying these visualization techniques to biological research.
This was my PhD project, in which I worked on improving the performance of visualizing and querying large 3D terrain surfaces (with millions of polygons) that are too big to fit into memory. The idea of multi-resolution (also known as multi-scale) is to keep multiple representations of the surface at various error levels. Depending on the application's requirements, a resolution (or scale) is selected that provides enough accuracy while reducing the amount of data (the data volume decreases as the error increases). My work devised new approaches to speed up this process for visualization and querying (such as the surface k-nearest-neighbor problem).
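The accuracy/data-volume trade-off can be sketched as choosing, from a set of precomputed levels, the coarsest one whose error stays within the application's tolerance. This is a deliberately simplified, hypothetical illustration: it selects one uniform level for the whole surface, whereas practical multi-resolution terrain methods often vary resolution across the surface. The level values and function names are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Level:
    """One stored representation of the terrain surface."""
    max_error: float    # maximum approximation error of this level (e.g. metres)
    polygon_count: int  # data volume at this level

# Hypothetical levels: larger error means fewer polygons to load and process.
LEVELS = [
    Level(max_error=0.0, polygon_count=4_000_000),  # original full-resolution surface
    Level(max_error=0.5, polygon_count=1_000_000),
    Level(max_error=2.0, polygon_count=250_000),
    Level(max_error=8.0, polygon_count=60_000),
]

def select_level(levels, error_tolerance):
    """Pick the level with the least data whose error is within the tolerance."""
    acceptable = [lv for lv in levels if lv.max_error <= error_tolerance]
    if not acceptable:
        raise ValueError("no level meets the requested accuracy")
    return min(acceptable, key=lambda lv: lv.polygon_count)

print(select_level(LEVELS, 1.0))
print(select_level(LEVELS, 10.0))
```

Loosening the tolerance from 1.0 to 10.0 lets the selection drop from one million polygons to sixty thousand, which is the saving that multi-resolution representations exploit.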