I am very happy to take inquiries from prospective students interested in undertaking substantial research projects, such as for an MSc by Research or a PhD. It is not practical to list all the possible project descriptions here, because everyone coming into a postgraduate degree is different. I am happy to work with students interested in molecular evolution, phylogenetics, cophylogenetics, biological modelling, combinatorial optimisation, and probably a few other broad areas I've missed.
Important note: The following projects are by no means exhaustive. If you have an idea for a project that relates to something like the subjects below, or perhaps something else, then please contact me. In many cases a project can be designed around your needs and interests. Note also that the projects below are quite extensible and may be developed further or perhaps in slightly different directions, depending on your progress.
Three quarters of emergent diseases come to humans from other species. These include, and are by no means limited to, SARS, HIV, Ebola and 'Flu. Despite the clear health issues, very little is known about the general dynamics of these "zoonosis" events.
While it is well understood that the evolution of some biological species is closely linked with that of others, for example in parasites and their hosts, or pathogens and their victims such as the emergent diseases above, it is not known how well we can hope to recover the events in their shared evolutionary history that gave rise to the patterns of association we observe today. These events include simultaneous speciation (cospeciation) of host and parasite species, independent speciation of either host or parasite, and parasites "switching" hosts to infect different host species. Methods of recovering what really took place are hampered by "untraceable" events, where for instance host species went extinct or parasites 'tried' to switch hosts and failed.
This project will extend simulation software (written in C++ in a command-line environment) to output simulated histories according to a simple coevolutionary model under different model parameters. The program will output data in a format for cophylogenetic analyses with other existing software. The project will then determine which events are recovered by a range of possible approaches in the literature, and will result in a better understanding of how we can recover ancient coevolutionary dynamics from molecular sequence data.
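To give a flavour of the kind of simulation involved, here is a minimal sketch, not the existing software. It assumes the events along a single parasite lineage can be treated as competing Poisson processes. The type names, the `Rates` fields and the function names are all hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical event types for a toy coevolutionary model.
enum class EventType { Cospeciation, ParasiteSpeciation, HostSwitch, FailedSwitch };

struct Event {
    double time;     // time at which the event occurs
    EventType type;  // kind of event
};

// Hypothetical per-lineage rates (events per unit time), in EventType order.
struct Rates {
    double cospeciation;
    double parasiteSpeciation;
    double hostSwitch;
    double failedSwitch;
};

// Draw a time-ordered list of events along one parasite lineage up to tMax.
// Waiting times are exponential with the total rate; the event type is then
// chosen with probability proportional to its own rate.
std::vector<Event> simulateLineage(const Rates& r, double tMax, std::mt19937& rng) {
    const double total =
        r.cospeciation + r.parasiteSpeciation + r.hostSwitch + r.failedSwitch;
    assert(total > 0.0 && tMax >= 0.0);
    std::exponential_distribution<double> waiting(total);
    std::discrete_distribution<int> pick(
        {r.cospeciation, r.parasiteSpeciation, r.hostSwitch, r.failedSwitch});
    std::vector<Event> history;
    for (double t = waiting(rng); t < tMax; t += waiting(rng)) {
        history.push_back({t, static_cast<EventType>(pick(rng))});
    }
    return history;
}

// Keep only the events that could in principle leave a trace in present-day
// data; in this toy model, failed host switches are treated as untraceable.
std::vector<Event> observable(const std::vector<Event>& history) {
    std::vector<Event> kept;
    for (const Event& e : history) {
        if (e.type != EventType::FailedSwitch) kept.push_back(e);
    }
    return kept;
}
```

Comparing the output of `simulateLineage` with that of `observable` for the same random seed gives a first, crude picture of how much of the true history is hidden from any reconstruction method.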
A central problem in recovering evolutionary trees (phylogenies) from molecular (DNA) sequence data is that of which evolutionary model to use. While it is generally held that multiple processes are involved in the evolution of a single set of related molecular sequences, most phylogenetic methods either apply a single general model of sequence evolution, or partition the data very crudely into 1st, 2nd and 3rd codon positions (3rd position is highly redundant as often two or four DNA codons differing only in the 3rd position code for the same amino acid). This is a gross simplification of the underlying processes and cannot help us fully understand the complex processes underlying molecular evolution. Another result of this simplification is that phylogenetic reconstruction methods do not stand the best chance of accurately recovering the true history of the organisms involved. One successful method to date clusters the sequences into groups that do not differ by a certain amount, in order to maximise the chance of constructing each subtree accurately, before combining them with existing software.
This project will develop a data partitioning program in C++ to run in a command-line environment. The program will use greedy heuristics to assign subsets of the data into different overlapping subsets to maximise the probability of constructing the complete tree from the reconstructed subtrees. Real molecular data will be obtained from the public databases, and different data partitioning methods will be compared to find the best ones.
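As an illustration only, one very simple greedy heuristic of this kind can be sketched as follows. It assumes the sequences are already aligned to equal length and uses the proportion of differing sites (p-distance) with a fixed threshold; the function names and the seed-based strategy are my assumptions for the example, not the method the project will settle on.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Proportion of aligned sites at which two equal-length sequences differ.
double pDistance(const std::string& a, const std::string& b) {
    assert(a.size() == b.size() && !a.empty());
    std::size_t diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) ++diff;
    }
    return static_cast<double>(diff) / static_cast<double>(a.size());
}

// Greedy seed-based grouping: the first sequence not yet covered becomes a
// seed, and every sequence within `threshold` of that seed joins its subset.
// A sequence can lie within range of several seeds, so subsets may overlap.
std::vector<std::vector<std::size_t>> greedySubsets(
        const std::vector<std::string>& seqs, double threshold) {
    std::vector<bool> covered(seqs.size(), false);
    std::vector<std::vector<std::size_t>> subsets;
    for (std::size_t seed = 0; seed < seqs.size(); ++seed) {
        if (covered[seed]) continue;
        std::vector<std::size_t> subset;
        for (std::size_t j = 0; j < seqs.size(); ++j) {
            if (pDistance(seqs[seed], seqs[j]) <= threshold) {
                subset.push_back(j);
                covered[j] = true;
            }
        }
        subsets.push_back(subset);
    }
    return subsets;
}
```

For the four toy sequences `AAAA`, `AAAT`, `AATT` and `TTTT` with a threshold of 0.25, this yields the index subsets {0, 1}, {1, 2} and {3}; the second sequence is shared between the first two subsets, which is the kind of overlap that later allows subtrees to be combined.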
In order to test evolutionary hypotheses it is frequently necessary to simulate biological data according to some kind of stochastic model. The strength of simple models is that they have great explanatory power in cases where molecular sequences are evolving in a simple manner; in general, however, molecular sequences arise by much more complex mechanisms, involving non-independent characters, insertion and deletion events, and other mutable traits.
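As an example of a simple model of this kind, the sketch below evolves a DNA sequence along one branch under the Jukes-Cantor (JC69) model, in which every site evolves independently and all substitutions are equally likely. The function name `evolveJC69` is hypothetical, and the sketch handles only the four bases A, C, G and T, with no insertions or deletions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <string>

// Evolve a DNA sequence along a branch under the Jukes-Cantor (JC69) model.
// branchLength is measured in expected substitutions per site.
std::string evolveJC69(const std::string& ancestor, double branchLength,
                       std::mt19937& rng) {
    assert(branchLength >= 0.0);
    static const std::string bases = "ACGT";
    // Probability that a site shows a different base at the end of the branch:
    // 3/4 * (1 - exp(-4t/3)), split equally among the three other bases.
    const double pChange = 0.75 * (1.0 - std::exp(-4.0 * branchLength / 3.0));
    std::bernoulli_distribution changes(pChange);
    std::uniform_int_distribution<int> offset(1, 3);
    std::string descendant = ancestor;
    for (char& c : descendant) {
        const std::size_t idx = bases.find(c);
        assert(idx != std::string::npos);  // only A, C, G, T in this sketch
        if (changes(rng)) {
            c = bases[(idx + static_cast<std::size_t>(offset(rng))) % 4];
        }
    }
    return descendant;
}
```

With a branch length of zero the sequence is returned unchanged, and as the branch length grows the proportion of changed sites approaches three quarters, at which point the descendant carries no information about its ancestor.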
Not every researcher needs to simulate data in the same way: in some experiments the genetic sequence is crucial and single nucleotides evolve independently and identically across each gene, whereas in others only the gene order is important and the events of interest are those that cause gene rearrangement. Also, some viruses are capable of recombination, that is, the mixing of genetic material from two (or more) ancestral lineages, giving rise to DNA sequences whose genes (or parts of genes) are from different 'parent' lineages. With all the potential complexity of molecular sequence evolution it is not feasible to require every possible parameter to be prescribed, nor is it sensible to write a new application for each evolutionary model; rather, a terse model description method is the aim.
The main goal of this project is to develop a general simulation model description language and a framework with which to simulate molecular sequence data according to such models.
This is a 1 or 2-semester project, in which you will develop a realistic, extensible stochastic simulation language. This language will be useful for many stochastic simulation models, not just for molecular sequence evolution; the aim is that this will lead to a standard usable in many areas of computational biology. The target application for this simulation framework will parse simulation model descriptions in the above language for generating complex sequence data for use in bioinformatics research, including such events as changes in nucleotide frequency, changes in rates of evolution over time, changes in speciation rate for different lineages, etc.
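Designing the language itself is the substance of the project, so the following is only a placeholder to show the general shape of the task. It assumes a purely hypothetical syntax of one `name = value` parameter per line with `#` comments, and reads such a description into a table of numeric parameters; the function name `parseModel` and the parameter names in the usage note are invented for the example.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Parse a hypothetical "name = value" model description, one parameter per
// line. Text after '#' is a comment; lines that do not contain a name, an
// '=' sign and a numeric value are skipped.
std::map<std::string, double> parseModel(std::istream& in) {
    std::map<std::string, double> params;
    std::string line;
    while (std::getline(in, line)) {
        line = line.substr(0, line.find('#'));  // strip any trailing comment
        const std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;
        std::istringstream nameStream(line.substr(0, eq));
        std::istringstream valueStream(line.substr(eq + 1));
        std::string name;
        double value = 0.0;
        if (nameStream >> name && valueStream >> value) {
            params[name] = value;
        }
    }
    return params;
}
```

Given the two lines `substitution_rate = 0.25` and `speciation_rate=0.5 # per lineage`, the returned table holds both parameters. A realistic language would need far more than flat numeric parameters, for instance rates that change over time or differ between lineages, and working out how to express those tersely is the interesting part.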
Keywords: stochastic simulation, molecular evolution, evolution
Disciplines: stochastic modelling, computer programming, language structure