Some open questions I'd like answered
Maybe these have been answered already: if so, can someone let me know? If
you're interested in solving them, please get in touch!
These are some questions in cophylogeny
Background: We are given a host tree H and a dependent parasite or pathogen
tree P, such that each leaf of P is associated with one or
more leaves of H by some known mapping f. There are four types of
event: codivergence, duplication, host-switching and
loss. Loss can arise from several different processes:
extinction, sampling failure and "missing the
boat" (or sorting) events. Since we cannot simply assign
arbitrary costs to these four recognisable event types, the best solutions are
only Pareto-optimal: any one of them could be optimal for some
feasible set of event costs. Empirical evidence suggests the number of
such (POpt) maps grows exponentially with the size of H and
P.
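To make the notion of Pareto-optimality concrete, here is a minimal Python sketch. It assumes each candidate map has already been summarised as a tuple of event counts (duplications, host switches, losses), all of which are to be minimised; that summary, the names `dominates` and `pareto_front`, and the example counts are illustrative assumptions, not part of any existing cophylogeny program.

```python
from typing import Iterable, List, Tuple

# Hypothetical summary of one candidate map:
# (duplications, host_switches, losses), each assumed to be minimised.
Counts = Tuple[int, int, int]


def dominates(a: Counts, b: Counts) -> bool:
    """True if a is no worse than b on every event type and better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: Iterable[Counts]) -> List[Counts]:
    """Return the distinct candidates that no other candidate dominates."""
    pool = list(candidates)
    front = {c for c in pool if not any(dominates(other, c) for other in pool)}
    return sorted(front)


# Made-up event counts for five candidate maps (one is a duplicate).
maps = [(0, 2, 3), (1, 1, 2), (1, 2, 3), (2, 0, 4), (1, 1, 2)]
front = pareto_front(maps)
print(front)
```

This pairwise filter is quadratic in the number of candidate maps, so it only illustrates the definition; it says nothing about how to enumerate or count the POpt maps without listing all candidates first.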
Problem: Find the (Pareto-)optimal maps that describe the most likely
history of associations between P and H.
- Given H, P and known associations f, how many
Pareto-optimal solutions are there?
- If f is one-to-many, how many solutions are there?
- How can one represent a consensus of POpt maps?
- Given H, P and f, is there a (tight) bound on
the minimum number of non-codivergence events that could account for the
differences between H and P?
These are some questions in microarray analysis
Background: Given a set of protein-coding genes and an array of
expression values, we want to determine the regulatory relationships among
the genes represented on the array.
- What is the relationship between gene function (e.g., regulatory or
not) and the "usual" distribution of expression values on a chip?
- What is the relationship between topological characteristics of a
gene in an underlying gene regulatory network (GRN) and the distribution of
expression values?
- What is the quickest way to remove cycles in a graph? They're a
complete proctalgia fugax when one is trying to estimate a
Bayesian probability network...
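As a starting point for the cycle-removal question, here is a minimal Python sketch of one simple linear-time approach: run a depth-first search and drop every back edge (an edge pointing to a node still on the search stack). The result is guaranteed to be acyclic, but the set of removed edges is generally not the smallest possible, and which edges go depends on the visiting order. The function name `remove_back_edges` and the adjacency-dict representation are assumptions made for this illustration.

```python
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Hashable]]


def remove_back_edges(graph: Graph) -> Tuple[Graph, List[Tuple[Hashable, Hashable]]]:
    """Return (acyclic copy of graph, list of removed edges) using an iterative DFS."""
    # Collect every node, including ones that only appear as edge targets.
    nodes = list(graph)
    known = set(nodes)
    for targets in graph.values():
        for t in targets:
            if t not in known:
                known.add(t)
                nodes.append(t)

    UNVISITED, ON_STACK, DONE = 0, 1, 2
    state = {n: UNVISITED for n in nodes}
    dag: Graph = {n: [] for n in nodes}
    removed: List[Tuple[Hashable, Hashable]] = []

    for root in nodes:
        if state[root] != UNVISITED:
            continue
        state[root] = ON_STACK
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if state[child] == ON_STACK:
                    removed.append((node, child))  # back edge: closes a cycle
                    continue
                dag[node].append(child)  # tree, forward or cross edge: keep
                if state[child] == UNVISITED:
                    state[child] = ON_STACK
                    stack.append((child, iter(graph.get(child, ()))))
                    break  # descend into the child first
            else:
                # All outgoing edges of node have been examined.
                state[node] = DONE
                stack.pop()
    return dag, removed


example = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
dag, removed = remove_back_edges(example)
print(dag, removed)
```

Finding the smallest set of edges whose removal breaks all cycles is the minimum feedback arc set problem, which is NP-hard in general, so a quick heuristic like this trades optimality for speed.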
These are some questions in phylogenetics
Background: The essential problem in phylogenetics is to recover the
best possible estimate of the evolutionary relationships among a group of
taxa (species, genera, lineages, strains) in which we are interested.
Most often this is done by passing molecular sequence data to some kind of
computerised method which uses a model to judge which of an enormous number
of possible trees is the best one.
- Parametric bootstrapping is a process in which an inferred model is
used to create artificial data on an inferred tree, and those data are
then presented to an inference program to see if the same tree is
recovered. What is the bias of this process?
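The parametric bootstrap loop described above can be sketched in a toy setting. The Python example below assumes a four-taxon tree ((A,B),(C,D)) with equal terminal branch lengths, the Jukes-Cantor (JC69) substitution model, and a stand-in "inference program" that picks the split with the smallest sum of JC-corrected pairwise distances (the four-point condition). All function names and parameter values are illustrative assumptions. It shows the procedure only and does not address the bias question.

```python
import math
import random

BASES = "ACGT"


def evolve(seq, t, rng):
    """Evolve seq along a branch of length t (expected substitutions per site) under JC69."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    return [
        rng.choice([b for b in BASES if b != base]) if rng.random() < p_change else base
        for base in seq
    ]


def simulate(n_sites, branch, internal, rng):
    """Simulate an alignment on ((A,B),(C,D)); JC69 is reversible, so we root at the A/B ancestor."""
    left = [rng.choice(BASES) for _ in range(n_sites)]
    right = evolve(left, internal, rng)
    return {
        "A": evolve(left, branch, rng),
        "B": evolve(left, branch, rng),
        "C": evolve(right, branch, rng),
        "D": evolve(right, branch, rng),
    }


def jc_distance(x, y):
    """JC69-corrected distance between two aligned sequences."""
    p = sum(a != b for a, b in zip(x, y)) / len(x)
    if p >= 0.75:
        return float("inf")  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def infer_topology(aln):
    """Stand-in inference: choose the split minimising the sum of within-pair distances."""
    def d(u, v):
        return jc_distance(aln[u], aln[v])

    scores = {
        "AB|CD": d("A", "B") + d("C", "D"),
        "AC|BD": d("A", "C") + d("B", "D"),
        "AD|BC": d("A", "D") + d("B", "C"),
    }
    return min(scores, key=scores.get)


def parametric_bootstrap(n_reps, n_sites, branch, internal, seed=0):
    """Fraction of simulated data sets from which the generating split AB|CD is recovered."""
    rng = random.Random(seed)
    hits = sum(
        infer_topology(simulate(n_sites, branch, internal, rng)) == "AB|CD"
        for _ in range(n_reps)
    )
    return hits / n_reps


recovery = parametric_bootstrap(n_reps=50, n_sites=500, branch=0.1, internal=0.1)
print(recovery)
```

In a real analysis the tree and model parameters fed to `simulate` would themselves be estimates from the observed data, which is exactly where the question of bias arises; here they are simply fixed by hand.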