Squangle.r
This document is available as a pdf (with slightly different formatting) here.
This text documents the functions available in the R package
"squangle20080303.r". This package infers phylogenetic quartets using the
three group invariant functions known as the "squangles" (http://arxiv.org/abs/0711.3503).
- Author:
- Jeremy Sumner
or
- Depends:
- R package: "tensor" (http://cran.r-project.org/).
Files: "sqg1.txt", "sqg2.txt", "sqg3.txt".
The polynomial form of the degree d=5 squangles is encoded in the
files sqg1.txt", "sqg2.txt", "sqg3.txt". Noting that these polynomials are
in a basis as specified in Appendix B of http://arxiv.org/abs/0711.3503,
reading the .txt files by row, the coefficients of each term occur as every
6th integer and the variables themselves are represented by the
interleaving 5 integers with the correspondence:
pijkl ~ 43*i + 42*j +
41*k + 40*l + 1.
For example, the first 6 integers in "sqg1.txt" are (24, 12, 127, 161, 18,
197) and this corresponds to the term 24 * p0023 *
p1332 * p2200 * p0101 * p3010.
read.seq
- Description:
- Uses the R function "scan" to read multiple sets of
m-aligned sequences of length N. Assumes the "phylip" format is
followed.
- Usage:
- read.seq(seq.file)
- Arguments:
- seqfile: The file containing multiple sets of m
aligned DNA sequences of length N. Format must be that used in the
"phylip" program with each sequence appearing as single, uninterrupted
character string. eg.
m N
SeqA CACCCGG...
SeqB CGCCCGC...
...
SeqM CAAACCC...
- Value:
- A character vector. eg.
[1] "m"
[2] "N"
[3] "SeqA"
[4] "CACCCGG..."
[5] "SeqB"
...
[1+2M] "SeqM"
[2+2M] "CAAACCC..."
squangle
- Description:
- Computes the value of the three squangle polynomials
{Q1,Q2,Q3} evaluated on the data set of a quartet of aligned DNA
sequences. It will only work for aligned quartets arranged sequentially.
A termination message is given if m ≠ 4.
- Usage:
- squangle(allseq,quart=c(1,2,3,4))
- Arguments:
-
- allseq:
- A character vector of multiple quartets of aligned
sequences as computed using "read.seq".
- quart:
- Optional: default=c(1,2,3,4)
Specifies which quartet to analyse.
- Value:
- A vector; the numerical values of the squangles.
tree
- Description:
- Returns the maximum likelihood tree for the three
possibilities,
tree1=(12,34),
tree2=(13,24),
tree3=(14,23).
For the three possible trees the expectation values of the squangles are
known to be,
tree1: E[Q1]=0 and E[Q2]=-E[Q3]>0
tree2: E[Q2]=0 and E[Q1]=E[Q3]>0
tree3: E[Q3]=0 and E[Q1]=E[Q2]<0
Assuming the squangles are independent and normally distributed with
identical variance=v, and mean=u>0 as above, the likelihood of each
quartet is
L1~nd(Q1,0,v)*nd(Q2,u,v)*nd(Q3,-u,v)
L2~nd(Q1,u,v)*nd(Q2,0,v)*nd(Q3,u,v)
L3~nd(Q1,-u,v)*nd(Q2,-u,v)*nd(Q3,0,v).
Under these conditions the MLE
of u is independent of v and corresponds to the least squares estimator.
Recalling the constraint u>0, we have
MLE[u|tree1]=max[0, (Q2-Q3)/2]
MLE[u|tree2]=max[0, (Q1+Q3)/2]
MLE[u|tree3]=max[0, -(Q1+Q2)/2].
Note: u=0 corresponds to the star tree.
- Usage:
- tree(squang)
- Arguments:
- squang: A vector of squangles. ie. c(Q1,Q2,Q3)
- Value:
- Character vector.
write.squang
read.squang
- Description:
- Convenience functions for reading and writing to
file a list of multiple squangle vectors.
- Usage:
- write.squang(squangs,file.sqg)
read.squang(file.sqg)
- Arguments:
- squangs: A list of squangle vectors.
file.sqg: File to write/read to/from.
- Values:
- file.sqg A list of squangle vectors.