Friday, November 7, 2014

Question on Independent Origins Test

We just published a paper on convergent evolution that uses a new test of convergence, which we call the "Independent Origins Test". The main text describes the test only briefly (but see the supplement).

Just now I received a question about this test, and I paste my answer below in case others find it useful.

THE QUESTION: Dear Dr Oakley, I read your article "Predictable transcriptome evolution in the convergent and complex bioluminescent organs of squid" (great!).


I do not understand the logic behind this sentence:
"The observed data are approximately and conservatively 5,000 times less likely to have arisen from an evolutionary history with less than three gains of photophores than from an evolutionary history with three or more photophore gains."
We should compare 1 gain (ancestral) followed by 8 losses versus 2 independent gains. Here you compare fewer than 3 gains (1 or 2, which is the case here) versus 4, 5, 6.
Could you please tell me what I am missing?


THE ANSWER
Yes, the more traditional way to frame alternative hypotheses when testing for independent origins is to compare the likelihood of X gains versus Y losses. This is what we did, for example, in Oakley and Cunningham (2002).

However, the "independent origins" test in this 2014 paper frames the alternative hypotheses differently. The alternatives are one gain of the trait (= homology) versus more than one gain (= independent origins). Assuming a model of trait evolution, the phylogeny, and the distribution of trait states on the tree, the test calculates the probabilities of these alternative hypotheses.
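The framing above can be sketched as a Monte Carlo calculation. This is a minimal sketch under stated assumptions, not the procedure used in the paper (the supplement describes that): the toy tree, its tip states, the absent root state, and every function name below are hypothetical, and the model has one gain rate and one loss rate shared by all branches. It forward-simulates many histories, counts the gains in each, and estimates the probability of the observed tip states among histories with exactly one gain versus more than one gain.

```python
import random

# Hypothetical toy phylogeny (NOT the squid tree): parent -> [(child, branch length)].
# "out" is an outgroup; the ingroup splits into two distant clades.
TREE = {
    "root": [("out", 1.0), ("in", 0.2)],
    "in": [("x", 0.2), ("y", 0.2)],
    "x": [("t1", 0.6), ("t2", 0.6)],
    "y": [("t3", 0.6), ("z", 0.3)],
    "z": [("t4", 0.3), ("t5", 0.3)],
}
# Hypothetical tip states: 1 = trait present, 0 = absent.
OBSERVED = {"out": 0, "t1": 1, "t2": 0, "t3": 0, "t4": 1, "t5": 1}


def simulate_branch(state, length, gain, loss, rng):
    """Simulate a two-state continuous-time Markov chain along one branch.

    gain and loss must be positive rates. Returns (end state, number of 0 -> 1 changes)."""
    gains = 0
    remaining = length
    while True:
        rate = gain if state == 0 else loss
        wait = rng.expovariate(rate)  # exponential waiting time to the next change
        if wait >= remaining:
            return state, gains
        remaining -= wait
        if state == 0:
            gains += 1
        state = 1 - state


def simulate_history(gain, loss, rng, root_state=0):
    """Simulate one history over TREE; return (tip states, total gains)."""
    tips, total_gains = {}, 0
    stack = [("root", root_state)]
    while stack:
        node, state = stack.pop()
        children = TREE.get(node)
        if children is None:  # node is a tip
            tips[node] = state
            continue
        for child, length in children:
            end_state, g = simulate_branch(state, length, gain, loss, rng)
            total_gains += g
            stack.append((child, end_state))
    return tips, total_gains


def p_data_given_gains(gain, loss, n_sims=200_000, seed=1):
    """Monte Carlo estimates of P(observed tips | 1 gain) and P(observed tips | >1 gain)."""
    rng = random.Random(seed)
    counts = {"one": [0, 0], "many": [0, 0]}  # [histories, histories matching the data]
    for _ in range(n_sims):
        tips, g = simulate_history(gain, loss, rng)
        if g == 0:
            continue  # with an absent root, zero gains cannot produce the observed trait
        key = "one" if g == 1 else "many"
        counts[key][0] += 1
        counts[key][1] += tips == OBSERVED
    return {k: (hit / n if n else float("nan")) for k, (n, hit) in counts.items()}
```

For example, `p_data_given_gains(1.0, 1.0)` returns both estimates, and their ratio is one way to express how much better one hypothesis explains the tip data. Forward simulation is used here only because it is short and easy to check; on a real tree an exact match of every tip becomes rare, and histories conditioned on the tip data (stochastic character mapping) are far more efficient.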

Why, then, you might ask, did we compare the probabilities of "1 or 2 gains" with "3 or more gains"? The general reason is to be conservative. More specifically, in these squid there is a clade within which photophores might have evolved more than once. That was not the focus of the paper; we were really interested in whether two distantly related clades (loliginids and sepiolids) evolved photophores separately. Because the independent origins test counts the total number of gains on the entire tree, it does not distinguish between two gains, one in each of those distant clades, and a separate gain within sepiolids. To be conservative, then, we reported the probability of at least 3 gains. Examining "at least 2 gains" would yield an even larger difference between the alternative models.
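The choice of threshold can be made explicit with a small helper. As a sketch, assume each simulated history has already been summarized as a pair (number of gains, whether its tip states matched the data), for example from a forward simulation or a stochastic-mapping tool, and that the ancestral state is absent so zero-gain histories cannot explain the data. The function name and the records below are hypothetical, invented only to exercise the code. With `k=2` the comparison is exactly one gain versus two or more; with `k=3` it is one or two gains versus three or more, as in the paper.

```python
from collections.abc import Iterable


def gain_threshold_ratio(records: Iterable[tuple[int, bool]], k: int) -> float:
    """Return P(data | >= k gains) / P(data | 1 to k-1 gains).

    Each record is (number of gains in a simulated history, whether that
    history's tip states matched the observed data). Zero-gain histories are
    skipped, assuming an absent ancestral state."""
    low_n = low_hit = high_n = high_hit = 0
    for gains, matched in records:
        if gains == 0:
            continue
        if gains >= k:
            high_n += 1
            high_hit += matched
        else:
            low_n += 1
            low_hit += matched
    if low_n == 0 or high_n == 0:
        raise ValueError("need simulated histories on both sides of the threshold")
    p_low = low_hit / low_n
    p_high = high_hit / high_n
    if p_low == 0:
        return float("inf") if p_high > 0 else float("nan")
    return p_high / p_low


# Invented records, only to show how the threshold changes the comparison.
records = (
    [(0, False)] * 10
    + [(1, False)] * 40 + [(1, True)] * 1
    + [(2, False)] * 30 + [(2, True)] * 3
    + [(3, False)] * 20 + [(3, True)] * 4
)
print(gain_threshold_ratio(records, k=3))  # 1-2 gains vs >= 3 gains, about 3.08
print(gain_threshold_ratio(records, k=2))  # exactly 1 gain vs >= 2 gains, about 5.04
```

In these invented records the `k=2` ratio is larger, which matches what the answer reports for the squid data, but that ordering is a property of the data rather than a guarantee of the method.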

I believe the most obvious critique of this approach is the simplicity of the model, which assumes the same rates of character gain and loss across the entire tree. In simulations I have done (à la SIMMAP), a little homoplasy on a tree leads to estimated rates of trait evolution that imply HUGE numbers of gains and losses, to the point of being biologically very unrealistic. I believe models that allow different rates of evolution on different parts of the tree could do better at yielding biologically realistic rates of trait evolution; see, for example, Skinner (2010).
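The concern about a single rate for the whole tree can be made concrete with a short calculation. Under a two-state model whose gain rate and loss rate are shared by every branch, and assuming the trait sits at its stationary distribution throughout the tree, the trait is absent a fraction loss/(gain + loss) of the time, so gains occur at rate gain·loss/(gain + loss) per unit of branch length; multiplying by the total tree length gives the expected number of gains. The function name, rates, and tree length below are hypothetical.

```python
def expected_gains(gain: float, loss: float, total_length: float) -> float:
    """Expected number of 0 -> 1 changes summed over a whole tree.

    Assumes one gain rate and one loss rate on every branch and that the trait
    starts (and therefore stays) at the stationary distribution."""
    if gain <= 0 or loss <= 0:
        raise ValueError("rates must be positive")
    frac_absent = loss / (gain + loss)  # stationary probability that the trait is absent
    return frac_absent * gain * total_length


# Hypothetical tree length and rates: scaling both rates by 10 scales the
# expected number of gains (and, at stationarity, losses) by 10 as well.
for scale in (1, 10, 100):
    print(scale, expected_gains(0.5 * scale, 0.5 * scale, total_length=20.0))
# prints 5.0, 50.0 and 500.0 expected gains
```

One way to read this: when homoplasy pushes the estimated rates up, a constant-rate model has to spread the implied changes over every branch of the tree, so the expected numbers of gains and losses grow in proportion to the rates.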