Abstract
The recent revolution in genomic sequencing has created new opportunities for exploring the connection between genomic variation and biological traits. By sequencing multiple individual genomes within a species, it is possible to identify genomic regions of divergence between groups of individuals sharing particular phenotypic traits. This strategy has been applied successfully in published studies of parallel evolution, but none of these earlier studies have made the underlying methodology or tools readily accessible. It is therefore difficult to reproduce their results or to reuse the methodology for new investigations.
Here I present an open tool for performing such analyses between two groups of genomic sequences, using two alternative methods. The first calculates a cluster separation score based on a two-dimensional scaling of the pairwise differences between individuals in the population. The second uses a Fisher's exact test score for each single-nucleotide polymorphism found. The tools reproduce earlier published results on parallel evolution in freshwater three-spine sticklebacks and on a long-term evolution experiment with Drosophila. The cluster separation scorer gives slightly more accurate results, but the choice of parameters matters for both methods.
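As a minimal illustration of the second method, the Python sketch below computes a two-sided Fisher's exact test p-value for each single-nucleotide polymorphism from the allele counts in the two groups. The function names, the 2x2 table layout (groups as rows, alleles as columns) and the example counts are assumptions made for illustration; the sketch does not reproduce the implementation in the Genomic HyperBrowser.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows are the two groups of individuals,
    columns are the counts of the two alleles at one SNP.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of observing x copies of the
        # first allele in group 1, with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables that are at most as likely
    # as the observed one (small tolerance for floating-point ties).
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


def score_snps(allele_counts):
    """Map each SNP identifier to its p-value.

    allele_counts: dict of SNP id -> (a, b, c, d), as described above.
    """
    return {
        snp: fisher_exact_two_sided(*counts)
        for snp, counts in allele_counts.items()
    }


# Hypothetical counts: the first SNP separates the groups perfectly,
# the second shows no difference between them.
example = {
    "snp_divergent": (10, 0, 0, 10),
    "snp_neutral": (5, 5, 5, 5),
}
scores = score_snps(example)
```

A small p-value indicates that the allele frequencies differ between the two groups at that position. How the per-SNP values are aggregated into scores for genomic regions is one of the parameter choices referred to above and is not covered by this sketch.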
Both methods are implemented as Galaxy tools in the Genomic HyperBrowser web server. In theory, the tools allow anyone to perform analyses identifying connections between genomic sequence and phenotypic traits based on sequencing data. In practice, however, the complexity of the methodology and the non-uniformity of the formats used to represent the relevant genomic data pose a challenge. Other usability issues include the amount of knowledge of the underlying methods required, as well as challenges with post-processing, including visualization, of the genome-wide results.