Abstract
Phasing genotype data to identify the composite haplotype pairs is a widely-studied problem due to its value for understanding genetic contributions to diseases, population genetics research, and other significant endeavors. The accuracy of the phasing is crucial as identification of haplotypes is frequently the first step of expensive and vitally important studies. We present a combinatorial approach to this problem which we call SplittingHeirs. This approach is biologically motivated as it is based on three widely accepted principles: there tend to be relatively few unique haplotypes within a population, there tend to be clusters of haplotypes that are similar to each other, and some haplotypes are relatively common. We have tested SplittingHeirs, along with several popular existing phasing methods including PHASE, HAP, EM, and Pure Parsimony, on seven sets of haplotype data for which the true phase is known. Our method yields the highest accuracy obtainable by these methods in all cases. Furthermore, SplittingHeirs is robust and had higher accuracy than any of the other approaches for the two datasets with high recombination rates. The success of SplittingHeirs validates the assumptions made by the dense graph model and highlights the benefits of finding globally optimal solutions.
Original language | American English |
---|---|
Journal | International Conference on Bioinformatics |
DOIs | |
State | Published - Aug 2 2010 |
Externally published | Yes |
Disciplines
- Computer Sciences
- Bioinformatics
- Artificial Intelligence and Robotics