Maximum-Parsimony Haplotype Inference Based on Sparse Representations of Genotypes


Bibliographic Details
Published in: IEEE Transactions on Signal Processing, Vol. 60, no. 4, pp. 2013-2023
Main Authors: Jajamovich, G. H.; Xiaodong Wang
Format: Journal Article
Language: English
Published: New York, NY, IEEE, 01.04.2012
Institute of Electrical and Electronics Engineers
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1053-587X, 1941-0476
DOI: 10.1109/TSP.2011.2179542

Summary: The haplotypes of an individual can be used to predict diseases and help in designing drugs. However, experimentally determining haplotypes is expensive and time-consuming, so genotypes are usually measured instead. Given the set of genotypes for a group of unrelated individuals, it is possible to infer the haplotype pair for each subject based on the maximum parsimony principle. Finding the exact solution to this problem is NP-hard. We propose two related formulations of the haplotype inference problem that translate the maximum parsimony principle into the sparse representation of genotypes. In the first formulation, we look for the set of haplotypes that explain the genotypes such that the resulting frequency vector of haplotypes is as sparse as possible. The sparseness condition is achieved by minimizing the Tsallis entropy of the frequency vector, which is still an NP-hard problem. We propose a method that enumerates all local minima with high probability by solving a set of integer linear programs of low dimensionality. The minimizer is then found by identifying the local minimum point that achieves the lowest Tsallis entropy. In the second formulation, we state haplotype inference as a sparse dictionary selection problem. Each genotype is reconstructed by a haplotype pair selected from a set of available haplotypes that needs to be sparse. This leads to an approximately submodular maximization problem, which can therefore be solved with a fast greedy method. We test the proposed solutions on different data sets and compare their performance with state-of-the-art methods, achieving similar or better results.
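
To illustrate the first formulation, the following minimal Python sketch compares two candidate explanations of a toy genotype set by the Tsallis entropy of their haplotype frequency vectors. It is not the authors' method or code: the 0/1/2 genotype encoding (a genotype is the site-by-site sum of two binary haplotypes), the entropic index q = 2, the toy data, and all function names are assumptions made for this example. It only evaluates the entropy of two given solutions and does not perform the integer-linear-program enumeration of local minima described in the summary.

```python
from collections import Counter


def tsallis_entropy(freqs, q=2.0):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i**q) / (q - 1), defined for q != 1."""
    if q == 1.0:
        raise ValueError("q = 1 is the Shannon limit and is not handled here")
    return (1.0 - sum(p ** q for p in freqs)) / (q - 1.0)


def explains(h1, h2, genotype):
    """True if the haplotype pair (h1, h2) sums site by site to the genotype."""
    return all(a + b == g for a, b, g in zip(h1, h2, genotype))


def haplotype_frequencies(pairs):
    """Relative frequency of each distinct haplotype over all chosen pairs."""
    counts = Counter(h for pair in pairs for h in pair)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


# Toy data (hypothetical): three genotypes over two sites, encoded as 0/1/2.
genotypes = [(0, 0), (2, 2), (1, 1)]

# Two candidate explanations that differ only in how the heterozygous
# genotype (1, 1) is resolved into a haplotype pair.
parsimonious = [((0, 0), (0, 0)), ((1, 1), (1, 1)), ((0, 0), (1, 1))]
alternative = [((0, 0), (0, 0)), ((1, 1), (1, 1)), ((0, 1), (1, 0))]

# Both candidates must actually explain every genotype.
for solution in (parsimonious, alternative):
    assert all(explains(h1, h2, g) for (h1, h2), g in zip(solution, genotypes))

entropy_parsimonious = tsallis_entropy(haplotype_frequencies(parsimonious))
entropy_alternative = tsallis_entropy(haplotype_frequencies(alternative))
print(entropy_parsimonious, entropy_alternative)
```

In this toy case the explanation that reuses two distinct haplotypes has a lower Tsallis entropy than the one that needs four, which matches the intuition that a sparser frequency vector corresponds to a more parsimonious solution.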